Yes, you are correct; I missed that part of the run.

As you can see below, MatScale calls only BLAS dscal(); there is really no way 
to make that go faster.

How big is the matrix?

What are you doing with the matrix after you scale it? The only way to improve 
the time is to find some way to scale it less often.
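
If it is the diagonal scaling that dominates, one way to scale less often is to 
fold the scaling into the values you insert, so no separate pass over all the 
nonzeros is needed afterwards. A rough sketch, assuming the local entries of 
vec1 are available as an array s[] indexed consistently with irow/icol (the 
names are illustrative, not your actual code):

  PetscInt          i,j;
  PetscScalar       svals[24*24];
  const PetscScalar *s;

  ierr = VecGetArrayRead(vec1,&s);CHKERRQ(ierr);
  for (i=0; i<24; i++) {
    for (j=0; j<24; j++) {
      /* the equivalent of diag(vec1) * K * diag(vec1), applied entry by entry;
         this is valid because the scaling distributes over the summed
         element contributions */
      svals[24*i+j] = s[irow[i]]*vals[24*i+j]*s[icol[j]];
    }
  }
  ierr = MatSetValuesLocal(K,24,irow,24,icol,svals,ADD_VALUES);CHKERRQ(ierr);
  ierr = VecRestoreArrayRead(vec1,&s);CHKERRQ(ierr);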

It is curious that VecScale has a much higher flop rate even though it runs the 
same code; see below. Unless the matrices are tiny, I would expect similar flop rates.

  Barry




PetscErrorCode MatScale_SeqAIJ(Mat inA,PetscScalar alpha)
{
  Mat_SeqAIJ     *a     = (Mat_SeqAIJ*)inA->data;
  PetscScalar    oalpha = alpha;
  PetscErrorCode ierr;
  PetscBLASInt   one = 1,bnz;

  PetscFunctionBegin;
  ierr = PetscBLASIntCast(a->nz,&bnz);CHKERRQ(ierr);
  PetscStackCallBLAS("BLASscal",BLASscal_(&bnz,&oalpha,a->a,&one));
  ierr = PetscLogFlops(a->nz);CHKERRQ(ierr);
  ierr = MatSeqAIJInvalidateDiagonal(inA);CHKERRQ(ierr);
#if defined(PETSC_HAVE_DEVICE)
  if (inA->offloadmask != PETSC_OFFLOAD_UNALLOCATED) inA->offloadmask = PETSC_OFFLOAD_CPU;
#endif
  PetscFunctionReturn(0);
}

PetscErrorCode VecScale_Seq(Vec xin, PetscScalar alpha)
{
  PetscErrorCode ierr;
  PetscBLASInt   one = 1,bn;

  PetscFunctionBegin;
  ierr = PetscBLASIntCast(xin->map->n,&bn);CHKERRQ(ierr);
  if (alpha == (PetscScalar)0.0) {
    ierr = VecSet_Seq(xin,alpha);CHKERRQ(ierr);
  } else if (alpha != (PetscScalar)1.0) {
    PetscScalar a = alpha,*xarray;
    ierr = VecGetArray(xin,&xarray);CHKERRQ(ierr);
    PetscStackCallBLAS("BLASscal",BLASscal_(&bn,&a,xarray,&one));
    ierr = VecRestoreArray(xin,&xarray);CHKERRQ(ierr);
  }
  ierr = PetscLogFlops(xin->map->n);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}



> On Oct 22, 2020, at 3:02 PM, Antoine Côté <[email protected]> 
> wrote:
> 
> Hi,
> 
> See attached files for both outputs. Tell me if you need any clarification. 
> It was run with a DMDA of 33x17x17 nodes (creating 32x16x16=8192 elements). 
> With 3 dof per node, the problem has a total of 28611 dof.
> 
> Note : Stage "Stiff_Adj" is the part of the code modifying Mat K. 
> PetscLogStagePush/Pop was used.
> 
> Regards,
> 
> Antoine
> From: Matthew Knepley <[email protected]>
> Sent: October 22, 2020, 15:35
> To: Antoine Côté <[email protected]>
> Cc: [email protected]
> Subject: Re: [petsc-users] Enhancing MatScale computing time
>  
> On Thu, Oct 22, 2020 at 3:23 PM Antoine Côté <[email protected]> wrote:
> Hi,
> 
> I'm working with a 3D DMDA, with 3 dof per "node", used to create a sparse 
> matrix Mat K. The Mat is modified repeatedly by the program, using these 
> commands (in that order):
> 
> MatZeroEntries(K)
> In a for loop: MatSetValuesLocal(K, 24, irow, 24, icol, vals, ADD_VALUES)
> MatAssemblyBegin(K, MAT_FINAL_ASSEMBLY)
> MatAssemblyEnd(K, MAT_FINAL_ASSEMBLY)
> MatDiagonalScale(K, vec1, vec1)
> MatDiagonalSet(K, vec2, ADD_VALUES)
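> 
> Spelled out, the update loop looks roughly like this (error checking 
> omitted; nel, irow, icol and vals stand in for my actual element data):
> 
>   MatZeroEntries(K);
>   for (e = 0; e < nel; e++) {
>     /* 24 = 8 nodes x 3 dof for one hexahedral element */
>     MatSetValuesLocal(K,24,irow[e],24,icol[e],vals[e],ADD_VALUES);
>   }
>   MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);
>   MatDiagonalScale(K,vec1,vec1);
>   MatDiagonalSet(K,vec2,ADD_VALUES);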
> 
> Computing time seems high and I would like to improve it. Running tests with 
> "-log_view" tells me that MatScale() is the bottleneck (50% of total 
> computing time). From the manual pages, I've tried a few tweaks:
> - DMSetMatType(da, MATMPIBAIJ): "For problems with multiple degrees of 
>   freedom per node, ... BAIJ can significantly enhance performance", 
>   Chapter 14.2.4
> - Used MatMissingDiagonal() to confirm there are no missing diagonal 
>   entries: "If the matrix Y is missing some diagonal entries this routine 
>   can be very slow", MatDiagonalSet() manual
> - Tried MatSetOption() (see the sketch after this list):
>   - MAT_NEW_NONZERO_LOCATIONS == PETSC_FALSE: to increase assembly efficiency
>   - MAT_NEW_NONZERO_LOCATION_ERR == PETSC_TRUE: "When true, assembly 
>     processes have one less global reduction"
>   - MAT_NEW_NONZERO_ALLOCATION_ERR == PETSC_TRUE: "When true, assembly 
>     processes have one less global reduction"
>   - MAT_USE_HASH_TABLE == PETSC_TRUE: "Improve the searches during matrix 
>     assembly"
> According to "-log_view", assembly is fast (0% of total time), and the use of 
> a DMDA makes me believe preallocation isn't the cause of the performance issue.
> 
> I would like to know how I could improve MatScale(). What are the best 
> practices (during allocation, when defining Vecs and Mats, the DMDA, etc.)? 
> Instead of MatDiagonalScale(), should I use another command to obtain the 
> same result faster?
> 
> Something is definitely strange. Can you please send the output of
> 
>   -log_view -info :mat
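> 
> e.g. run like (the executable name is just a placeholder)
> 
>   ./myapp -log_view -info :mat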
> 
>   Thanks,
> 
>      Matt
>  
> Thank you very much!
> 
> Antoine Côté
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 
> <LogView.out><mat.0>
