Fande and John, Could you try jczhang/feature-better-quicksort-pivot? It passed Jenkins tests and I could not imagine why it failed on yours. Hash table has its own cost. We'd better get quicksort right and see how it performs before rewriting code. --Junchao Zhang
On Tue, Jul 2, 2019 at 2:37 PM Fande Kong <fdkong...@gmail.com<mailto:fdkong...@gmail.com>> wrote: YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11) Segmentation fault :-) As Jed said, it might be a good idea to rewrite the code using the hashing table. Fande, On Tue, Jul 2, 2019 at 1:27 PM Zhang, Junchao <jczh...@mcs.anl.gov<mailto:jczh...@mcs.anl.gov>> wrote: Try this to see if it helps: diff --git a/src/sys/utils/sorti.c b/src/sys/utils/sorti.c index 1b07205a..90779891 100644 --- a/src/sys/utils/sorti.c +++ b/src/sys/utils/sorti.c @@ -294,7 +294,8 @@ static PetscErrorCode PetscSortIntWithArrayPair_Private(PetscInt *L,PetscInt *J, } PetscFunctionReturn(0); } - SWAP3(L[0],L[right/2],J[0],J[right/2],K[0],K[right/2],tmp); + i = MEDIAN(L,right); + SWAP3(L[0],L[i],J[0],J[i],K[0],K[i],tmp); vl = L[0]; last = 0; for (i=1; i<=right; i++) { On Tue, Jul 2, 2019 at 12:14 PM Fande Kong via petsc-dev <petsc-dev@mcs.anl.gov<mailto:petsc-dev@mcs.anl.gov>> wrote: BTW, PetscSortIntWithArrayPair is used in MatStashSortCompress_Private. Any way to avoid to use PetscSortIntWithArrayPair in MatStashSortCompress_Private? Fande, On Tue, Jul 2, 2019 at 11:09 AM Fande Kong <fdkong...@gmail.com<mailto:fdkong...@gmail.com>> wrote: Hi Developers, John just noticed that the matrix assembly was slow when having sufficient amount of off-diagonal entries. It was not a MPI issue since I was able to reproduce the issue using two cores on my desktop, that is, "mpirun -n 2". I turned on a profiling, and 99.99% of the time was spent on PetscSortIntWithArrayPair (recursively calling). It took THREE MINUTES to get the assembly done. And then changed to use the option "-matstash_legacy" to restore the code to the old assembly routine, and the same code took ONE SECOND to get the matrix assembly done. Should write any better sorting algorithms? Fande,