Thanks for the fix. https://gitlab.com/petsc/petsc/pipelines/96957999
> On Nov 14, 2019, at 2:04 PM, hg <hgbk2...@gmail.com> wrote:
>
> Hello
>
> It turns out that hwloc is not installed on the cluster system that I'm
> using. Without hwloc, pastix runs into the branch using sched_setaffinity,
> which causes the error (see above at sopalin_thread.c). I'm not able to
> understand and find a solution with sched_setaffinity, so I think enabling
> hwloc is an easier solution. By the way, hwloc is recommended for compiling
> Pastix according to these threads:
>
> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html
>
> hwloc is supported in PETSc, so I assumed a clean and easy solution would be
> to compile with --download-hwloc. I made some changes in
> config/BuildSystem/config/packages/PaStiX.py to tell pastix to link to hwloc:
>
> ...
> self.hwloc = framework.require('config.packages.hwloc',self)
> ...
> if self.hwloc.found:
>   g.write('CCPASTIX := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n')
>   g.write('EXTRALIB := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n')
>
> But it does not compile:
>
> Possible ERROR while running linker: exit code 1
> stderr:
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load'
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type':
> /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to
> `hwloc_get_type_depth'
> /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to `hwloc_bitmap_singlify'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf'
>
> Any idea is appreciated. I can attach configure.log as needed.
>
> Giang
>
>
> On Thu, Nov 7, 2019 at 12:18 AM hg <hgbk2...@gmail.com> wrote:
> Hi Barry
>
> Maybe you're right: sched_setaffinity returns EINVAL in my case. Probably the
> scheduler does not allow the process to bind to a thread on its own.
>
> Giang
>
>
> On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
> You can also just look at configure.log, where it will show the calling
> sequence of how PETSc configured and built Pastix. The recipe is in
> config/BuildSystem/config/packages/PaStiX.py; we don't monkey with low level
> things like the affinity of external packages. My guess is that your cluster
> system has inconsistent parts related to this; that one tool works and
> another does not indicates they are inconsistent with respect to each other
> in what they expect.
>
> Barry
>
>
> >
> > On Nov 6, 2019, at 4:02 AM, Matthew Knepley <knep...@gmail.com> wrote:
> >
> > On Wed, Nov 6, 2019 at 4:40 AM hg <hgbk2...@gmail.com> wrote:
> > Looking into
> > arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c
> > I saw something like:
> >
> > #ifdef HAVE_OLD_SCHED_SETAFFINITY
> >   if(sched_setaffinity(0,&mask) < 0)
> > #else /* HAVE_OLD_SCHED_SETAFFINITY */
> >   if(sched_setaffinity(0,sizeof(mask),&mask) < 0)
> > #endif /* HAVE_OLD_SCHED_SETAFFINITY */
> >   {
> >     perror("sched_setaffinity");
> >     EXIT(MOD_SOPALIN, INTERNAL_ERR);
> >   }
> >
> > Is there a possibility that PETSc turns on HAVE_OLD_SCHED_SETAFFINITY during
> > compilation?
> >
> > May I know how to trigger re-compilation of external packages with PETSc? I
> > may go in there and check what's going on.
> >
> > If we built it during configure, then you can just go to
> >
> > $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/
> >
> > and rebuild/install it to test. If you want configure to do it, you have to
> > delete
> >
> > $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix
> >
> > and reconfigure.
> >
> > Thanks,
> >
> > Matt
> >
> > Giang
> >
> >
> > On Wed, Nov 6, 2019 at 10:12 AM hg <hgbk2...@gmail.com> wrote:
> > "sched_setaffinity: Invalid argument" only happens when I launch the job with
> > sbatch. Running without the scheduler is fine. I think this has something to
> > do with pastix.
> >
> > Giang
> >
> >
> > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >
> > Google finds this
> > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> >
> >
> >
> > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users
> > > <petsc-users@mcs.anl.gov> wrote:
> > >
> > > I have no idea. That is a good question for the PasTix list.
> > >
> > > Thanks,
> > >
> > > Matt
> > >
> > > On Tue, Nov 5, 2019 at 5:32 PM hg <hgbk2...@gmail.com> wrote:
> > > Should thread affinity be invoked?
> > > I set -mat_pastix_threadnbr 1 and also OMP_NUM_THREADS to 1
> > >
> > > Giang
> > >
> > >
> > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley <knep...@gmail.com> wrote:
> > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users
> > > <petsc-users@mcs.anl.gov> wrote:
> > > Hello
> > >
> > > I got crashed when using Pastix as solver for KSP. The error message
> > > looks like:
> > >
> > > ....
> > > NUMBER of BUBBLE 1
> > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0
> > > ** End of Partition & Distribution phase **
> > > Time to analyze 0.225 s
> > > Number of nonzeros in factorized matrix 708784076
> > > Fill-in 12.2337
> > > Number of operations (LU) 2.80185e+12
> > > Prediction Time to factorize (AMD 6180 MKL) 394 s
> > > 0 : SolverMatrix size (without coefficients) 32.4 MB
> > > 0 : Number of nonzeros (local block structure) 365309391
> > > Numerical Factorization (LU) :
> > > 0 : Internal CSC size 1.08 GB
> > > Time to fill internal csc 6.66 s
> > > --- Sopalin : Allocation de la structure globale ---
> > > --- Fin Sopalin Init ---
> > > --- Initialisation des tableaux globaux ---
> > > sched_setaffinity: Invalid argument
> > > [node083:165071] *** Process received signal ***
> > > [node083:165071] Signal: Aborted (6)
> > > [node083:165071] Signal code: (-6)
> > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680]
> > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207]
> > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8]
> > > [node083:165071] [ 3]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d]
> > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0
> > > communication, 0 out-of-core)
> > > --- Sopalin : Local structure allocation ---
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2]
> > > [node083:165071] [ 5]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2]
> > > [node083:165071] [ 6]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31]
> > > [node083:165071] [ 7]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170]
> > > [node083:165071] [ 8]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2]
> > > [node083:165071] [ 9]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325]
> > > [node083:165071] [10]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b]
> > > [node083:165071] [11]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552]
> > > [node083:165071] [12]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09]
> > > [node083:165071] [13]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9]
> > > [node083:165071] [14]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81]
> > > [node083:165071] [15]
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e]
> > >
> > > Does anyone have an idea what is the problem and how to fix it? The PETSc
> > > parameters I used are as below:
> > >
> > > It looks like PasTix is having trouble setting the thread affinity:
> > >
> > > sched_setaffinity: Invalid argument
> > >
> > > so it may be your build of PasTix.
> > >
> > > Thanks,
> > >
> > > Matt
> > >
> > > -pc_type lu
> > > -pc_factor_mat_solver_package pastix
> > > -mat_pastix_verbose 2
> > > -mat_pastix_threadnbr 1
> > >
> > > Giang
> > >
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which
> > > their experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
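
A note on the sbatch-only failure discussed above: on Linux, sched_setaffinity fails with EINVAL when the requested mask contains no CPU that the process is permitted to use (for example, when a cpuset restricts it). Whether that is what happens inside PaStiX 5.2.3 on this cluster is not established in the thread; it is only one plausible reading of "works interactively, fails under the scheduler". The sketch below is an illustration in Python, not PaStiX code: it maps a thread rank onto the CPUs the process is actually allowed to run on instead of assuming CPU numbering starts at 0. The helper names pick_allowed_cpu and bind_to_allowed_cpu are hypothetical; os.sched_getaffinity and os.sched_setaffinity are the standard-library calls and exist on Linux only.

```python
import os


def pick_allowed_cpu(thread_rank, allowed_cpus):
    """Hypothetical helper: map a thread rank onto one of the permitted CPUs.

    Ranks wrap around when there are more threads than permitted CPUs.
    """
    cpus = sorted(allowed_cpus)
    if not cpus:
        raise ValueError("the set of allowed CPUs is empty")
    return cpus[thread_rank % len(cpus)]


def bind_to_allowed_cpu(thread_rank):
    """Hypothetical helper: pin the caller to a CPU taken from its current mask.

    Linux only. Choosing from the current affinity mask avoids requesting
    a CPU outside the set the scheduler has granted to the job.
    """
    allowed = os.sched_getaffinity(0)   # CPUs the caller may currently run on
    cpu = pick_allowed_cpu(thread_rank, allowed)
    os.sched_setaffinity(0, {cpu})      # restrict the caller to that single CPU
    return cpu
```

If a job is restricted to, say, CPUs 4 to 7, a naive "bind thread i to CPU i" would ask for CPU 0 and fail, while the mapping above would pick CPU 4 for rank 0. Comparing the affinity mask printed inside an sbatch job with the one seen in an interactive shell would be a quick way to check whether this applies here.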