Thanks for the fix.  https://gitlab.com/petsc/petsc/pipelines/96957999

> On Nov 14, 2019, at 2:04 PM, hg <hgbk2...@gmail.com> wrote:
> 
> Hello
> 
> It turns out that hwloc is not installed on the cluster system that I'm 
> using. Without hwloc, PaStiX falls into the branch that uses sched_setaffinity 
> and fails with an error (see above, in sopalin_thread.c). I'm not able to 
> understand and fix the sched_setaffinity path, so I think enabling hwloc is an 
> easier solution. Besides, hwloc is recommended for compiling PaStiX according 
> to these threads:
> 
> https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> https://solverstack.gitlabpages.inria.fr/pastix/Bindings.html
> 
> hwloc is supported in PETSc, so I assumed a clean and easy solution would be 
> to compile with --download-hwloc. I made some changes in 
> config/BuildSystem/config/packages/PaStiX.py to tell PaStiX to link to hwloc:
> 
> ...
> self.hwloc          = framework.require('config.packages.hwloc',self)
> ...
> if self.hwloc.found:
>       g.write('CCPASTIX   := $(CCPASTIX) -DWITH_HWLOC '+self.headers.toString(self.hwloc.include)+'\n')
>       g.write('EXTRALIB   := $(EXTRALIB) '+self.libraries.toString(self.hwloc.dlib)+'\n')
> 
> But the build fails at the link step:
> 
> Possible ERROR while running linker: exit code 1
> stderr:
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_init':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:822: undefined reference to `hwloc_topology_init'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:828: undefined reference to `hwloc_topology_load'
> /opt/petsc-dev/lib/libpastix.a(pastix.o): In function `pastix_task_clean':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/pastix.c:4677: undefined reference to `hwloc_topology_destroy'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `hwloc_get_obj_by_type':
> /opt/petsc-dev/include/hwloc/inlines.h:76: undefined reference to `hwloc_get_type_depth'
> /opt/petsc-dev/include/hwloc/inlines.h:81: undefined reference to `hwloc_get_obj_by_depth'
> /opt/petsc-dev/lib/libpastix.a(sopalin_thread.o): In function `sopalin_bindthread':
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:538: undefined reference to `hwloc_bitmap_dup'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:539: undefined reference to `hwloc_bitmap_singlify'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:543: undefined reference to `hwloc_set_cpubind'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:567: undefined reference to `hwloc_bitmap_free'
> /home/hbui/sw2/petsc-dev/arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c:548: undefined reference to `hwloc_bitmap_asprintf'
> 
> Any idea is appreciated. I can attach configure.log as needed.
> 
> Giang
> 
> 
> On Thu, Nov 7, 2019 at 12:18 AM hg <hgbk2...@gmail.com> wrote:
> Hi Barry
> 
> Maybe you're right: sched_setaffinity returns EINVAL in my case. Probably the 
> scheduler does not allow the process to set its own thread affinity.
> 
> Giang
> 
> 
> On Wed, Nov 6, 2019 at 4:52 PM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> 
>   You can also just look at configure.log, where it shows the calling 
> sequence of how PETSc configured and built PaStiX. The recipe is in 
> config/BuildSystem/config/packages/PaStiX.py; we don't monkey with low-level 
> things like the affinity settings of external packages. My guess is that your 
> cluster system has inconsistent parts related to this: that one tool works and 
> another does not indicates they disagree with each other about what they 
> expect.
> 
>    Barry
> 
> 
> 
> 
> > On Nov 6, 2019, at 4:02 AM, Matthew Knepley <knep...@gmail.com> wrote:
> > 
> > On Wed, Nov 6, 2019 at 4:40 AM hg <hgbk2...@gmail.com> wrote:
> > Looking into 
> > arch-linux2-cxx-opt/externalpackages/pastix_5.2.3/src/sopalin/src/sopalin_thread.c,
> > I saw something like:
> > 
> > #ifdef HAVE_OLD_SCHED_SETAFFINITY
> >     if(sched_setaffinity(0,&mask) < 0)
> > #else /* HAVE_OLD_SCHED_SETAFFINITY */
> >     if(sched_setaffinity(0,sizeof(mask),&mask) < 0)
> > #endif /* HAVE_OLD_SCHED_SETAFFINITY */
> >       {
> >         perror("sched_setaffinity");
> >         EXIT(MOD_SOPALIN, INTERNAL_ERR);
> >       }
> > 
> > Is there a possibility that PETSc turns on HAVE_OLD_SCHED_SETAFFINITY 
> > during compilation?
> > 
> > May I know how to trigger recompilation of external packages with PETSc? I 
> > may go in there and check what's going on.
> > 
> > If we built it during configure, then you can just go to
> > 
> >   $PETSC_DIR/$PETSC_ARCH/externalpackages/*pastix*/
> > 
> > and rebuild/install it to test. If you want configure to do it, you have to 
> > delete
> > 
> >   $PETSC_DIR/$PETSC_ARCH/lib/petsc/conf/pkg.conf.pastix
> > 
> > and reconfigure.
> > 
> >   Thanks,
> > 
> >      Matt
> >  
> > Giang
> > 
> > 
> > On Wed, Nov 6, 2019 at 10:12 AM hg <hgbk2...@gmail.com> wrote:
> > The sched_setaffinity: Invalid argument error only happens when I launch 
> > the job with sbatch; running without the scheduler is fine. I think this 
> > has something to do with PaStiX.
> > 
> > Giang
> > 
> > 
> > On Wed, Nov 6, 2019 at 4:37 AM Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> > 
> >   Google finds this 
> > https://gforge.inria.fr/forum/forum.php?thread_id=32824&forum_id=599&group_id=186
> > 
> > 
> > 
> > > On Nov 5, 2019, at 7:01 PM, Matthew Knepley via petsc-users 
> > > <petsc-users@mcs.anl.gov> wrote:
> > > 
> > > I have no idea. That is a good question for the PaStiX list.
> > > 
> > >   Thanks,
> > > 
> > >     Matt
> > > 
> > > On Tue, Nov 5, 2019 at 5:32 PM hg <hgbk2...@gmail.com> wrote:
> > > Should thread affinity be invoked? I set -mat_pastix_threadnbr to 1 and 
> > > also OMP_NUM_THREADS to 1.
> > > 
> > > Giang
> > > 
> > > 
> > > On Tue, Nov 5, 2019 at 10:50 PM Matthew Knepley <knep...@gmail.com> wrote:
> > > On Tue, Nov 5, 2019 at 4:11 PM hg via petsc-users 
> > > <petsc-users@mcs.anl.gov> wrote:
> > > Hello
> > > 
> > > I got a crash when using PaStiX as the solver for KSP. The error message 
> > > looks like:
> > > 
> > > ....
> > > NUMBER of BUBBLE 1
> > > COEFMAX 1735566 CPFTMAX 0 BPFTMAX 0 NBFTMAX 0 ARFTMAX 0
> > > ** End of Partition & Distribution phase **
> > >    Time to analyze                              0.225 s
> > >    Number of nonzeros in factorized matrix      708784076
> > >    Fill-in                                      12.2337
> > >    Number of operations (LU)                    2.80185e+12
> > >    Prediction Time to factorize (AMD 6180  MKL) 394 s
> > > 0 : SolverMatrix size (without coefficients)    32.4 MB
> > > 0 : Number of nonzeros (local block structure)  365309391
> > >  Numerical Factorization (LU) :
> > > 0 : Internal CSC size                           1.08 GB
> > >    Time to fill internal csc                    6.66 s
> > >    --- Sopalin : Allocation de la structure globale ---
> > >    --- Fin Sopalin Init                             ---
> > >    --- Initialisation des tableaux globaux          ---
> > > sched_setaffinity: Invalid argument
> > > [node083:165071] *** Process received signal ***
> > > [node083:165071] Signal: Aborted (6)
> > > [node083:165071] Signal code:  (-6)
> > > [node083:165071] [ 0] /lib64/libpthread.so.0(+0xf680)[0x2b8081845680]
> > > [node083:165071] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x2b8082191207]
> > > [node083:165071] [ 2] /lib64/libc.so.6(abort+0x148)[0x2b80821928f8]
> > > [node083:165071] [ 3] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_comm+0x0)[0x2b80a4124c9d]
> > > [node083:165071] [ 4] Launching 1 threads (1 commputation, 0 
> > > communication, 0 out-of-core)
> > >    --- Sopalin : Local structure allocation         ---
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_sopalin_init_smp+0x29b)[0x2b80a40c39d2]
> > > [node083:165071] [ 5] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_smp+0x68)[0x2b80a40cf4c2]
> > > [node083:165071] [ 6] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(sopalin_launch_thread+0x4ba)[0x2b80a4124a31]
> > > [node083:165071] [ 7] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_ge_sopalin_thread+0x94)[0x2b80a40d6170]
> > > [node083:165071] [ 8] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(D_pastix_task_sopalin+0x5ad)[0x2b80a40b09a2]
> > > [node083:165071] [ 9] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(d_pastix+0xa8a)[0x2b80a40b2325]
> > > [node083:165071] [10] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0x63927b)[0x2b80a35bf27b]
> > > [node083:165071] [11] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(MatLUFactorNumeric+0x19a)[0x2b80a32c7552]
> > > [node083:165071] [12] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(+0xa46c09)[0x2b80a39ccc09]
> > > [node083:165071] [13] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(PCSetUp+0x311)[0x2b80a3a8f1a9]
> > > [node083:165071] [14] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSetUp+0xbf7)[0x2b80a3b46e81]
> > > [node083:165071] [15] 
> > > /sdhome/bui/opt/petsc-3.11.0_ompi-3.0.0/lib/libpetsc.so.3.11(KSPSolve+0x210)[0x2b80a3b4746e]
> > > 
> > > Does anyone have an idea what the problem is and how to fix it? The 
> > > PETSc parameters I used are below:
> > > 
> > > It looks like PaStiX is having trouble setting the thread affinity:
> > > 
> > > sched_setaffinity: Invalid argument
> > > 
> > > so it may be your build of PaStiX.
> > > 
> > >   Thanks,
> > > 
> > >      Matt
> > >  
> > > -pc_type lu
> > > -pc_factor_mat_solver_package pastix
> > > -mat_pastix_verbose 2
> > > -mat_pastix_threadnbr 1
> > > 
> > > Giang
> > > 
> > > 
> > > 
> > > -- 
> > > What most experimenters take for granted before they begin their 
> > > experiments is infinitely more interesting than any results to which 
> > > their experiments lead.
> > > -- Norbert Wiener
> > > 
> > > https://www.cse.buffalo.edu/~knepley/
> > > 
> > 
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their 
> > experiments is infinitely more interesting than any results to which their 
> > experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/
> 
