Dear Agustín,

I'm not sure if there's an easy way to determine which library is causing the "Illegal instruction" error; it may well be several libraries rather than a single one...

I suggest you try re-installing all modules on the slave nodes (the nodes where the code actually needs to run), if that's feasible.

When you use "eb --force", only the easyconfig files specified on the eb command line are reinstalled. There's no command line option to re-install everything, since it's pretty rare to actually have to do this.

The easiest way would be to remove the module files, and then reinstall PySCF with "eb --robot".
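
Something like this, as a rough sketch (assuming the default EasyBuild installpath of $HOME/.local/easybuild; adjust EB_PREFIX to wherever your modules actually live):

EB_PREFIX=$HOME/.local/easybuild
# without their module files, the installed modules count as missing,
# so --robot will rebuild PySCF and all of its dependencies
rm -rf "$EB_PREFIX/modules/all"
eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --robot --optarch=GENERIC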


regards,

Kenneth

On 03/06/2021 19:48, Agustín Aucar wrote:
Dear EasyBuild experts,

I tried to recompile some of the dependencies of the PySCF code by using:

eb name-of-file.eb --optarch=GENERIC -r --force

but the results are still the same. I recompiled 5 or 6 of the 36 "dependent" modules... Is there a way to somehow estimate which module is causing this problem to avoid recompiling each of the 36 modules?

The loaded modules (module purge && module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6) are

Currently Loaded Modules:
  1) compiler/GCCcore/10.2.0
  2) lib/zlib/1.2.11-GCCcore-10.2.0
  3) tools/binutils/2.35-GCCcore-10.2.0
  4) compiler/GCC/10.2.0
  5) tools/numactl/2.0.13-GCCcore-10.2.0
  6) tools/XZ/5.2.5-GCCcore-10.2.0
  7) lib/libxml2/2.9.10-GCCcore-10.2.0
  8) system/libpciaccess/0.16-GCCcore-10.2.0
  9) system/hwloc/2.2.0-GCCcore-10.2.0
 10) lib/libevent/2.1.12-GCCcore-10.2.0
 11) lib/UCX/1.9.0-GCCcore-10.2.0
 12) lib/libfabric/1.11.0-GCCcore-10.2.0
 13) lib/PMIx/3.1.5-GCCcore-10.2.0
 14) mpi/OpenMPI/4.0.5-GCC-10.2.0
 15) numlib/OpenBLAS/0.3.12-GCC-10.2.0
 16) toolchain/gompi/2020b
 17) numlib/FFTW/3.3.8-gompi-2020b
 18) numlib/ScaLAPACK/2.1.0-gompi-2020b
 19) toolchain/foss/2020b
 20) tools/bzip2/1.0.8-GCCcore-10.2.0
 21) devel/ncurses/6.2-GCCcore-10.2.0
 22) lib/libreadline/8.0-GCCcore-10.2.0
 23) lang/Tcl/8.6.10-GCCcore-10.2.0
 24) devel/SQLite/3.33.0-GCCcore-10.2.0
 25) math/GMP/6.2.0-GCCcore-10.2.0
 26) lib/libffi/3.3-GCCcore-10.2.0
 27) lang/Python/3.8.6-GCCcore-10.2.0
 28) lib/pybind11/2.6.0-GCCcore-10.2.0
 29) lang/SciPy-bundle/2020.11-foss-2020b
 30) tools/Szip/2.1.1-GCCcore-10.2.0
 31) data/HDF5/1.10.7-gompi-2020b
 32) data/h5py/3.1.0-foss-2020b
 33) chem/qcint/4.0.6-foss-2020b-Python-3.8.6
 34) chem/libxc/5.1.3-GCC-10.2.0
 35) chem/XCFun/2.1.1-GCCcore-10.2.0
 36) chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6


Thank you in advance for any help,
Agustín

On Thu, Jun 3, 2021 at 8:03, Agustín Aucar (<agusau...@gmail.com>) wrote:

    Dear Åke and Kenneth,

    Thank you very much for your replies.

    On Thu, Jun 3, 2021 at 4:00, Kenneth Hoste (<kenneth.ho...@ugent.be>) wrote:

        Dear Agustín,

        The fundamental problem is indeed that you're building software
        on one type of CPU, and then trying to run it on another.

        Can you share some more details on what type of CPU is in the
        master node and slave nodes?

        If you can, try using the archspec tool (see
        https://github.com/archspec/archspec):

        pip3 install archspec
        archspec cpu

        Or share the output of the following commands:

        grep 'model name' /proc/cpuinfo | head -1


        grep flags /proc/cpuinfo | head -1


    Master node:

    model name : Dual-Core AMD Opteron(tm) Processor 2214

    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
    cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
    fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid
    pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall


    Slaves:

    model name : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz

    flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
    cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
    syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
    rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
    dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
    pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes
    xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb
    cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp
    tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust
    bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap
    intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local
    dtherm ida arat pln pts md_clear flush_l1d
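
    A quick way to compare the two flag sets (a sketch, assuming the two
    "flags" lines above are saved to master_flags.txt and slaves_flags.txt;
    those filenames are just placeholders):

    # column 1 = flags only on the master, column 2 = flags only on the slaves;
    # flags present on only one CPU are the candidates for "Illegal instruction" crashes
    comm -3 <(tr ' ' '\n' < master_flags.txt | sort) <(tr ' ' '\n' < slaves_flags.txt | sort)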

        You can also try controlling the optimizations that EasyBuild
        applies by default, to prevent it from building for the specific
        CPU in the build node, using "eb --optarch=GENERIC", see
        https://docs.easybuild.io/en/latest/Controlling_compiler_optimization_flags.html
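
        Every eb option can also be set through a matching EASYBUILD_*
        environment variable, which is a convenient way to make the
        setting stick across eb runs (a sketch):

        export EASYBUILD_OPTARCH=GENERIC
        eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --robot --force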


    I tried doing

    eb PySCF-2.0.0a-foss-2020b-Python-3.8.6.eb --optarch=GENERIC -r --force

    but the problem is still the same. Maybe the problem is not in this
    particular code (PySCF) but in some of its dependencies. Is there
    something like a "--force" flag to force dependencies to recompile?

        George's suggestion is better/easier though: building on the
        oldest node should help you too...


    I tried this a couple of days ago, but it didn't resolve the
    problem. In fact, when doing so, I cannot run the code on the master
    (as expected), but I cannot run it on the slaves either...

        regards,

        Kenneth



    Thank you for your help!

    Agustín

        On 02/06/2021 22:20, Agustín Aucar wrote:
         > Dear George,
         >
         > Thanks for your response. A few days ago, I tried to compile
         > the code on a slave node, but it didn't solve the problem...
         >
         > Best,
         > Agustín
         >
         > On Wed, Jun 2, 2021 at 11:41, George Tsouloupas
         > (<g.tsoulou...@cyi.ac.cy>) wrote:
         >
         >     Hi,
         >
         >     In a similar situation, we ended up just building the
         >     software on the "older" CPU (i.e. the "slave" in your case).
         >
         >     G.
         >
         >
         >     George Tsouloupas, PhD
         >     HPC Facility Technical Director
         >     The Cyprus Institute
         >     tel: +357 22208688
         >
         >     On 6/2/21 4:22 PM, Agustín Aucar wrote:
         >>     Dear EasyBuild experts,
         >>
         >>     Firstly, thank you for your very nice work!
         >>
         >>     I'm trying to compile PySCF with the following *.eb file:
         >>
         >>     easyblock = 'CMakeMakeCp'
         >>
         >>     name = 'PySCF'
         >>     version = '2.0.0a'
         >>     versionsuffix = '-Python-%(pyver)s'
         >>
         >>     homepage = 'http://www.pyscf.org'
         >>     description = "PySCF is an open-source collection of electronic
         >>     structure modules powered by Python."
         >>
         >>     toolchain = {'name': 'foss', 'version': '2020b'}
         >>
         >>     source_urls = ['https://github.com/pyscf/pyscf/archive/']
         >>     sources = ['v%(version)s.tar.gz']
         >>     checksums = ['20f4c9faf65436a97f9dfc8099d3c79b988b0a2c5374c701fbe35abc6fad4922']
         >>
         >>     builddependencies = [('CMake', '3.18.4')]
         >>
         >>     dependencies = [
         >>         ('Python', '3.8.6'),
         >>         ('SciPy-bundle', '2020.11'),  # for numpy, scipy
         >>         ('h5py', '3.1.0'),
         >>         ('qcint', '4.0.6', versionsuffix),
         >>         ('libxc', '5.1.3'),
         >>         ('XCFun', '2.1.1'),
         >>     ]
         >>
         >>     start_dir = 'pyscf/lib'
         >>
         >>     separate_build_dir = True
         >>
         >>     configopts = "-DBUILD_LIBCINT=OFF -DBUILD_LIBXC=OFF -DBUILD_XCFUN=OFF "
         >>
         >>     prebuildopts = "export PYSCF_INC_DIR=$EBROOTQCINT/include:$EBROOTLIBXC/lib && "
         >>
         >>     files_to_copy = ['pyscf']
         >>
         >>     sanity_check_paths = {
         >>         'files': ['pyscf/__init__.py'],
         >>         'dirs': ['pyscf/data', 'pyscf/lib'],
         >>     }
         >>
         >>     sanity_check_commands = ["python -c 'import pyscf'"]
         >>
         >>     modextrapaths = {'PYTHONPATH': '', 'PYSCF_EXT_PATH': ''}
         >>
         >>     moduleclass = 'chem'
         >>
         >>
         >>     Even though the module is created, I am having trouble
         >>     running it on a node other than the master. In particular,
         >>     when I load the module and run the code on the master, it
         >>     all goes OK:
         >>
         >>     module load chem/PySCF/2.0.0a-foss-2020b-Python-3.8.6
         >>     python
         >>     from pyscf import gto, scf
         >>     mol = gto.M(atom='H 0 0 0; H 0 0 1')
         >>     mf = scf.RHF(mol).run()
         >>
         >>     but when I try to run it on a node other than the master,
         >>     I get:
         >>
         >>     Python 3.8.6 (default, Jun  1 2021, 16:43:49)
         >>     [GCC 10.2.0] on linux
         >>     Type "help", "copyright", "credits" or "license" for more information.
         >>     >>> from pyscf import gto, scf
         >>     >>> mol = gto.M(atom='H 0 0 0; H 0 0 1')
         >>     >>> mf = scf.RHF(mol).run()
         >>     Illegal instruction (core dumped)
         >>
         >>     As far as I have read in different places, it seems to be
         >>     related to the different architectures of our master and
         >>     slave nodes.
         >>
         >>     If I execute
         >>
         >>     grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | {
         >>         read FLAGS
         >>         OPT="-march=native"
         >>         for flag in $FLAGS; do
         >>             case "$flag" in
         >>                 "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2")
         >>                     OPT+=" -m$flag";;
         >>             esac
         >>         done
         >>         MODOPT=${OPT//_/.}
         >>         echo "$MODOPT"
         >>     }
         >>
         >>     on the slaves I get: -march=native -mssse3 -mfma -mcx16
         >>     -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2
         >>
         >>     whereas on the master node we have: -march=native -mcx16
         >>
         >>     I tried to compile PySCF by adding these lines to my *.eb file:
         >>
         >>     configopts += "-DBUILD_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
         >>     configopts += "-DCMAKE_C_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
         >>     configopts += "-DCMAKE_CXX_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2' "
         >>     configopts += "-DCMAKE_Fortran_FLAGS='-march=native -mssse3 -mfma -mcx16 -msse4.1 -msse4.2 -mpopcnt -mavx -mavx2'"
         >>
         >>     but in that case the code does not run on the master, nor
         >>     on the slaves.
         >>
         >>
         >>     I'm sorry if this is a stupid question. I am far from being
         >>     a system admin...
         >>
         >>     Thanks a lot for your help.
         >>
         >>     Dr. Agustín Aucar
         >>     Institute for Modeling and Innovative Technologies - Argentina
         >
