PETSc stacks the Fortran modules in the same way it stacks the C include files, so the TAO module includes all the Fortran modules below it, and so on. It would be nearly impossible to disentangle the bits and pieces without introducing a more painful user experience. For example, users would have to write use PCTypes, use PCFunctions, use KSPTypes, ...: impossible to use and impossible to maintain.
This is a completely artificial bug of IBM's own making in their compiler that we should not have to work around.

Barry

> On Mar 3, 2021, at 12:10 PM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
>
> Hello All,
>
> I discovered a compiler bug in the IBM XL Fortran compiler a few weeks ago
> that would crash the compiler when compiling the PETSc Fortran interfaces. The
> TL;DR of it is that the XL compiler creates a function dictionary for every
> function imported in Fortran modules, and since the PETSc Fortran interfaces
> seem to import entire packages writ large, this exceeds the maximum number of
> dictionary entries (2**21):
>
>> The reason for the Internal Compiler Error is because we can't grow an
>> internal dictionary anymore (i.e. we hit a 2**21 limit).
>> The file contains many module procedures and interfaces that use the same
>> helper module. As a result, we are importing the dictionary entries for that
>> module repeatedly, reaching the limit.
>>
>> Can you please give the following source code workaround a try?
>> Since there is already "use petscvecdefdummy" at the module scope, one
>> workaround might be to remove the unnecessary "use petscvecdefdummy" in
>> vecnotequal and vecequals and all similar procedures.
>>
>> For example, the test case has:
>> module petscvecdef
>>   use petscvecdefdummy
>>   ...
>>   function vecnotequal(A,B)
>>     use petscvecdefdummy
>>     logical vecnotequal
>>     type(tVec), intent(in) :: A,B
>>     vecnotequal = (A%v .ne. B%v)
>>   end function
>>   function vecequals(A,B)
>>     use petscvecdefdummy
>>     logical vecequals
>>     type(tVec), intent(in) :: A,B
>>     vecequals = (A%v .eq. B%v)
>>   end function
>>   ...
>> end module
>> Another workaround would be to put the procedure definitions from this large
>> module into several submodules. Each submodule would be able to accommodate
>> a dictionary with 2**21 entries.
>>
>> Please let us know if one of the above workarounds resolves the issue.
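As a rough illustration of the first workaround quoted above, the shell sketch below flags `use` statements inside a function or subroutine that name a given module, i.e. the candidates for removal when the same `use` already sits at module scope. The function name `list_redundant_use` and the line-based matching are assumptions made for this example; it is not part of PETSc or the XL toolchain, and it does not really parse Fortran (procedures declared with a type or attribute prefix, such as `logical function`, are not recognised).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_redundant_use FILE MODULE
# Prints "LINE: text" for each "use MODULE" found inside a function or
# subroutine body. Matching is case-insensitive and purely line-based.
list_redundant_use() {
  local file=$1 module
  module=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  awk -v mod="$module" '
    {
      line = tolower($0)
      sub(/^[ \t]+/, "", line)          # ignore leading indentation
    }
    # Leaving a procedure body
    line ~ /^end[ \t]*(function|subroutine)/ { in_proc = 0; next }
    # Entering a procedure body
    line ~ /^(function|subroutine)[ \t]/     { in_proc = 1; next }
    # A bare "use <module>" inside a procedure body
    in_proc && line ~ ("^use[ \t]+" mod "[ \t]*$") { print NR ": " $0 }
  ' "$file"
}
```

Calling `list_redundant_use petscvecmod.F90 petscvecdefdummy` on a preprocessed source file would list the lines to inspect; whether each one can really be deleted still has to be checked by compiling.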
>
> The proposed fix from IBM would be to pull “use moduleXXX” out of subroutines
> or to have our auto-fortran interfaces detect which symbols to include from
> the respective modules and only include those in the subroutines. I’m not
> familiar at all with how the interfaces are generated so I don’t even know if
> this is possible.
>
>> IBM provided the following additional explanation and example. Can the
>> process used to generate these routines and functions determine the specific
>> symbols required and then use the only keyword or import statement to
>> include them?
>>
>> When factoring use statements out of module procedures, you can just
>> delete them. But you can't completely remove them from interface blocks.
>> Instead, you can limit them either by using use <module>, only: <symbol> or
>> import <symbol>. If the hundreds of use statements in the program are
>> factored out / limited in this way, that should reduce the dictionary size
>> sufficiently for the program to compile.
>>
>> For example
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     use petscvecdef
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>>
>> imports all symbols from petscvecdef into the dictionary even though we only
>> need tVec. So we can either:
>>
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     use petscvecdef, only: tVec
>>     implicit none
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>>
>> or if use petscvecdef is used in the outer scope, we can:
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     import tVec
>>     implicit none
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>> (The two methods (use, only vs import) are equivalent in terms of impact to
>> the dictionary.)
>
> Is this compiler ~feature~ something that we intend to work around? Thoughts?
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
>> Begin forwarded message:
>>
>> From: "Roy Musselman" <roym...@us.ibm.com>
>> Subject: Re: Case TS005062693 - XLF: ICE in xlfentry compiling a module with 358 subroutines
>> Date: March 3, 2021 at 08:23:17 CST
>> To: Jacob Faibussowitsch <faibu...@illinois.edu>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>>
>> Hi Jacob,
>> I tried the first suggestion and commented out the use statements called
>> within the functions. However, I hit the following error complaining about
>> specific symbol dependencies provided by the library.
>>
>> .../src/vec/f90-mod/petscvecmod.F90", line 107.37: 1514-084 (S) Identifier a
>> is being declared with type name tvec which has not been defined in a
>> derived type definition.
>>
>> IBM provided the following additional explanation and example. Can the
>> process used to generate these routines and functions determine the specific
>> symbols required and then use the only keyword or import statement to
>> include them?
>>
>> When factoring use statements out of module procedures, you can just
>> delete them. But you can't completely remove them from interface blocks.
>> Instead, you can limit them either by using use <module>, only: <symbol> or
>> import <symbol>. If the hundreds of use statements in the program are
>> factored out / limited in this way, that should reduce the dictionary size
>> sufficiently for the program to compile.
>>
>> For example
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     use petscvecdef
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>>
>> imports all symbols from petscvecdef into the dictionary even though we only
>> need tVec. So we can either:
>>
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     use petscvecdef, only: tVec
>>     implicit none
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>>
>> or if use petscvecdef is used in the outer scope, we can:
>> Interface
>>   Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>     import tVec
>>     implicit none
>>     real(kind=selected_real_kind(10)), pointer :: array(:)
>>     integer(kind=selected_int_kind(5)) ierr
>>     type(tVec) v
>>   End Subroutine
>> End Interface
>> (The two methods (use, only vs import) are equivalent in terms of impact to
>> the dictionary.)
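To get a feel for how many interface bodies would need the `only:` or `import` treatment IBM describes, one could scan for unrestricted `use` lines inside interface blocks. The sketch below does that with a line-based awk filter; the function name `list_unrestricted_use` is made up for this example, and the matching is an assumption that fits the snippets quoted in this thread rather than a real Fortran parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_unrestricted_use FILE
# Prints "LINE: text" for each "use <module>" without an "only:" clause
# that appears between "interface" and "end interface".
list_unrestricted_use() {
  awk '
    {
      line = tolower($0)
      sub(/^[ \t]+/, "", line)          # ignore leading indentation
    }
    line ~ /^end[ \t]*interface/   { in_if = 0; next }
    line ~ /^interface([ \t]|$)/   { in_if = 1; next }
    # "use name" and nothing else on the line, so no ", only: ..." part
    in_if && line ~ /^use[ \t]+[a-z0-9_]+[ \t]*$/ { print NR ": " $0 }
  ' "$1"
}
```

Each reported line would then be narrowed by hand or by the interface generator to `use <module>, only: <symbol>`; which symbols are actually needed is not something this sketch can determine.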
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>>
>> From: Roy Musselman/Rochester/Contr/IBM
>> To: Jacob Faibussowitsch <faibu...@illinois.edu>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>> Date: 02/24/2021 07:08 PM
>> Subject: Re: [EXTERNAL] Case TS005062693 - XLF: ICE in xlfentry compiling a module with 358 subroutines
>>
>> Hi Jacob,
>> I opened the ticket with IBM: case TS005062693 and the local LLNL Sierra
>> Jira Ticket at
>> https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues
>>
>> Today IBM provided the response below. I don't know when I'll have time to
>> try it on the reproducer I gave IBM. Perhaps early next week. Can you review
>> this and see if it helps?
>>
>> The reason for the Internal Compiler Error is because we can't grow an
>> internal dictionary anymore (i.e. we hit a 2**21 limit).
>> The file contains many module procedures and interfaces that use the same
>> helper module. As a result, we are importing the dictionary entries for that
>> module repeatedly, reaching the limit.
>>
>> Can you please give the following source code workaround a try?
>> Since there is already "use petscvecdefdummy" at the module scope, one
>> workaround might be to remove the unnecessary "use petscvecdefdummy" in
>> vecnotequal and vecequals and all similar procedures.
>>
>> For example, the test case has:
>> module petscvecdef
>>   use petscvecdefdummy
>>   ...
>>   function vecnotequal(A,B)
>>     use petscvecdefdummy
>>     logical vecnotequal
>>     type(tVec), intent(in) :: A,B
>>     vecnotequal = (A%v .ne. B%v)
>>   end function
>>   function vecequals(A,B)
>>     use petscvecdefdummy
>>     logical vecequals
>>     type(tVec), intent(in) :: A,B
>>     vecequals = (A%v .eq. B%v)
>>   end function
>>   ...
>> end module
>> Another workaround would be to put the procedure definitions from this large
>> module into several submodules. Each submodule would be able to accommodate
>> a dictionary with 2**21 entries.
>>
>> Please let us know if one of the above workarounds resolves the issue.
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>>
>> From: Roy Musselman/Rochester/Contr/IBM
>> To: Jacob Faibussowitsch <faibu...@illinois.edu>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>> Date: 02/21/2021 09:42 PM
>> Subject: Re: [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>
>> Hi Jacob,
>>
>> After some more experimentation, I think I may have found what is triggering
>> the ICE. It doesn't appear to be related to the subroutine name length. I
>> think the compiler may be hitting an internal limit on the number of
>> subroutines within a module. There are 358 subroutines contained in the
>> expanded petscmatmod.F90. Removing 4 subroutines will allow the compile to
>> complete successfully, so the limit must be 354 subroutines. Is it possible
>> for you to bust up petscmatmod into multiple modules?
>> I'll package up the reproducer and pass it on to the compiler development
>> team.
>>
>> I asked for user feedback a couple of years ago, when the IBM Power9 CORAL-1
>> Sierra systems were deployed, but received minimal responses. DOE is now
>> working with Cray (aka HPE) developing the environment for the CORAL-2
>> system (El Capitan). I'll pass your request to the LLNL person I know that
>> is dealing with math libraries for CORAL-2.
>>
>> We use the spack tool to download and build petsc and its specified
>> dependencies. I switched between the PETSc versions by changing the PETSCDIR
>> variable in the script I shared with you. I've attached a tar ball
>> containing the scripts used to build PETSc via spack.
>>
>> [attachment "bld-petsc-spack.tgz" deleted by Roy Musselman/Rochester/Contr/IBM]
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>>
>> From: Jacob Faibussowitsch <faibu...@illinois.edu>
>> To: Roy Musselman <roym...@us.ibm.com>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>> Date: 02/21/2021 12:24 PM
>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>
>> Hi Roy,
>>
>>> I'm not sure which projects at LLNL are using PETSc or if they chose to
>>> build their own version.
>>
>> Entirely unrelated to our problem, but is it possible to find this out? It
>> would be great if yes, but also completely fine if not. PETSc is potentially
>> undergoing a rather transformative rewrite over the next few years and we’d
>> like to gather current usage data to get a better idea of where PETSc fits
>> into our users' workflows. But we aren’t sure how to gather this data (we
>> don’t particularly want to scrape and silently send it off without users'
>> consent/knowledge) absent user questionnaires and HPC usage statistics.
>>
>>> If you are interested, I can share with you the spack recipes I use to build
>>> petsc with hdf5, hypre, and superlu-dist.
>>
>> Yes that would be quite useful. I can let it percolate through our dev
>> channels for any other recommendations etc.
>>
>>> 3.14.0 and 3.14.1
>>>
>>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>>> line 9.13: 1514-219 (S) Unable to access module symbol file for module
>>> petscisdefdummy. Check path and file permissions of file. Use association
>>> not done for this module.
>>> 1501-511 Compilation failed for file petscvecmod.F90.
>>
>> How exactly did you switch between versions? PETSc has 2 types of fortran
>> bindings, “ftn-custom” and “ftn-auto” (technically 3 including the F90
>> files, but those simply call either of the two preceding ones), a copy of
>> which you will find in every src directory. As the names imply, ftn-auto is
>> auto-generated while ftn-custom is hand-written.
>>
>> This also means that the ftn-auto files are __not__ tracked by git, so a
>> simple git checkout [new-tag] may not properly dispose of the old
>> auto-generated files (very rare, but IIRC we made a major enough change to
>> the fortran bindings within the last year to warrant having to "make
>> deletefortranstubs" before rebuilding).
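Since the auto-generated bindings are untracked, it can help to see where they live before switching tags. The sketch below simply lists every directory named ftn-auto under a source tree; the function name `list_auto_stub_dirs` is an assumption for this example, and the only PETSc-specific facts it relies on are the directory name and the "make deletefortranstubs" target mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_auto_stub_dirs PETSC_SRC_DIR
# Lists the ftn-auto directories (auto-generated Fortran stubs) under the
# given tree, so leftovers from a previous checkout are easy to spot.
list_auto_stub_dirs() {
  find "${1:?usage: list_auto_stub_dirs PETSC_SRC_DIR}" -type d -name ftn-auto | sort
}
```

If the listing shows stubs that predate the tag just checked out, running "make deletefortranstubs" before rebuilding, as suggested above, would be the next step.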
>>
>>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch of
>>> other warning messages, but it still encounters the ICE. So, I'm uncertain
>>> if the subroutine name length is the root of the problem.
>>
>> Our current compiler flag selection philosophy is to require a minimum but
>> choose the maximum available reasonable flag for the compiler (i.e. we
>> require C99, but very often you will find that your code is compiled with
>> C11 or C17 if they are available). It is therefore odd that configure did
>> not use the same methodology for fortran compilers. I will relay this on our
>> side.
>>
>>> Is it possible for you to use subroutines that are less than 32 characters
>>> and see if that works for you? Have you used other fortran 90 compilers and
>>> do any of them complain of this?
>>
>> Of all of the small quirks fortran has this is probably the most esoteric
>> one I’ve come across… I’ve attached a list of all the F90 compilers, and
>> their flags which we use in CI/CD (all of which is run multiple times daily
>> and __must__ pass). I got them all via grep, so there may be some duplicates
>> here or there. As for using shorter names, this is also something we can
>> look at, but since none of the other compilers have had issues with this I’m
>> not sure this is the change to make.
>>
>>> Are there any unusual or questionable language constructs used in any of the
>>> functions mentioned above that may possibly challenge the compiler?
>>
>> Not that I am aware of, but again I will ask around our dev channels and see
>> if anything comes to mind.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> [attachment "compilerList" deleted by Roy Musselman/Rochester/Contr/IBM]
>>
>> On Feb 20, 2021, at 22:05, Roy Musselman <roym...@us.ibm.com> wrote:
>> Hi Jacob,
>> Thanks for letting me know that you are a PETSc developer and that you are
>> testing it on the LLNL lassen system.
>> I've used the spack build tool to build and deploy a few versions on the
>> systems. I'm not sure which projects at LLNL are using PETSc or if they
>> chose to build their own version. I did however provide a single precision
>> version upon request that was integrated with MVAPICH2-MPI instead of the
>> IBM-provided Spectrum-MPI. Here's what's available on the systems today.
>>
>> > ml avail petsc
>> -------------------- /usr/tcetmp/modulefiles/Core --------------------
>> petsc/default petsc/3.10.2 petsc/3.11.3 petsc/3.13.0 (D)
>> petsc/3.13.1-mvapich2-2020.01.09-xl-2020.03.18.single
>>
>> If you are interested, I can share with you the spack recipes I use to build
>> petsc with hdf5, hypre, and superlu-dist.
>>
>> After several attempts I was able to reproduce the Internal Compiler Error
>> (ICE) that you are seeing using version 3.14.4. I've whittled it down to the
>> petscmatmod.F90 file and its specific dependencies.
>> The following script is what I'm using. Note that in the 2nd set of
>> compiles, the -E option is used to expand all included source files and
>> headers, encapsulating them into a single large source file. This can be
>> used to help isolate the source of the problem.
>>
>> #!/bin/bash
>>
>> PETSCDIR="../roymuss/spack-stage-petsc-3.14.4-eh5arny7l3cqjlltlfpjp6f4jofbnmz6/spack-src"
>>
>> OPTIONS=" -qmoddir=moddir -I$PETSCDIR/arch-linux-c-opt/include -I$PETSCDIR/include"
>> mkdir -p moddir
>>
>> set -x
>>
>> # Compile original source files including dependencies
>> if [ 0 = 1 ]; then
>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o petscsysmod.o
>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o petscvecmod.o
>>   mpif90 -c -g $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o petscmatmod.o
>> fi
>>
>> # Use -E option to expand source into full source files
>> if [ 0 = 1 ]; then
>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o full_petscsysmod.F90
>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o full_petscvecmod.F90
>>   mpif90 -c -g -E $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o full_petscmatmod.F90
>> fi
>>
>> # Compile from full source files
>> if [ 1 = 1 ]; then
>>   mpif90 -c -g -Imoddir -qmoddir=moddir full_petscsysmod.F90 -o full_petscsysmod.o
>>   mpif90 -c -g -Imoddir -qmoddir=moddir full_petscvecmod.F90 -o full_petscvecmod.o
>>   mpif90 -V -c -g -Imoddir -qmoddir=moddir full_petscmatmod.F90 -o full_petscmatmod.o
>> fi
>>
>> <eof>
>>
>> PETSc 3.13.6 is the most recent version that did not fail. I tried all
>> subsequent versions and got the following results:
>>
>> 3.14.0 and 3.14.1
>>
>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>> line 9.13: 1514-219 (S) Unable to access module symbol file for module
>> petscisdefdummy. Check path and file permissions of file. Use association
>> not done for this module.
>> 1501-511 Compilation failed for file petscvecmod.F90.
>>
>> 3.14.2, 3.14.3, and 3.14.4
>>
>> . . .
>> ** matnullspaceequals === End of Compilation 8 ===
>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry':
>> free(): invalid pointer: 0x0000200001740018 ***
>>
>> Examining the tail end of petscmatmod.F90
>>
>>  80 function matnullspaceequals(A,B)
>>  81 use petscmatdefdummy
>>  82 logical matnullspaceequals
>>  83 type(tMatNullSpace), intent(in) :: A,B
>>  84 matnullspaceequals = (A%v .eq. B%v)
>>  85 end function
>>  86
>>  87 #if defined(_WIN32) && defined(PETSC_USE_SHARED_LIBRARIES)
>>  88 !DEC$ ATTRIBUTES DLLEXPORT::matnotequal
>>  89 !DEC$ ATTRIBUTES DLLEXPORT::matequals
>>  90 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringnotequal
>>  91 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringequals
>>  92 !DEC$ ATTRIBUTES DLLEXPORT::matnullspacenotequal
>>  93 !DEC$ ATTRIBUTES DLLEXPORT::matnullspaceequals
>>  94 #endif
>>  95 module petscmat
>>  96 use petscmatdef
>>  97 use petscvec
>>  98 #include <../src/mat/f90-mod/petscmat.h90>
>>  99 interface
>> 100 #include <../src/mat/f90-mod/ftn-auto-interfaces/petscmat.h90>
>> 101 end interface
>> 102 end module
>> 103
>>
>> Compiling the matnullspaceequals function was successful just before hitting
>> the error. The error goes away when removing either or both of the #include
>> lines 98 and 100. Both #include statements are required to produce the
>> error. The 3.13.6 and 3.14.4 versions of the file identified in the first
>> #include at line 98 are identical. The file identified in line 100 is
>> different between 3.13.6 and 3.14.4.
>> Just looking at the list of subroutines contained within each version, the
>> following are the differences.
>>
>> Old subroutines available in 3.13.6 but removed from 3.14.4
>> subroutine MatFreeIntermediateDataStructures(a,z)
>>
>> New subroutines available in 3.14.4 but not contained in 3.13.6
>> subroutine MatDenseReplaceArray(a,b,z)
>> subroutine MatIsShell(a,b,z)
>> subroutine MatRARtMultEqual(a,b,c,d,e,z)
>> subroutine MatScaLAPACKGetBlockSizes(a,b,c,z)
>> subroutine MatScaLAPACKSetBlockSizes(a,b,c,z)
>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
>> subroutine MatSeqAIJSetTotalPreallocation(a,b,z)
>> subroutine MatSetLayouts(a,b,c,z)
>>
>> Methodically removing the new subroutines did not provide a consistent
>> result. But I did notice the extra long subroutine name
>> MatSeqAIJCUSPARSESetGenerateTranspose had 37 characters.
>> A little research found: In Fortran 90/95 the maximum length was 31
>> characters, in Fortran 2003 it is now 63 characters. I found the following
>> subroutines with names longer than 31 characters
>>
>> subroutine MatCreateMPIMatConcatenateSeqMat
>> subroutine MatFactorFactorizeSchurComplement
>> subroutine MatMPIAdjCreateNonemptySubcommMat
>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose
>> subroutine MatMPIAIJSetUseScalableIncreaseOverlap
>> subroutine MatFactorSolveSchurComplementTranspose
>>
>> I individually ifdef'd them out of the source file and was able to compile
>> the files successfully without encountering the ICE.
>>
>> I'm not exactly sure what maximum subroutine name length the XLF
>> compiler allows, but if it is only 31, it would be useful if the compiler
>> detected this and issued a message instead of the ICE.
>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch of
>> other warning messages, but it still encounters the ICE. So, I'm uncertain
>> if the subroutine name length is the root of the problem.
>>
>> Is it possible for you to use subroutines that are less than 32 characters
>> and see if that works for you?
>> Have you used other fortran 90 compilers and do any of them complain of
>> this?
>> Are there any unusual or questionable language constructs used in any of the
>> functions mentioned above that may possibly challenge the compiler?
>>
>> I'll package this up and send it to the IBM XL compiler development team for
>> their examination and comment.
>>
>> Best Regards,
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>>
>> From: Jacob Faibussowitsch <faibu...@illinois.edu>
>> To: Roy Musselman <roym...@us.ibm.com>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>> Date: 02/18/2021 02:17 PM
>> Subject: [EXTERNAL] Re: xlf90_r Internal Compiler Error
>>
>>> The most recently built version available on the CORAL systems is 3.13.0.
>>> (ml load petsc/3.13.0) Will that work for you?
>>
>> I am building petsc from source as part of development work on petsc itself
>> so modules are unfortunately not useful here.
>>
>>> The files you sent me do not contain all the dependencies (other mod files)
>>> required to reproduce the error.
>>> I'll attempt to build version 3.14.4 from scratch and recreate the failing
>>> symptom you are observing.
>>
>> Yes, petsc uses an automated system to generate the fortran files from C
>> which goes about 20 rabbit holes deeper than I was willing to dig.
>> Let me know if you run into trouble configuring and building petsc, I can
>> point you in the right direction. I’ve attached a “reconfigure” script with
>> this email, it contains all of the arguments I used to configure petsc
>> successfully on Lassen. If you place it into your $PETSC_DIR (i.e. the
>> folder titled “petsc” that contains a “configure” file) and run:
>>
>> $ python3 ./reconfigure-arch-linux-c-debug.py
>>
>> It should work. If not, you will have to
>>
>> $ ./configure --all-the-args --in-the-reconfigure --file
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> [attachment "reconfigure-arch-linux-c-debug.py" deleted by Roy Musselman/Rochester/Contr/IBM]
>>
>> On Feb 18, 2021, at 15:07, Roy Musselman <roym...@us.ibm.com> wrote:
>> Hi Jacob,
>>
>> The source file appears to come from the PETSc 3.14.4 library. The most
>> recently built version available on the CORAL systems is 3.13.0. (ml load
>> petsc/3.13.0) Will that work for you?
>> The files you sent me do not contain all the dependencies (other mod files)
>> required to reproduce the error.
>> I'll attempt to build version 3.14.4 from scratch and recreate the failing
>> symptom you are observing.
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>>
>> From: Roy Musselman/Rochester/Contr/IBM
>> To: LC Hotline <lc-hotl...@llnl.gov>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov>
>> Date: 02/18/2021 11:18 AM
>> Subject: Re: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>>
>> I'll take a look.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>>
>> From: LC Hotline <lc-hotl...@llnl.gov>
>> To: "Gyllenhaal, John C." <gyllenha...@llnl.gov>, Roy Musselman <roym...@us.ibm.com>
>> Date: 02/18/2021 11:03 AM
>> Subject: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>>
>> Hi John, Roy,
>>
>> Can you help this user with the problem that he is seeing when he tries to
>> build with xlf90 on Lassen?
>>
>> Thanks,
>> Ryan
>> --
>> LC Hotline
>>
>> From: Jacob Faibussowitsch <faibu...@illinois.edu>
>> Date: Wednesday, February 17, 2021 at 5:27 PM
>> To: LC Hotline <lc-hotl...@llnl.gov>
>> Subject: xlf90_r Internal Compiler Error
>>
>> Hello LC Support,
>>
>> While compiling my application on Lassen I seem to have run afoul of the
>> xlf90 mpi compiler wrapper with the following error:
>>
>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry':
>> free(): invalid pointer: 0x0000200001740018 ***
>>
>> I’m fairly certain this isn’t my fault as this is code that compiles
>> regularly on extensive CI/CD under various other compilers and machines, but
>> you can never rule it out.
>> I have included a verbose full log of my make run (which includes a
>> comprehensive rundown of the environment) as well as a separate file
>> containing the error message and stack trace from the compiler.
>> Additionally I have also included the file which I believe is causing the
>> error. Let me know if there is anything else I should send.
>>
>> P.S. My list of loaded modules:
>>
>> Currently Loaded Modules:
>>   1) StdEnv (S)                    4) cuda/11.1.1    7) valgrind/3.16.1
>>   2) clang/ibm-11.0.0              5) python/3.8.2   8) lapack/3.9.0-xl-2020.11.12
>>   3) spectrum-mpi/rolling-release  6) cmake/3.18.0   9) hip/3.0.0
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> [attachment "errorReport.zip" deleted by Roy Musselman/Rochester/Contr/IBM]