PETSc stacks the Fortran modules in the same way it stacks the C include 
files, so the TAO module includes all the Fortran modules below it, and so on. 
It would be nearly impossible to disentangle the bits and pieces without 
making the user experience more painful. Users would have to write, for 
example, use PCTypes, use PCFunctions, use KSPTypes, ..., which would be 
impossible to use and impossible to maintain.

   This is a completely artificial bug of IBM's own making in their compiler 
that we should not have to work around.

   Barry


> On Mar 3, 2021, at 12:10 PM, Jacob Faibussowitsch <jacob....@gmail.com> wrote:
> 
> Hello All,
> 
> I discovered a compiler bug in the IBM xl fortran compiler a few weeks ago 
> that would crash the compiler when compiling petsc fortran interfaces. The 
> TL;DR of it is that the xl compiler creates a function dictionary for every 
> function imported in fortran modules, and since petsc fortran interfaces seem 
> to import entire packages writ-large this exceeds the number of dictionary 
> entries (2**21):
> 
>> The reason for the Internal Compiler Error is because we can't grow an 
>> internal dictionary anymore (ie we hit a 2**21 limit).
>> The file contains many module procedures and interfaces that use the same 
>> helper module. As a result, we are importing the dictionary entries for that 
>> module repeatedly reaching 
>> the limit.
>>  
>> Can you please give the following source code workaround a try?
>> Since there is already "use petscvecdefdummy" at the module scope, one 
>> workaround might be to remove the unnecessary "use petscvecdefdummy" in 
>> vecnotequal and vecequals 
>> and all similar procedures.
>>  
>> For example, the test case has:
>>         module petscvecdef
>>         use petscvecdefdummy
>> ...
>>         function vecnotequal(A,B)
>>           use petscvecdefdummy
>>           logical vecnotequal
>>           type(tVec), intent(in) :: A,B
>>           vecnotequal = (A%v .ne. B%v)
>>         end function
>>         function vecequals(A,B)
>>           use petscvecdefdummy
>>           logical vecequals
>>           type(tVec), intent(in) :: A,B
>>           vecequals = (A%v .eq. B%v)
>>         end function
>> ...
>> end module
>> Another workaround would be to put the procedure definitions from this large 
>> module into several submodules.  Each submodule would be able to accommodate 
>> a dictionary with 2**21 entries.
>>  
>>  
>> Please let us know if one of the above workarounds resolve the issue.
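
The first workaround (dropping a "use" inside a module procedure when the same module is already used at module scope) can be checked mechanically. The following is only a rough sketch, not part of PETSc or of IBM's reply: the function name find_redundant_use is hypothetical, and it assumes one statement per line, procedures introduced by the bare keywords "function" or "subroutine", and procedures closed with "end function" or "end subroutine". Interface bodies are skipped, since their use statements cannot simply be deleted.

```shell
#!/bin/bash
# Sketch: report module procedures that repeat a "use" already present at
# module scope. find_redundant_use is a hypothetical helper name.
find_redundant_use() {
  awk '
    {
      kw = tolower($1); arg = tolower($2)
      sub(/[(,].*/, "", arg)            # strip argument lists and "only:" clauses
    }
    kw == "interface" { iface = 1; next }
    kw == "end" {
      if (arg == "interface") iface = 0
      if (arg == "function" || arg == "subroutine") proc = ""
      next
    }
    kw == "module" && arg != "procedure" { split("", used); proc = ""; next }
    kw == "function" || kw == "subroutine" { proc = arg; next }
    kw == "use" {
      if (proc == "") used[arg] = 1     # module-scope use: remember it
      else if (!iface && (arg in used)) print proc ": " arg
    }
  ' "$1"
}

# Reduced sample based on the test case quoted above ("contains" added so the
# sample is a complete module).
sample=$(mktemp)
cat > "$sample" <<'EOF'
        module petscvecdef
        use petscvecdefdummy
        contains
        function vecnotequal(A,B)
          use petscvecdefdummy
          logical vecnotequal
          type(tVec), intent(in) :: A,B
          vecnotequal = (A%v .ne. B%v)
        end function
        function vecequals(A,B)
          use petscvecdefdummy
          logical vecequals
          type(tVec), intent(in) :: A,B
          vecequals = (A%v .eq. B%v)
        end function
        end module
EOF

redundant=$(find_redundant_use "$sample")
echo "$redundant"
rm -f "$sample"
```

Each output line names a procedure (lowercased) and the module it re-imports, which gives a list of candidate "use" lines to delete by hand or in the generator.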
> 
> 
> The proposed fix from IBM would be to pull “use moduleXXX” out of subroutines 
> or to have our auto-fortran interfaces detect which symbols to include from 
> the respective modules and only include those in the subroutines. I’m not 
> familiar at all with how the interfaces are generated so I don’t even know if 
> this is possible.
>> IBM provided the following additional explanation and example. Can the 
>> process used to generate these routines and functions determine the specific 
>> symbols required and then use the only keyword or import statement to 
>> include them?
>> 
>>  When factoring out use statements out of module procedures, you can just 
>> delete them.  But you can't completely remove them from interface blocks.  
>> Instead, you can limit them either by using use <module>, only: <symbol> or 
>> import <symbol> .  if the hundreds of use statements in the program are 
>> factored out / limited in this way, that should reduce the dictionary size 
>> sufficiently for the program to compile.
>>  
>> For example
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           use petscvecdef
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>>  
>> imports all symbols from petscvecdef into the dictionary even though we only 
>> need tVec .  So we can either:
>>  
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           use petscvecdef, only: tVec
>>           implicit none
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>>  
>> or if use petscvecdef is used in the outer scope, we can:
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           import tVec
>>           implicit none
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>> (The two methods (use, only vs import) are equivalent in terms of impact to 
>> the dictionary.)
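
Whether the generator can determine the required symbols is the open question here. As a very rough illustration of the idea, and not a description of how the PETSc interface generator works, the sketch below post-processes an interface body: it collects every derived type referenced as type(<name>) and rewrites a bare "use <module>" into "use <module>, only: <names>". The helper name limit_use_statements is hypothetical, and the sketch assumes a single "use" per interface body, so that all collected types come from that module.

```shell
#!/bin/bash
# Sketch: narrow "use <module>" inside subroutine interface bodies to
# "use <module>, only: <types>". limit_use_statements is a hypothetical name.
limit_use_statements() {
  awk '
    function emit_body(   i, n, f, list, rest, name, seen) {
      # Collect derived types referenced as type(<name>) in the buffered body.
      list = ""
      for (i = 1; i <= nbuf; i++) {
        rest = buf[i]
        while (match(rest, /[Tt][Yy][Pp][Ee]\([A-Za-z_][A-Za-z0-9_]*\)/)) {
          name = substr(rest, RSTART + 5, RLENGTH - 6)
          if (!(name in seen)) {
            seen[name] = 1
            list = list (list == "" ? "" : ", ") name
          }
          rest = substr(rest, RSTART + RLENGTH)
        }
      }
      # Re-emit the body, rewriting only bare "use <module>" lines.
      for (i = 1; i <= nbuf; i++) {
        n = split(buf[i], f, " ")
        if (n == 2 && tolower(f[1]) == "use" && list != "") {
          match(buf[i], /^[ \t]*/)
          print substr(buf[i], 1, RLENGTH) "use " f[2] ", only: " list
        } else {
          print buf[i]
        }
      }
      nbuf = 0
    }
    tolower($1) == "subroutine" { inbody = 1 }
    inbody  { buf[++nbuf] = $0 }
    !inbody { print }
    tolower($1) == "end" && tolower($2) == "subroutine" {
      if (inbody) emit_body()
      inbody = 0
    }
  ' "$1"
}

# Sample input: the interface block from IBM's explanation.
sample=$(mktemp)
cat > "$sample" <<'EOF'
      Interface
        Subroutine VecRestoreArrayReadF90(v,array,ierr)
          use petscvecdef
          real(kind=selected_real_kind(10)), pointer :: array(:)
          integer(kind=selected_int_kind(5)) ierr
          type(tVec)     v
        End Subroutine
      End Interface
EOF

limited=$(limit_use_statements "$sample")
echo "$limited"
rm -f "$sample"
```

A real implementation would also have to handle named constants, kind parameters, and bodies that use several modules, which this sketch deliberately ignores.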
>> 
> 
> Is this compiler ~feature~ something that we intend to work around? Thoughts?
> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
> 
>> Begin forwarded message:
>> 
>> From: "Roy Musselman" <roym...@us.ibm.com <mailto:roym...@us.ibm.com>>
>> Subject: Re: Case TS005062693 - XLF: ICE in xlfentry compiling a module with 
>> 358 subroutines
>> Date: March 3, 2021 at 08:23:17 CST
>> To: Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> 
>> Hi Jacob, 
>> I tried the first suggestion and commented out the use statements called 
>> within the functions. However, I hit the following error complaining about 
>> specific symbol dependencies provided by the library.
>> 
>> .../src/vec/f90-mod/petscvecmod.F90", line 107.37: 1514-084 (S) Identifier a 
>> is being declared with type name tvec which has not been defined in a 
>> derived type definition. 
>> 
>> IBM provided the following additional explanation and example. Can the 
>> process used to generate these routines and functions determine the specific 
>> symbols required and then use the only keyword or import statement to 
>> include them?
>> 
>>  When factoring out use statements out of module procedures, you can just 
>> delete them.  But you can't completely remove them from interface blocks.  
>> Instead, you can limit them either by using use <module>, only: <symbol> or 
>> import <symbol> .  if the hundreds of use statements in the program are 
>> factored out / limited in this way, that should reduce the dictionary size 
>> sufficiently for the program to compile.
>>  
>> For example
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           use petscvecdef
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>>  
>> imports all symbols from petscvecdef into the dictionary even though we only 
>> need tVec .  So we can either:
>>  
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           use petscvecdef, only: tVec
>>           implicit none
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>>  
>> or if use petscvecdef is used in the outer scope, we can:
>>       Interface
>>         Subroutine VecRestoreArrayReadF90(v,array,ierr)
>>           import tVec
>>           implicit none
>>           real(kind=selected_real_kind(10)), pointer :: array(:)
>>           integer(kind=selected_int_kind(5)) ierr
>>           type(tVec)     v
>>         End Subroutine
>>       End Interface
>> (The two methods (use, only vs import) are equivalent in terms of impact to 
>> the dictionary.)
>> 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> From:  Roy Musselman/Rochester/Contr/IBM
>> To:  Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> Cc:  "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> Date:  02/24/2021 07:08 PM
>> Subject:  Re: [EXTERNAL] Case TS005062693 - XLF: ICE in xlfentry compiling a 
>> module with 358 subroutines
>> 
>> 
>> 
>> Hi Jacob, 
>> I opened the ticket with IBM: case TS005062693 and the local LLNL Sierra 
>> Jira Ticket at
>> https://lc.llnl.gov/jira/projects/SIERRA/issues/SIERRA-111?filter=allissues 
>> 
>> Today IBM provided the response below. I don't know when I'll have time to 
>> try it on the reproducer I gave IBM. Perhaps early next week. Can you review 
>> this and see if it helps? 
>> 
>>  The reason for the Internal Compiler Error is because we can't grow an 
>> internal dictionary anymore (ie we hit a 2**21 limit).
>> The file contains many module procedures and interfaces that use the same 
>> helper module. As a result, we are importing the dictionary entries for that 
>> module repeatedly reaching 
>> the limit.
>>  
>> Can you please give the following source code workaround a try?
>> Since there is already "use petscvecdefdummy" at the module scope, one 
>> workaround might be to remove the unnecessary "use petscvecdefdummy" in 
>> vecnotequal and vecequals 
>> and all similar procedures.
>>  
>> For example, the test case has:
>>         module petscvecdef
>>         use petscvecdefdummy
>> ...
>>         function vecnotequal(A,B)
>>           use petscvecdefdummy
>>           logical vecnotequal
>>           type(tVec), intent(in) :: A,B
>>           vecnotequal = (A%v .ne. B%v)
>>         end function
>>         function vecequals(A,B)
>>           use petscvecdefdummy
>>           logical vecequals
>>           type(tVec), intent(in) :: A,B
>>           vecequals = (A%v .eq. B%v)
>>         end function
>> ...
>> end module
>> Another workaround would be to put the procedure definitions from this large 
>> module into several submodules.  Each submodule would be able to accommodate 
>> a dictionary with 2**21 entries.
>>  
>>  
>> Please let us know if one of the above workarounds resolve the issue.
>> 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> 
>> From:  Roy Musselman/Rochester/Contr/IBM
>> To:  Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> Cc:  "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> Date:  02/21/2021 09:42 PM
>> Subject:  Re: [EXTERNAL] Re: xlf90_r Internal Compiler Error
>> 
>> 
>> Hi Jacob, 
>> 
>> After some more experimentation, I think I may have found what is triggering 
>> the ICE. It doesn't appear to be related to the subroutine name length. I 
>> think the compiler may be hitting an internal limit of the number of 
>> subroutines within a module. There are 358 subroutines contained in the 
>> expanded petscmatmod.F90. Removing 4 subroutines will allow the compile to 
>> complete successfully, so the limit must be 354 subroutines. Is it possible 
>> for you to bust up petscmatmod into multiple modules? I'll package up the 
>> reproducer and pass it on to the compiler development team.
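
To test a hypothesis like this one (IBM's later response, quoted earlier in this thread, attributes the ICE to dictionary size rather than to the subroutine count), the number of subroutine definitions in a preprocessed file can be counted directly. A minimal sketch, assuming one statement per line; count_subroutines is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: count subroutine definitions in a preprocessed Fortran source file.
# Lines starting with "end subroutine" are not counted because their first
# word is "end".
count_subroutines() {
  awk 'tolower($1) == "subroutine" { n++ } END { print n + 0 }' "$1"
}

# Tiny sample using two subroutine names mentioned in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
      subroutine MatIsShell(a,b,z)
      end subroutine
      subroutine MatSetLayouts(a,b,c,z)
      end subroutine
EOF

subroutine_count=$(count_subroutines "$sample")
echo "subroutines: $subroutine_count"
rm -f "$sample"
```

Run against the expanded source (for example, count_subroutines full_petscmatmod.F90), this gives the figure to compare before and after removing subroutines.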
>> 
>> I asked for user feedback a couple of years ago, when the IBM Power9 CORAL-1 
>> Sierra systems were deployed, but received minimal responses. DOE is now 
>> working with Cray (aka HPE) developing the environment for the CORAL-2 
>> system (El Capitan). I'll pass your request to the LLNL person I know that 
>> is dealing with math libraries for CORAL-2.
>> 
>> We use the spack tool to download and build petsc and its specified 
>> dependencies. I switched between the PETSC versions by changing the PETSCDIR 
>> variable in the script I shared with you. I've attached a tar ball 
>> containing the scripts used to build PETSc via spack.
>> 
>> [attachment "bld-petsc-spack.tgz" deleted by Roy 
>> Musselman/Rochester/Contr/IBM] 
>> 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> 
>> From:  Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> To:  Roy Musselman <roym...@us.ibm.com <mailto:roym...@us.ibm.com>>
>> Cc:  "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> Date:  02/21/2021 12:24 PM
>> Subject:  [EXTERNAL] Re: xlf90_r Internal Compiler Error
>> 
>> 
>> 
>> Hi Roy,
>> 
>> > I'm not sure which projects at LLNL are using PETSc or if they chose to 
>> > build their own version.
>> 
>> Entirely unrelated to our problem, but is it possible to find this out? It 
>> would be great if yes, but also completely fine if not. PETSc is potentially 
>> undergoing a rather transformative rewrite over the next few years and we’d 
>> like to gather current usage data to get a better idea of where PETSc fits 
>> into our users' workflows. But we aren’t sure how to gather this data (we 
>> don’t particularly want to scrape and silently send it off without users' 
>> consent/knowledge) absent user questionnaires and HPC usage statistics.
>> 
>> > If you are interested, I can share with you the spack recipes I use to build 
>> > petsc with hdf5, hypre, and superlu-dist.
>> 
>> Yes that would be quite useful. I can let it percolate through our dev 
>> channels for any other recommendations etc.
>> 
>> > 3.14.0 and 3.14.1
>> > 
>> > "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>> >  line 9.13: 1514-219 (S) Unable to access module symbol file for module 
>> > petscisdefdummy. Check path and file permissions of file. Use association 
>> > not done for this module.
>> > 1501-511 Compilation failed for file petscvecmod.F90.
>> 
>> How exactly did you switch between versions? PETSc has 2 types of fortran 
>> bindings, “ftn-custom” and “ftn-auto” (technically 3 including the F90 
>> files, but those simply call either of the two preceding ones), a copy of 
>> which you will find in every src directory. As the names imply ftn-auto is 
>> auto generated while ftn-custom is hand-written. 
>> 
>> This also means that the ftn-auto files are __not__ tracked by git, so a 
>> simple git checkout [new-tag] may not properly dispose of the old 
>> auto-generated files (very rare, but IIRC we made a major enough change to 
>> the fortran bindings within the last year to warrant having to "make 
>> deletefortranstubs" before rebuilding).
>> 
>> > Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch of 
>> > other warning messages, but it still encounters the ICE. So, I'm uncertain 
>> > if the subroutine name length is the root of the problem. 
>> 
>> Our current compiler flag selection philosophy is to require a minimum but 
>> choose the maximum available reasonable flag for the compiler (i.e. we 
>> require C99, but very often you will find that your code is compiled with 
>> C11 or C17 if they are available). It is therefore odd that configure did 
>> not use the same methodology for fortran compilers. I will relay this on our 
>> side.
>> 
>> > Is it possible for you to use subroutines that are less than 32 characters 
>> > and see if that works for you? Have you used other fortran 90 compilers and 
>> > do any of them complain of this? 
>> 
>> Of all of the small quirks fortran has this is probably the most esoteric 
>> one I’ve come across… I’ve attached a list of all the F90 compilers, and 
>> their flags which we use in CI/CD (all of which is run multiple times daily 
>> and __must__ pass). I got them all via grep, so there may be some duplicates 
>> here or there. As for using shorter names, this is also something we can 
>> look at, but since none of the other compilers have had issues with this I’m 
>> not sure this is the change to make.
>> 
>> > Are there any unusual or questionable language constructs used in any of the 
>> > functions mentioned above that may possibly challenge the compiler? 
>> 
>> Not that I am aware of, but again I will ask around our dev channels and see 
>> if anything comes to mind.
>> 
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>> 
>> [attachment "compilerList" deleted by Roy Musselman/Rochester/Contr/IBM] 
>> 
>> On Feb 20, 2021, at 22:05, Roy Musselman <roym...@us.ibm.com 
>> <mailto:roym...@us.ibm.com>> wrote:
>> 
>> Hi Jacob,
>> Thanks for letting me know that you are a PETSc developer and that you are 
>> testing it on the LLNL lassen system. I've used the spack build tool to 
>> build and deploy a few versions on the systems. I'm not sure which projects 
>> at LLNL are using PETSc or if they chose to build their own version. I did 
>> however provide a single precision version upon request that was integrated 
>> with MVAPICH2-MPI instead of the IBM-provided Spectrum-MPI. Here's what's 
>> available on the systems today.
>> 
>> > ml avail petsc
>> ----------------------------------------------------- 
>> /usr/tcetmp/modulefiles/Core 
>> -----------------------------------------------------
>> petsc/default petsc/3.10.2 petsc/3.11.3 petsc/3.13.0 (D)  
>> petsc/3.13.1-mvapich2-2020.01.09-xl-2020.03.18.single
>> 
>> If you are interested, I can share with you the spack recipes I use to build 
>> petsc with hdf5, hypre, and superlu-dist.
>> 
>> After several attempts I was able to reproduce the Internal Compiler Error 
>> (ICE) that you are seeing using version 3.14.4. I've whittled it down to the 
>> petscmatmod.F90 file and its specific dependencies. 
>> The following script is what I'm using. Note that in the 2nd set of 
>> compiles, the -E option is used to expand all included source files and 
>> headers and encapsulate them into a single large source file. This can be 
>> used to help isolate the source of the problem.  
>> 
>> #!/bin/bash
>> 
>> PETSCDIR="../roymuss/spack-stage-petsc-3.14.4-eh5arny7l3cqjlltlfpjp6f4jofbnmz6/spack-src"
>>  
>> OPTIONS=" -qmoddir=moddir -I$PETSCDIR/arch-linux-c-opt/include 
>> -I$PETSCDIR/include"
>> mkdir -p moddir
>> 
>> set -x 
>> 
>> # Compile original source files including dependencies
>> if [ 0 = 1 ]; then
>> mpif90 -c -g $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o 
>> petscsysmod.o 
>> mpif90 -c -g $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o 
>> petscvecmod.o
>> mpif90 -c -g $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o 
>> petscmatmod.o
>> fi
>> 
>> # Use -E option to expand source into full source files
>> if [ 0 = 1 ]; then
>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/sys/f90-mod/petscsysmod.F90 -o 
>> full_petscsysmod.F90
>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/vec/f90-mod/petscvecmod.F90 -o 
>> full_petscvecmod.F90
>> mpif90 -c -g -E $OPTIONS $PETSCDIR/src/mat/f90-mod/petscmatmod.F90 -o 
>> full_petscmatmod.F90
>> fi
>> 
>> # Compile from full source files
>> if [ 1 = 1 ]; then
>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscsysmod.F90 -o 
>> full_petscsysmod.o
>> mpif90 -c -g -Imoddir -qmoddir=moddir full_petscvecmod.F90 -o 
>> full_petscvecmod.o
>> mpif90 -V -c -g -Imoddir -qmoddir=moddir full_petscmatmod.F90 -o 
>> full_petscmatmod.o
>> fi
>> 
>> <eof>
>> 
>> PETSc 3.13.6 is the most recent version that did not fail. I tried all 
>> subsequent versions and got the following results: 
>> 
>> 3.14.0 and 3.14.1
>> 
>> "../roymuss/spack-stage-petsc-3.14.0-on3lboy4slkz65tsjttgfmwghzky54jj/spack-src/src/vec/f90-mod/petscvecmod.F90",
>>  line 9.13: 1514-219 (S) Unable to access module symbol file for module 
>> petscisdefdummy. Check path and file permissions of file. Use association 
>> not done for this module.
>> 1501-511 Compilation failed for file petscvecmod.F90.
>> 
>> 3.14.2, 3.14.3, and 3.14.4
>> 
>> . . .
>> ** matnullspaceequals === End of Compilation 8 ===
>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': 
>> free(): invalid pointer: 0x0000200001740018 ***
>> 
>> Examining the tail end of petscmatmod.F90
>> 
>> 
>> 80 function matnullspaceequals(A,B)
>> 81 use petscmatdefdummy
>> 82 logical matnullspaceequals
>> 83 type(tMatNullSpace), intent(in) :: A,B
>> 84 matnullspaceequals = (A%v .eq. B%v)
>> 85 end function
>> 86 
>> 87 #if defined(_WIN32) && defined(PETSC_USE_SHARED_LIBRARIES)
>> 88 !DEC$ ATTRIBUTES DLLEXPORT::matnotequal
>> 89 !DEC$ ATTRIBUTES DLLEXPORT::matequals
>> 90 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringnotequal
>> 91 !DEC$ ATTRIBUTES DLLEXPORT::matfdcoloringequals
>> 92 !DEC$ ATTRIBUTES DLLEXPORT::matnullspacenotequal
>> 93 !DEC$ ATTRIBUTES DLLEXPORT::matnullspaceequals
>> 94 #endif
>> 95 module petscmat
>> 96 use petscmatdef
>> 97 use petscvec
>> 98 #include <../src/mat/f90-mod/petscmat.h90>
>> 99 interface
>> 100 #include <../src/mat/f90-mod/ftn-auto-interfaces/petscmat.h90>
>> 101 end interface
>> 102 end module
>> 103 
>> 
>> Compiling the matnullspaceequals function was successful just before hitting 
>> the error. The error goes away when removing either or both of the #include 
>> lines 98 and 100. Both #include statements are required to produce the 
>> error. The 3.13.6 and 3.14.4 versions of the file identified in the first 
>> #include at line 98 are identical. The file identified in line 100 is 
>> different between 3.13.6 and 3.14.4.
>> Just looking at the list of subroutines contained within each version, the 
>> following are the differences. 
>> 
>> Old subroutines available in 3.13.6 but removed from 3.14.4
>> subroutine MatFreeIntermediateDataStructures(a,z)
>> 
>> New subroutines available in 3.14.4 but not contained in 3.13.6 
>> subroutine MatDenseReplaceArray(a,b,z)
>> subroutine MatIsShell(a,b,z)
>> subroutine MatRARtMultEqual(a,b,c,d,e,z)
>> subroutine MatScaLAPACKGetBlockSizes(a,b,c,z)
>> subroutine MatScaLAPACKSetBlockSizes(a,b,c,z)
>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
>> subroutine MatSeqAIJSetTotalPreallocation(a,b,z)
>> subroutine MatSetLayouts(a,b,c,z)
>> 
>> Methodically removing the new subroutines did not provide a consistent 
>> result. But I did notice the extra long subroutine name 
>> MatSeqAIJCUSPARSESetGenerateTranspose had 37 characters.
>> A little research found: In Fortran 90/95 the maximum length was 31 
>> characters, in Fortran 2003 it is now 63 characters. I found the following 
>> subroutines with greater than 31 characters
>> 
>> subroutine MatCreateMPIMatConcatenateSeqMat
>> subroutine MatFactorFactorizeSchurComplement
>> subroutine MatMPIAdjCreateNonemptySubcommMat
>> subroutine MatSeqAIJCUSPARSESetGenerateTranspose
>> subroutine MatMPIAIJSetUseScalableIncreaseOverlap
>> subroutine MatFactorSolveSchurComplementTranspose
>> 
>> I individually ifdef'd them out of the source file and was able to compile 
>> the files successfully without encountering the ICE. 
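
A search like the one described above is easy to automate. The sketch below is an illustration only: long_subroutine_names is a hypothetical helper name, and it assumes each subroutine statement sits on one line with no spaces inside the argument list. It prints every subroutine name longer than a limit (default 31, the Fortran 90/95 maximum) together with its length.

```shell
#!/bin/bash
# Sketch: list subroutine names longer than a given limit (default 31).
# long_subroutine_names is a hypothetical helper name.
long_subroutine_names() {
  local file=$1 limit=${2:-31}
  awk -v limit="$limit" '
    tolower($1) == "subroutine" {
      name = $2
      sub(/\(.*/, "", name)             # drop the argument list
      if (length(name) > limit) print length(name), name
    }' "$file"
}

# Sample input using subroutine statements quoted in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
      subroutine MatIsShell(a,b,z)
      subroutine MatSeqAIJCUSPARSESetGenerateTranspose(a,b,z)
      subroutine MatSetLayouts(a,b,c,z)
EOF

long_names=$(long_subroutine_names "$sample")
echo "$long_names"
rm -f "$sample"
```

For this sample it reports only MatSeqAIJCUSPARSESetGenerateTranspose, with the 37-character length noted above.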
>> 
>> I'm not exactly sure what maximum subroutine name length the XLF 
>> compiler allows, but if it is only 31, it would be useful if the compiler 
>> detected this and issued a message instead of the ICE.
>> Adding the option -qlanglvl=2003std or -qlanglvl=2008std produces a bunch of 
>> other warning messages, but it still encounters the ICE. So, I'm uncertain 
>> if the subroutine name length is the root of the problem. 
>> 
>> Is it possible for you to use subroutine names that are shorter than 32 
>> characters and see if that works for you? Have you used other fortran 90 compilers and 
>> do any of them complain of this? 
>> Are there any unusual or questionable language constructs used in any of the 
>> functions mentioned above that may possibly challenge the compiler? 
>> 
>> I'll package this up and send it to the IBM XL compiler development team for 
>> their examination and comment. 
>> 
>> Best Regards,
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> From:  Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> To:  Roy Musselman <roym...@us.ibm.com <mailto:roym...@us.ibm.com>>
>> Cc:  "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> Date:  02/18/2021 02:17 PM
>> Subject:  [EXTERNAL] Re: xlf90_r Internal Compiler Error
>> 
>> 
>> 
>> 
>> 
>> > The most recently built version available on the CORAL systems is 3.13.0. 
>> > (ml load petsc/3.13.0) Will that work for you?
>> 
>> I am building petsc from source as part of development work on petsc itself 
>> so modules are unfortunately not useful here.
>> 
>> > The files you sent me do not contain all the dependencies (other mod files) 
>> > required to reproduce the error. 
>> > I'll attempt to build version 3.14.4 from scratch and recreate the failing 
>> > symptom you are observing.
>> 
>> Yes, petsc uses an automated system to generate the fortran files from C 
>> which goes about 20 rabbit holes deeper than I was willing to dig. Let me 
>> know if you run into trouble configuring and building petsc, I can point you 
>> in the right direction. I’ve attached a “reconfigure” script with this 
>> email, it contains all of the arguments I used to configure petsc 
>> successfully on Lassen. If you place it into your $PETSC_DIR (i.e. the 
>> folder titled “petsc” that contains a “configure” file) and run:
>> 
>> $ python3 ./reconfigure-arch-linux-c-debug.py
>> 
>> It should work. If not, you will have to run
>> 
>> $ ./configure --all-the-args --in-the-reconfigure --file
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>> 
>> [attachment "reconfigure-arch-linux-c-debug.py" deleted 
>> by Roy Musselman/Rochester/Contr/IBM] 
>> 
>> On Feb 18, 2021, at 15:07, Roy Musselman <roym...@us.ibm.com 
>> <mailto:roym...@us.ibm.com>> wrote:
>> 
>> Hi Jacob,
>> 
>> The source file appears to come from the PETSc 3.14.4 library. The most 
>> recently built version available on the CORAL systems is 3.13.0. (ml load 
>> petsc/3.13.0) Will that work for you?
>> The files you sent me do not contain all the dependencies (other mod files) 
>> required to reproduce the error. 
>> I'll attempt to build version 3.14.4 from scratch and recreate the failing 
>> symptom you are observing.
>> 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> From: Roy Musselman/Rochester/Contr/IBM
>> To: LC Hotline <lc-hotl...@llnl.gov <mailto:lc-hotl...@llnl.gov>>
>> Cc: "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>
>> Date: 02/18/2021 11:18 AM
>> Subject: Re: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>> 
>> 
>> 
>> 
>> 
>> I'll take a look. 
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Roy Musselman
>> IBM HPC Application Analyst at Lawrence Livermore National Lab
>> email: roym...@us.ibm.com <mailto:roym...@us.ibm.com>
>> LLNL office: 925-422-6033
>> Cell: 507-358-8895, Home: 507-281-9565
>> 
>> 
>> From: LC Hotline <lc-hotl...@llnl.gov <mailto:lc-hotl...@llnl.gov>>
>> To: "Gyllenhaal, John C." <gyllenha...@llnl.gov 
>> <mailto:gyllenha...@llnl.gov>>, Roy Musselman <roym...@us.ibm.com 
>> <mailto:roym...@us.ibm.com>>
>> Date: 02/18/2021 11:03 AM
>> Subject: [EXTERNAL] FW: xlf90_r Internal Compiler Error
>> 
>> 
>> 
>> Hi John, Roy,
>> 
>> Can you help this user with the problem that he is seeing when he tries to 
>> build with xlf90 on Lassen?
>> 
>> Thanks,
>> Ryan
>> --
>> LC Hotline
>> 
>> From: Jacob Faibussowitsch <faibu...@illinois.edu 
>> <mailto:faibu...@illinois.edu>>
>> Date: Wednesday, February 17, 2021 at 5:27 PM
>> To: LC Hotline <lc-hotl...@llnl.gov <mailto:lc-hotl...@llnl.gov>>
>> Subject: xlf90_r Internal Compiler Error
>> 
>> Hello LC Support, 
>> 
>> While compiling my application on Lassen I seem to have run afoul of the xlf90 
>> mpi compiler wrapper with the following error:
>> 
>> *** Error in `/usr/tce/packages/xl/xl-2020.11.12/xlf/16.1.1/exe/xlfentry': 
>> free(): invalid pointer: 0x0000200001740018 ***
>> 
>> I’m fairly certain this isn’t my fault as this is code that compiles 
>> regularly on extensive CI/CD under various other compilers and machines, but 
>> you can never rule it out. I have included a verbose full log of my make run 
>> (which includes a comprehensive rundown of the environment) as well as a 
>> separate file containing the error message and stack trace from the 
>> compiler. Additionally I have also included the file which I believe is 
>> causing the error. Let me know if there is anything else I should send.
>> 
>> P.S. My list of loaded modules:
>> 
>> Currently Loaded Modules:
>> 1) StdEnv (S) 4) cuda/11.1.1 7) valgrind/3.16.1
>> 2) clang/ibm-11.0.0 5) python/3.8.2 8) lapack/3.9.0-xl-2020.11.12
>> 3) spectrum-mpi/rolling-release 6) cmake/3.18.0 9) hip/3.0.0
>> 
>> Best regards,
>> 
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>> 
>> [attachment "errorReport.zip" deleted by Roy Musselman/Rochester/Contr/IBM] 
> 
