Hello again, Todd,

We have found the following alert on IBM’s website regarding GPFS:


  *   http://www.gpfsusergroup.org/news/gpfs-3-5-announcments
  *   http://www-01.ibm.com/support/docview.wss?uid=isg3T1021392

IBM has identified a problem with GPFS 3.5.0.20 and GPFS 4.1.0.2 where GPFS may 
fail to correctly handle multiple vectors passed via the writev() system call. 
When a {NULL, 0} is passed as the first vector, an EINVAL error may be 
incorrectly returned. This would cause the user application to fail 
unexpectedly when writev() is called to write to a GPFS file. User data are not 
affected. The writev() call is most likely to have been automatically generated 
by the library or compiler.

Guess what — we are running GPFS 3.5.0.20 on our compute nodes :-(  And it 
would also explain why we do not have it on another (non-compute) node, since 
that is still running GPFS 3.5.0.19... That’s what they call bad luck!

Thanks a lot for pointing us in the right direction. Now let’s hope that we get 
a GPFS fix soon!

-- Regards,

Franky Backeljauw



On 28 Oct 2014, at 13:46, Backeljauw Franky 
<franky.backelj...@uantwerpen.be> wrote:

Hello Todd,

You’re spot on! Here the file testC(XX)Compiler.c was empty indeed. We are 
building on one of our GPFS filesystems on a compute node which is running SL 
6.4. Previously, we managed to build on another node with RHEL 6.4, so we’ll be 
checking for differences between these two nodes.

On SL 6.4, when building on /tmp, the build is fine, but the install fails when 
it’s copying 
/tmp/easybuild/CMake/2.8.12.1/intel-2014a/cmake-2.8.12.1/Copyright.txt to its 
destination (on the GPFS filesystem). When executing the copy ourselves, it’s 
working fine.

I’m puzzled... Do you know whether they solved the issue and how?

P.s.: The name of our new cluster is Hopper as well, yet we doubt whether this 
has anything to do with it ;-)

-- Many thanks,

Franky



On 28 Oct 2014, at 11:08, Todd Gamblin <tgamb...@llnl.gov> wrote:

I've seen this error on machines where the filesystem was having issues, 
specifically on the home filesystem on NERSC's hopper machine.

The problem was that try_compile was generating an empty testCCompiler.c -- 
have you looked at the size of this file in your build output?

The root cause of the problem was that ostream::operator<< was generating calls 
to fwrite with more than 1023 bytes, but for whatever reason the filesystem was 
failing to execute that call.  Running strace on CMake showed the buggy call -- 
I'm attaching the C++ and CMake reproducers I made for NERSC's system.  You can 
run the CMake one with cmake -P, and you'll need to compile the C++ one.  Strace 
them and see if you can see the problem I'm describing.  Or try building in a 
different filesystem and see if the problem persists.

You may actually have a different problem, but the output you're reporting 
looks awfully familiar to me.  I don't think it's EasyBuild's fault.

-Todd



Here's a more detailed description of the problem that I sent to the PETSc 
developers a few weeks ago:

Hi guys,

After playing around with this, the problem is the C++ implementation on
hopper. Something is screwy with fstream — it can’t write chunks larger
than 1023 bytes.  Attached are two reproducers for your support request.
One reproduces the problem in CMake; one reproduces it in C++.

If you dig in the CMakeFiles directory, you’ll see that the C file used to
do the compiler identification is actually 0 bytes, which is what gets you
the undefined reference to main in your error log, causing the compiler
test to fail:

$ l CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
-rw-r--r-- 1 tgamblin tgamblin 0 Sep 30 21:34 CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c


I figured something was wrong with the CMake after the OS upgrade Mark
mentioned, and I tried to build a fresh one, but that gave the same error
*in the bootstrap script* — I couldn’t even build cmake.

If you dig around to the spot in CMake where it generates the files to use
for compiler testing, you find that CMake reads a file into a variable,
filters it, and writes it out.  Printing the variable using message() pre-
and post-filtering works fine.  But writing the variable still gets you a
0 byte file.

Making a simple reproducer that writes a small string succeeds.  If,
however, you make a string of 1024 bytes or more, you get zero-byte output.
Test that by running my file:

cmake -P test.cmake

If you strace that, you notice this:

$ strace cmake -P test.cmake
[ … snip … ]
writev(3, [{NULL, 0}, {"01234567890123456789012345678901"..., 1024}], 2) = -1 EINVAL (Invalid argument)

Smaller messages use a write() and not a writev(), so they succeed.  But
that’s not cmake’s fault.  ofstream does that.  If you run the attached
C++ program, which uses ofstream to write a 1024-byte string, it fails
too.  Take one character off the string and it works.

So, something is botched with Hopper’s C++ libs, or maybe with writev.  I
imagine that more than CMake is broken, at least on the front-end nodes.
I don’t know of many programs that write large chunks to ofstreams, so
maybe all is not lost.

Still makes me suspicious of the hopper machine.

-Todd



From: Backeljauw Franky <franky.backelj...@uantwerpen.be>
Reply-To: "easybuild@lists.ugent.be" <easybuild@lists.ugent.be>
List-Post: easybuild@lists.ugent.be
Date: Tuesday, October 28, 2014 at 2:56 AM
To: "easybuild@lists.ugent.be" <easybuild@lists.ugent.be>
Subject: [easybuild] CMake-3.3.0-intel-2014b : problem building

Hello all,

We have great difficulty in installing CMake on RHEL 6.4. We have tried both 
with CMake-2.8.12.1-intel-2014a.eb and CMake-3.0.0-intel-2014b.eb which is 
included in EasyBuild 1.15.2. The same problem occurs with the foss toolchains 
as well as with (e.g.) CMake-2.8.12.1-GCC-4.8.2.eb.

We get the following errors:

-- The C compiler identification is unknown
CMake Error at Modules/CMakeDetermineCCompiler.cmake:170 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:16 (project)


-- The CXX compiler identification is unknown
CMake Error at Modules/CMakeDetermineCXXCompiler.cmake:168 (configure_file):
  configure_file Problem configuring file
Call Stack (most recent call first):
  CMakeLists.txt:16 (project)


-- Check for working C compiler: /apps/antwerpen/ivybridge/sl6/icc/2013.5.192-GCC-4.8.3/bin/intel64/icc
CMake Error at Modules/CMakeTestCCompiler.cmake:47 (try_compile):
  Unknown extension ".c" for file

    /apps/antwerpen/easybuild/build/CMake/3.0.0/intel-2014b/cmake-3.0.0/CMakeFiles/CMakeTmp/testCCompiler.c

  try_compile() works only for enabled languages.  Currently these are:

    C CXX

I have included the full log for the CMake-3.0.0 build (cmake.log).

I hope someone can help us out here.

-- Many thanks for your reply,

Franky Backeljauw
<test.cmake><test.C>

