Hello again, Todd,

We have found the following alert on IBM’s website regarding GPFS:

* http://www.gpfsusergroup.org/news/gpfs-3-5-announcments
* http://www-01.ibm.com/support/docview.wss?uid=isg3T1021392

IBM has identified a problem with GPFS 3.5.0.20 and GPFS 4.1.0.2 where GPFS may fail to correctly handle multiple vectors passed via the writev() system call. When a {NULL, 0} is passed as the first vector, an EINVAL error may be incorrectly returned. This would cause the user application to fail unexpectedly when writev() is called to write to a GPFS file. User data are not affected. The writev() call is most likely to have been automatically generated by the library or compiler.

Guess what: we are running GPFS 3.5.0.20 on our compute nodes :-( That would also explain why we do not see the problem on another (non-compute) node, since that one is still running GPFS 3.5.0.19... That’s what they call bad luck!

Thanks a lot for pointing us in the right direction. Now let’s hope that we get a GPFS fix soon!

--
Regards,
Franky Backeljauw

On 28 Oct 2014, at 13:46, Backeljauw Franky <franky.backelj...@uantwerpen.be> wrote:

Hello Todd,

You’re spot on! Here the file testC(XX)Compiler.c was indeed empty. We are building on one of our GPFS filesystems on a compute node which is running SL 6.4. Previously, we managed to build on another node with RHEL 6.4, so we’ll be checking for differences between these two nodes.

On SL 6.4, when building on /tmp, the build is fine, but the install fails when it’s copying /tmp/easybuild/CMake/2.8.12.1/intel-2014a/cmake-2.8.12.1/Copyright.txt to its destination (on the GPFS filesystem). When executing the copy ourselves, it’s working fine. I’m puzzled...

Do you know whether they solved the issue and how?
P.s.: The name of our new cluster is Hopper as well, yet we doubt whether this has anything to do with it ;-)

--
Many thanks,
Franky

On 28 Oct 2014, at 11:08, Todd Gamblin <tgamb...@llnl.gov> wrote:

I've seen this error on machines where the filesystem was having issues, specifically on the home filesystem on NERSC's hopper machine. The problem was that try_compile was generating an empty testCCompiler.c -- have you looked at the size of this file in your build output?

The root cause of the problem was that ostream::operator<< was generating calls to fwrite with more than 1023 bytes, but for whatever reason the filesystem was failing to execute that call. Running strace on CMake showed the buggy call -- I'm attaching the C++ and CMake reproducers I made for NERSC's system. You can run the cmake one with cmake -P, and you'll need to compile the C++ one.

Strace them and see if you can see the problem I'm describing. Or try building on a different filesystem and see if the problem persists. You may actually have a different problem, but the output you're reporting looks awfully familiar to me. I don't think it's EasyBuild's fault.

-Todd

Here's a more detailed description of the problem that I sent to the PETSc developers a few weeks ago:

Hi guys,

After playing around with this, the problem is the C++ implementation on hopper. Something is screwy with fstream: it can’t write chunks larger than 1023 bytes. Attached are two reproducers for your support request. One reproduces the problem in CMake; one reproduces it in C++.

If you dig in the CMakeFiles directory, you’ll see that the C file used to do the compiler identification is actually 0 bytes, which is what gets you the undefined reference to main in your error log, causing the compiler test to fail:

    $ l CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c
    -rw-r--r-- 1 tgamblin tgamblin 0 Sep 30 21:34 CMakeFiles/2.8.11.1/CompilerIdC/CMakeCCompilerId.c

I figured something was wrong with the CMake after the OS upgrade Mark mentioned, and I tried to build a fresh one, but that gave the same error *in the bootstrap script*; I couldn’t even build cmake.

If you dig around to the spot in CMake where it generates the files to use for compiler testing, you find that CMake reads a file into a variable, filters it, and writes it out. Printing the variable using message() pre- and post-filtering works fine. But writing the variable still gets you a 0 byte file.

Making a simple reproducer that writes a small string succeeds. If, however, you make a string of 1024 bytes or more, you get zero-byte output. Test that by running my file:

    cmake -P test.cmake

If you strace that, you notice this:

    $ strace cmake -P test.cmake
    [ … snip … ]
    writev(3, [{NULL, 0}, {"01234567890123456789012345678901"..., 1024}], 2) = -1 EINVAL (Invalid argument)

Smaller messages use a write() and not a writev(), so they succeed. But that’s not cmake’s fault. ofstream does that. If you run the attached C++ program, which uses ofstream to write a 1024-byte string, it fails too. Take one character off the string and it works.

So, something is botched with Hopper’s C++ libs, or maybe with writev. I imagine that more than CMake is broken, at least on the front-end nodes. I don’t know of many programs that write large chunks to ofstreams, so maybe all is not lost. Still makes me suspicious of the hopper machine.

-Todd

From: Backeljauw Franky <franky.backelj...@uantwerpen.be>
Reply-To: "easybuild@lists.ugent.be" <easybuild@lists.ugent.be>
List-Post: easybuild@lists.ugent.be
Date: Tuesday, October 28, 2014 at 2:56 AM
To: "easybuild@lists.ugent.be" <easybuild@lists.ugent.be>
Subject: [easybuild] CMake-3.3.0-intel-2014b : problem building

Hello all,

We have great difficulty in installing CMake on RHEL 6.4. We have tried both with CMake-2.8.12.1-intel-2014a.eb and CMake-3.3.0-intel-2014b.eb which is included in EasyBuild 1.15.2. The same problem occurs with the foss-toolchains as well as with (e.g.) CMake-2.8.12.1-GCC-4.8.2.eb. We get the following errors:

    -- The C compiler identification is unknown
    CMake Error at Modules/CMakeDetermineCCompiler.cmake:170 (configure_file):
      configure_file Problem configuring file
    Call Stack (most recent call first):
      CMakeLists.txt:16 (project)

    -- The CXX compiler identification is unknown
    CMake Error at Modules/CMakeDetermineCXXCompiler.cmake:168 (configure_file):
      configure_file Problem configuring file
    Call Stack (most recent call first):
      CMakeLists.txt:16 (project)

    -- Check for working C compiler: /apps/antwerpen/ivybridge/sl6/icc/2013.5.192-GCC-4.8.3/bin/intel64/icc
    CMake Error at Modules/CMakeTestCCompiler.cmake:47 (try_compile):
      Unknown extension ".c" for file
      /apps/antwerpen/easybuild/build/CMake/3.0.0/intel-2014b/cmake-3.0.0/CMakeFiles/CMakeTmp/testCCompiler.c
      try_compile() works only for enabled languages. Currently these are:
      C CXX

I have included the full log for the CMake-3.0.0 build (cmake.log). I hope someone can help us out here.

--
Many thanks for your reply,
Franky Backeljauw

<test.cmake><test.C>
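
The attached test.C is not reproduced in this archive. As a rough illustration of the two checks discussed in the thread, here is a minimal C++ sketch, assuming a POSIX system. The function names and probe file paths are hypothetical and not taken from Todd's attachment. The first helper writes a string through ofstream and reports the resulting file size; the second issues a writev() call shaped like the one in the strace output, with {NULL, 0} as the first vector.

```cpp
// Sketch of a reproducer for the behaviour described in this thread.
// Function names and file paths are illustrative assumptions; this is
// not the attached test.C.
#include <cerrno>
#include <cstddef>
#include <fstream>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Write `n` bytes to `path` through std::ofstream and return the size of
// the resulting file, or -1 if the file cannot be opened or stat'ed.
long ofstream_write_size(const std::string& path, std::size_t n) {
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        if (!out) return -1;
        out << std::string(n, 'x');
    }  // leaving the scope flushes and closes the stream
    struct stat st;
    if (stat(path.c_str(), &st) != 0) return -1;
    return static_cast<long>(st.st_size);
}

// Call writev() with an empty first vector {NULL, 0} followed by an
// `n`-byte payload. Returns 0 if the call succeeded, otherwise errno.
int writev_null_first(const std::string& path, std::size_t n) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;

    std::string payload(n, 'x');
    struct iovec iov[2];
    iov[0].iov_base = nullptr;  // the {NULL, 0} vector from the strace output
    iov[0].iov_len = 0;
    iov[1].iov_base = const_cast<char*>(payload.data());
    iov[1].iov_len = payload.size();

    ssize_t written = writev(fd, iov, 2);
    int err = (written < 0) ? errno : 0;  // capture errno before close()
    close(fd);
    return err;
}
```

A driver can be as small as `int main() { return ofstream_write_size("probe.txt", 1024) == 1024 ? 0 : 1; }`, run once with the probe file on the suspect filesystem and once on /tmp. If the description in the thread applies, one would expect a zero-byte file from the first helper and EINVAL from the second on an affected GPFS mount, while both should succeed elsewhere.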