Dear all,

I am trying to run some experiments using the file view capabilities of ROMIO (MPICH) to read and write shared files on OrangeFS. I have tried both the IOR benchmark and a simple ad-hoc workload generator (attached to this message), with no success. After a few trials, I observed that when the dataset is small (on the order of 100 KB), the processes finish without errors, but the file ends up empty. For my target dataset produced by the ad-hoc generator (around 300 MB), the file is created but is incomplete, and multiple error messages like the following are reported:
[E 23:05:11.388265] Error: payload_progress: Bad address (error class: 128)
[E 23:05:11.388296] mem_to_bmi_callback_fn: I/O error occurred
[E 23:05:11.388311] handle_io_error: flow proto error cleanup started on 0x2396e18: Bad address
[E 23:05:11.388320] handle_io_error: flow proto 0x2396e18 canceled 0 operations, will clean up.
[E 23:05:11.388333] handle_io_error: flow proto 0x2396e18 error cleanup finished: Bad address

I verified that the ad-hoc generator works fine when executed locally (i.e., all processes on the same node writing to the local file system). This, together with the file system error messages above, suggests to me that the problem may be in OrangeFS.

The system information is as follows:

- OrangeFS 2.9.7
- MPICH 3.2.1
- IOR 3.0.1
- GCC 4.4.7 20120313
- CentOS release 6.7 (kernel 2.6.32-573.22.1.el6.x86_64)
- Data/metadata servers on the Hercule cluster and clients on 20 nodes of the Nova cluster, both from the Grid’5000 testbed (https://www.grid5000.fr/mediawiki/index.php/Lyon:Hardware)
- ad-hoc generator compilation: $ mpicc fview.c -Wall -Wextra -o fview
- ad-hoc generator execution: $ mpiexec -n 40 -f mpihostsfile ./fview

Thank you.

--
Eduardo Camilo Inacio, M.Sc.
Lattes: http://lattes.cnpq.br/4794169282710899

fview.c
Description: Binary data
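
P.S. For anyone trying to reproduce the "empty or incomplete file" symptom, a quick size check after each run may help separate the two cases. The following is only a minimal sketch and is not part of the attached fview.c. It assumes the shared file is reachable through a mounted path, and that every process writes the same number of bytes; the function name check_shared_file_size and its parameters are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <sys/stat.h>

/*
 * Compare the on-disk size of a shared file with the size expected
 * when nprocs processes each write bytes_per_proc bytes.
 * (Hypothetical helper; assumes equal-sized contributions per process.)
 *
 * Returns  0 if the size matches,
 *          1 if the file exists but has a different size,
 *         -1 if the file cannot be stat'ed.
 */
int check_shared_file_size(const char *path, int nprocs, int64_t bytes_per_proc)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }

    int64_t expected = (int64_t)nprocs * bytes_per_proc;
    int64_t actual = (int64_t)st.st_size;

    if (actual != expected) {
        fprintf(stderr, "%s: expected %lld bytes, found %lld\n",
                path, (long long)expected, (long long)actual);
        return 1;
    }
    return 0;
}
```

Calling it once after mpiexec returns, with the process count used for the run (40 above) and the per-process write size of the workload, reports whether the file is empty, partially written, or complete.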
