Hello,

When I run my HPX application on our cluster with multiple localities, I sometimes get a segmentation fault with the error message "archive data bstream data chunk size mismatch: HPX(serialization_error)".
The failure is intermittent: when I rerun the same configuration, it either works or segfaults again.

Any idea what could cause this, or how to debug it?
Thanks!
Tim
The full error output follows:
{stack-trace}: 4 frames:
0x2b40ce84564c : hpx::detail::backtrace[abi:cxx11](unsigned long) + 0x9c in /home/tbiedert/local/lib/libhpx.so.1
0x2b40ce8918fa : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xaa in /home/tbiedert/local/lib/libhpx.so.1
0x2b40ce891e5e : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in /home/tbiedert/local/lib/libhpx.so.1
0x2b40ce92049e : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in /home/tbiedert/local/lib/libhpx.so.1
{env}: 177 entries:
BASH_FUNC_module()=() { eval `/usr/bin/modulecmd bash $*`
}
BINARY_TYPE_HPC=
BSUB_BLOCK_EXEC_HOST=
CFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
CMAKE_PREFIX_PATH=/home/tbiedert/local
CPATH=/home/tbiedert/local/opt/tbb2017-update3/include
CPLUS_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include:/home/tbiedert/local/include:
CPPFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
CPP_INCLUDE_PATH=/home/tbiedert/local/include:
CVS_RSH=ssh
C_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include:/home/tbiedert/local/include:
G_BROKEN_FILENAMES=1
HISTCONTROL=ignoreboth
HISTSIZE=500
HOME=/home/tbiedert
HOSTNAME=node774
HOSTTYPE=X86_64
ITERM_ORIG_PS1=\[\033[7m\]\u@\h\[\033[m\] [\W]
ITERM_PREV_PS1=\[\]\[\033[7m\]\u@\h\[\033[m\] [\W] \[\]
JOB_TERMINATE_INTERVAL=300
KDEDIRS=/usr
KDE_IS_PRELINKED=1
LANG=en_US.UTF-8
LDFLAGS=-L/software/binutils/2.27/lib -L/software/gcc/6.2.0/lib64 -L/software/gcc/6.2.0/lib
LD_LIBRARY_PATH=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib:/home/tbiedert/local/opt/tbb2017-update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release:/software/binutils/2.27/lib:/software/gcc/6.2.0/lib64:/software/gcc/6.2.0/lib:/home/tbiedert/local/lib:/home/tbiedert/local/usr/lib64:/home/tbiedert/local/lib64
LESSOPEN=||/usr/bin/lesspipe.sh %s
LIBRARY_PATH=/home/tbiedert/local/opt/tbb2017-update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release
LOADEDMODULES=gcc/6.2.0:binutils/latest
LOGNAME=tbiedert
LSB_ACCT_FILE=/tmp/5324709.tmpdir/.1481211361.5324709.acct
LSB_AFFINITY_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostAffinityFile
LSB_APPLICATION_NAME=hybrid_mpi_openmp
LSB_BATCH_JID=5324709
LSB_BIND_CPU_LIST=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
LSB_CHKFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
LSB_DJOB_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
LSB_DJOB_NUMPROC=128
LSB_DJOB_RANKFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
LSB_ECHKPNT_RSH_CMD=ssh
LSB_EEXEC_REAL_GID=
LSB_EEXEC_REAL_UID=
LSB_EFFECTIVE_RSRCREQ=select[ ((( (model == XEON_E5_2640v3)) && type == any))] order[-slots:-maxslots] rusage[mem=60000.00] span[ptile=16] same[model] cu[type=switch:maxcus=1:pref=config] affinity[core(1)*1:distribute=pack]
LSB_ERRORFILE=5324709.err
LSB_EXEC_CLUSTER=Elwetritsch
LSB_EXEC_HOSTTYPE=X86_64
LSB_EXIT_PRE_ABORT=99
LSB_HOSTS=node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775
LSB_JOBEXIT_STAT=0
LSB_JOBFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
LSB_JOBID=5324709
LSB_JOBINDEX=0
LSB_JOBNAME=mpirun --map-by ppr:1:node --bind-to none ./hpxvr --hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize 256x256x256 --tileSize 64x34 --preload --distributed --compress /scratch/tbiedert/4096x4096x4096.dummy
LSB_JOBRES_CALLBACK=56355@node790
LSB_JOBRES_PID=485
LSB_JOB_EXECUSER=tbiedert
LSB_JOB_STARTER=/lsf/rhrk/bin/job_starter_hybrid_mpi_openmp "%USRCMD"
LSB_MAX_NUM_PROCESSORS=128
LSB_MCPU_HOSTS=node790 1 node792 1 node793 1 node795 1 node796 1 node773 1 node774 1 node775 1
LSB_OUTDIR=/home/tbiedert/HPX-VolumeRendering/build
LSB_OUTPUTFILE=5324709.out
LSB_PROJECT_NAME=default
LSB_QUEUE=short
LSB_RES_GET_FANOUT_INFO=Y
LSB_SUB_HOST=head4
LSB_SUB_RES_REQ=select[(model==XEON_E5_2640v3)] rusage[mem=60000] span[ptile=16] cu[maxcus=1:type=switch]
LSB_SUB_USER=tbiedert
LSB_TRAPSIGS=trap # 15 10 12 2 1
LSB_UNIXGROUP_INT=inf
LSB_XFER_OP=
LSFUSER=tbiedert
LSF_BINDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin
LSF_CGROUP_TOPDIR_KEY=Elwetritsch
LSF_EAUTH_AUX_DATA=/tmp/.auxr9ymHwN
LSF_EAUTH_AUX_PASS=yes
LSF_EAUTH_CLIENT=user
LSF_EAUTH_SERVER=mbatchd@Elwetritsch
LSF_ENVDIR=/lsf/conf
LSF_FROM_HOST=node790
LSF_INVOKE_CMD=bsub
LSF_JOB_TIMESTAMP_VALUE=1481212155
LSF_LIBDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib
LSF_LIM_API_NTRIES=1
LSF_LOGDIR=/lsf/log
LSF_PJL_TYPE=openmpi
LSF_PM_TASKID=6
LSF_SERVERDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/etc
LSF_VERSION=30
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
LS_EXECCWD=/home/tbiedert/HPX-VolumeRendering/build
LS_EXEC_T=START
LS_JOBPID=79755
LS_SUBCWD=/home/tbiedert/HPX-VolumeRendering/build
MAIL=/var/spool/mail/tbiedert
MANPATH=/software/gcc/6.2.0/share/man:/home/tbiedert/local/share/man:/lsf/9.1/man:
MODULEPATH=/software/modulefiles
MODULESHOME=/usr/share/Modules
MSM_HOME=/usr/local/MegaRAID Storage Manager
MSM_PRODUCT=MSM
NXDIR=/usr/NX
OMPI_APP_CTX_NUM_PROCS=8
OMPI_ARGV=--hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize 256x256x256 --tileSize 64x34 --preload --distributed --compress /scratch/tbiedert/4096x4096x4096.dummy
OMPI_COMMAND=hpxvr
OMPI_COMM_WORLD_LOCAL_RANK=0
OMPI_COMM_WORLD_LOCAL_SIZE=1
OMPI_COMM_WORLD_NODE_RANK=0
OMPI_COMM_WORLD_RANK=6
OMPI_COMM_WORLD_SIZE=8
OMPI_FILE_LOCATION=/tmp/5324709.tmpdir/openmpi-sessions-tbiedert@node774_0/4164/1/6
OMPI_FIRST_RANKS=0
OMPI_MCA_db=^pmi
OMPI_MCA_ess=env
OMPI_MCA_ess_base_jobid=272891905
OMPI_MCA_ess_base_vpid=6
OMPI_MCA_grpcomm=^pmi
OMPI_MCA_hwloc_base_binding_policy=none
OMPI_MCA_initial_wdir=/home/tbiedert/HPX-VolumeRendering/build
OMPI_MCA_mpi_yield_when_idle=0
OMPI_MCA_orte_app_num=0
OMPI_MCA_orte_bound_at_launch=1
OMPI_MCA_orte_ess_jobid=272891904
OMPI_MCA_orte_ess_node_rank=0
OMPI_MCA_orte_ess_num_procs=8
OMPI_MCA_orte_ess_vpid=1
OMPI_MCA_orte_hnp_uri=272891904.0;tcp://10.255.8.90,10.250.8.90:48359
OMPI_MCA_orte_local_daemon_uri=272891904.6;tcp://10.255.8.74,10.250.8.74:47752
OMPI_MCA_orte_num_nodes=8
OMPI_MCA_orte_num_restarts=0
OMPI_MCA_orte_peer_fini_barrier_id=2
OMPI_MCA_orte_peer_init_barrier_id=1
OMPI_MCA_orte_peer_modex_id=0
OMPI_MCA_orte_precondition_transports=e2dcd4f3b6aa563f-9fb1cf15b9c08abf
OMPI_MCA_orte_tmpdir_base=/tmp/5324709.tmpdir
OMPI_MCA_pubsub=^pmi
OMPI_MCA_rmaps_base_mapping_policy=ppr:1:node
OMPI_MCA_shmem_RUNTIME_QUERY_hint=mmap
OMPI_NUM_APP_CTX=1
OMPI_UNIVERSE_SIZE=128
OPAL_OUTPUT_STDERR_FD=18
PATH=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin:/software/binutils/2.27/bin:/software/gcc/6.2.0/bin:/home/tbiedert/local/bin:/lsf/rhrk/bin:/cluster/rhrk/bin:/usr/lib64/qt-3.3/bin:/usr/NX/bin:/lsf/9.1/linux2.6-glibc2.3-x86_64/etc:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bin:/home/tbiedert/bin
PWD=/home/tbiedert/HPX-VolumeRendering/build
QTDIR=/usr/lib64/qt-3.3
QTINC=/usr/lib64/qt-3.3/include
QTLIB=/usr/lib64/qt-3.3/lib
RBH_CFG_DEFAULT=/cluster/robinhood/conf/scratch.conf
RHRK_MPI_HYBRID=1
RHRK_NOTIFICATION=LOGS
RM_CPUTASK10=3
RM_CPUTASK11=5
RM_CPUTASK12=7
RM_CPUTASK13=9
RM_CPUTASK14=11
RM_CPUTASK15=13
RM_CPUTASK16=15
RM_CPUTASK1=0
RM_CPUTASK2=2
RM_CPUTASK3=4
RM_CPUTASK4=6
RM_CPUTASK5=8
RM_CPUTASK6=10
RM_CPUTASK7=12
RM_CPUTASK8=14
RM_CPUTASK9=1
SBD_KRB5CCNAME_VAL=
SCRATCH=/scratch/tbiedert
SHELL=/bin/bash
SHLVL=4
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SSH_CLIENT=131.246.17.22 35482 22
SSH_CONNECTION=131.246.17.22 35482 131.246.113.228 22
SSH_TTY=/dev/pts/5
TBBROOT=/home/tbiedert/local/opt/tbb2017-update3
TMOUT=3600
TMPDIR=/tmp/5324709.tmpdir
USER=tbiedert
_=/home/tbiedert/local/bin/mpirun
_LMFILES_=/software/modulefiles/gcc/6.2.0:/software/modulefiles/binutils/latest
__LSF_JOB_TMPDIR__=/tmp/5324709.tmpdir
{locality-id}: 6
{hostname}: [ (mpi:6) ]
{process-id}: 79756
{function}: input_container::load_binary_chunk
{file}: /tmp/hpx-build/hpx/hpx/runtime/serialization/input_container.hpp
{line}: 146
{os-thread}: worker-thread#11
{thread-description}: <unknown>
{state}: state_running
{auxinfo}:
{config}:
HPX_HAVE_NATIVE_TLS=ON
HPX_HAVE_STACKTRACES=ON
HPX_HAVE_COMPRESSION_BZIP2=OFF
HPX_HAVE_COMPRESSION_SNAPPY=OFF
HPX_HAVE_COMPRESSION_ZLIB=OFF
HPX_HAVE_PARCEL_COALESCING=ON
HPX_HAVE_PARCELPORT_TCP=OFF
HPX_HAVE_PARCELPORT_MPI=ON (OpenMPI V1.8.3, MPI V3.0)
HPX_HAVE_VERIFY_LOCKS=OFF
HPX_HAVE_HWLOC=ON
HPX_HAVE_ITTNOTIFY=OFF
HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
HPX_PARCEL_MAX_CONNECTIONS=512
HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
HPX_AGAS_LOCAL_CACHE_SIZE=4096
HPX_HAVE_MALLOC=tcmalloc
HPX_PREFIX (configured)=/home/tbiedert/local
HPX_PREFIX=/home/tbiedert/local
{version}: V1.0.0-trunk (AGAS: V3.0), Git: 9ecdb73e07
{boost}: V1.62.0
{build-type}: release
{date}: Dec 7 2016 20:41:41
{platform}: linux
{compiler}: GNU C++ version 6.2.0
{stdlib}: GNU libstdc++ version 20160822
{what}: archive data bstream data chunk size mismatch: HPX(serialization_error)
_______________________________________________
hpx-users mailing list
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
