Hello,

When I run my HPX application on our cluster with multiple localities, I sometimes get a segmentation fault with the error message "archive data bstream data chunk size mismatch: HPX(serialization_error)".

The failure is intermittent: when I rerun the same configuration, it sometimes works and sometimes segfaults again.

Any idea what could cause this or how to debug it?

Thanks!

Tim

The full error output follows:



{stack-trace}: 4 frames:
  0x2b40ce84564c : hpx::detail::backtrace[abi:cxx11](unsigned long) + 0x9c in /home/tbiedert/local/lib/libhpx.so.1
  0x2b40ce8918fa : boost::exception_ptr hpx::detail::get_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xaa in /home/tbiedert/local/lib/libhpx.so.1
  0x2b40ce891e5e : void hpx::detail::throw_exception<hpx::exception>(hpx::exception const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in /home/tbiedert/local/lib/libhpx.so.1
  0x2b40ce92049e : hpx::detail::throw_exception(hpx::error, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) + 0x4e in /home/tbiedert/local/lib/libhpx.so.1
{env}: 177 entries:
  BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
}
  BINARY_TYPE_HPC=
  BSUB_BLOCK_EXEC_HOST=
  CFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
  CMAKE_PREFIX_PATH=/home/tbiedert/local
  CPATH=/home/tbiedert/local/opt/tbb2017-update3/include
CPLUS_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include:/home/tbiedert/local/include:
  CPPFLAGS=-I/software/binutils/2.27/include -I/software/gcc/6.2.0/include
  CPP_INCLUDE_PATH=/home/tbiedert/local/include:
  CVS_RSH=ssh
C_INCLUDE_PATH=/software/binutils/2.27/include:/software/gcc/6.2.0/include:/home/tbiedert/local/include:
  G_BROKEN_FILENAMES=1
  HISTCONTROL=ignoreboth
  HISTSIZE=500
  HOME=/home/tbiedert
  HOSTNAME=node774
  HOSTTYPE=X86_64
  ITERM_ORIG_PS1=\[\033[7m\]\u@\h\[\033[m\] [\W]
  ITERM_PREV_PS1=\[\]\[\033[7m\]\u@\h\[\033[m\] [\W] \[\]
  JOB_TERMINATE_INTERVAL=300
  KDEDIRS=/usr
  KDE_IS_PRELINKED=1
  LANG=en_US.UTF-8
LDFLAGS=-L/software/binutils/2.27/lib -L/software/gcc/6.2.0/lib64 -L/software/gcc/6.2.0/lib
LD_LIBRARY_PATH=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib:/home/tbiedert/local/opt/tbb2017-update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release:/software/binutils/2.27/lib:/software/gcc/6.2.0/lib64:/software/gcc/6.2.0/lib:/home/tbiedert/local/lib:/home/tbiedert/local/usr/lib64:/home/tbiedert/local/lib64
  LESSOPEN=||/usr/bin/lesspipe.sh %s
LIBRARY_PATH=/home/tbiedert/local/opt/tbb2017-update3/build/linux_intel64_gcc_cc6.2.0_libc2.12_kernel2.6.32_release
  LOADEDMODULES=gcc/6.2.0:binutils/latest
  LOGNAME=tbiedert
  LSB_ACCT_FILE=/tmp/5324709.tmpdir/.1481211361.5324709.acct
LSB_AFFINITY_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostAffinityFile
  LSB_APPLICATION_NAME=hybrid_mpi_openmp
  LSB_BATCH_JID=5324709
  LSB_BIND_CPU_LIST=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
  LSB_CHKFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
LSB_DJOB_HOSTFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
  LSB_DJOB_NUMPROC=128
LSB_DJOB_RANKFILE=/home/tbiedert/.lsbatch/1481211361.5324709.hostfile
  LSB_ECHKPNT_RSH_CMD=ssh
  LSB_EEXEC_REAL_GID=
  LSB_EEXEC_REAL_UID=
LSB_EFFECTIVE_RSRCREQ=select[ ((( (model == XEON_E5_2640v3)) && type == any))] order[-slots:-maxslots] rusage[mem=60000.00] span[ptile=16] same[model] cu[type=switch:maxcus=1:pref=config] affinity[core(1)*1:distribute=pack]
  LSB_ERRORFILE=5324709.err
  LSB_EXEC_CLUSTER=Elwetritsch
  LSB_EXEC_HOSTTYPE=X86_64
  LSB_EXIT_PRE_ABORT=99
LSB_HOSTS=node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node790 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node792 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node793 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node795 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node796 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node773 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node774 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775 node775
  LSB_JOBEXIT_STAT=0
  LSB_JOBFILENAME=/home/tbiedert/.lsbatch/1481211361.5324709
  LSB_JOBID=5324709
  LSB_JOBINDEX=0
LSB_JOBNAME=mpirun --map-by ppr:1:node --bind-to none ./hpxvr --hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize 256x256x256 --tileSize 64x34 --preload --distributed --compress /scratch/tbiedert/4096x4096x4096.dummy
  LSB_JOBRES_CALLBACK=56355@node790
  LSB_JOBRES_PID=485
  LSB_JOB_EXECUSER=tbiedert
  LSB_JOB_STARTER=/lsf/rhrk/bin/job_starter_hybrid_mpi_openmp "%USRCMD"
  LSB_MAX_NUM_PROCESSORS=128
LSB_MCPU_HOSTS=node790 1 node792 1 node793 1 node795 1 node796 1 node773 1 node774 1 node775 1
  LSB_OUTDIR=/home/tbiedert/HPX-VolumeRendering/build
  LSB_OUTPUTFILE=5324709.out
  LSB_PROJECT_NAME=default
  LSB_QUEUE=short
  LSB_RES_GET_FANOUT_INFO=Y
  LSB_SUB_HOST=head4
LSB_SUB_RES_REQ=select[(model==XEON_E5_2640v3)] rusage[mem=60000] span[ptile=16] cu[maxcus=1:type=switch]
  LSB_SUB_USER=tbiedert
  LSB_TRAPSIGS=trap # 15 10 12 2 1
  LSB_UNIXGROUP_INT=inf
  LSB_XFER_OP=
  LSFUSER=tbiedert
  LSF_BINDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin
  LSF_CGROUP_TOPDIR_KEY=Elwetritsch
  LSF_EAUTH_AUX_DATA=/tmp/.auxr9ymHwN
  LSF_EAUTH_AUX_PASS=yes
  LSF_EAUTH_CLIENT=user
  LSF_EAUTH_SERVER=mbatchd@Elwetritsch
  LSF_ENVDIR=/lsf/conf
  LSF_FROM_HOST=node790
  LSF_INVOKE_CMD=bsub
  LSF_JOB_TIMESTAMP_VALUE=1481212155
  LSF_LIBDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib
  LSF_LIM_API_NTRIES=1
  LSF_LOGDIR=/lsf/log
  LSF_PJL_TYPE=openmpi
  LSF_PM_TASKID=6
  LSF_SERVERDIR=/lsf/9.1/linux2.6-glibc2.3-x86_64/etc
  LSF_VERSION=30
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.tbz=01;31:*.tbz2=01;31:*.bz=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
  LS_EXECCWD=/home/tbiedert/HPX-VolumeRendering/build
  LS_EXEC_T=START
  LS_JOBPID=79755
  LS_SUBCWD=/home/tbiedert/HPX-VolumeRendering/build
  MAIL=/var/spool/mail/tbiedert
MANPATH=/software/gcc/6.2.0/share/man:/home/tbiedert/local/share/man:/lsf/9.1/man:
  MODULEPATH=/software/modulefiles
  MODULESHOME=/usr/share/Modules
  MSM_HOME=/usr/local/MegaRAID Storage Manager
  MSM_PRODUCT=MSM
  NXDIR=/usr/NX
  OMPI_APP_CTX_NUM_PROCS=8
OMPI_ARGV=--hpx:threads 16 --no-output --csv --warmup 3 --benchmark 8 --blockSize 256x256x256 --tileSize 64x34 --preload --distributed --compress /scratch/tbiedert/4096x4096x4096.dummy
  OMPI_COMMAND=hpxvr
  OMPI_COMM_WORLD_LOCAL_RANK=0
  OMPI_COMM_WORLD_LOCAL_SIZE=1
  OMPI_COMM_WORLD_NODE_RANK=0
  OMPI_COMM_WORLD_RANK=6
  OMPI_COMM_WORLD_SIZE=8
OMPI_FILE_LOCATION=/tmp/5324709.tmpdir/openmpi-sessions-tbiedert@node774_0/4164/1/6
  OMPI_FIRST_RANKS=0
  OMPI_MCA_db=^pmi
  OMPI_MCA_ess=env
  OMPI_MCA_ess_base_jobid=272891905
  OMPI_MCA_ess_base_vpid=6
  OMPI_MCA_grpcomm=^pmi
  OMPI_MCA_hwloc_base_binding_policy=none
  OMPI_MCA_initial_wdir=/home/tbiedert/HPX-VolumeRendering/build
  OMPI_MCA_mpi_yield_when_idle=0
  OMPI_MCA_orte_app_num=0
  OMPI_MCA_orte_bound_at_launch=1
  OMPI_MCA_orte_ess_jobid=272891904
  OMPI_MCA_orte_ess_node_rank=0
  OMPI_MCA_orte_ess_num_procs=8
  OMPI_MCA_orte_ess_vpid=1
OMPI_MCA_orte_hnp_uri=272891904.0;tcp://10.255.8.90,10.250.8.90:48359
OMPI_MCA_orte_local_daemon_uri=272891904.6;tcp://10.255.8.74,10.250.8.74:47752
  OMPI_MCA_orte_num_nodes=8
  OMPI_MCA_orte_num_restarts=0
  OMPI_MCA_orte_peer_fini_barrier_id=2
  OMPI_MCA_orte_peer_init_barrier_id=1
  OMPI_MCA_orte_peer_modex_id=0
OMPI_MCA_orte_precondition_transports=e2dcd4f3b6aa563f-9fb1cf15b9c08abf
  OMPI_MCA_orte_tmpdir_base=/tmp/5324709.tmpdir
  OMPI_MCA_pubsub=^pmi
  OMPI_MCA_rmaps_base_mapping_policy=ppr:1:node
  OMPI_MCA_shmem_RUNTIME_QUERY_hint=mmap
  OMPI_NUM_APP_CTX=1
  OMPI_UNIVERSE_SIZE=128
  OPAL_OUTPUT_STDERR_FD=18
PATH=/lsf/9.1/linux2.6-glibc2.3-x86_64/bin:/software/binutils/2.27/bin:/software/gcc/6.2.0/bin:/home/tbiedert/local/bin:/lsf/rhrk/bin:/cluster/rhrk/bin:/usr/lib64/qt-3.3/bin:/usr/NX/bin:/lsf/9.1/linux2.6-glibc2.3-x86_64/etc:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/bin:/home/tbiedert/bin
  PWD=/home/tbiedert/HPX-VolumeRendering/build
  QTDIR=/usr/lib64/qt-3.3
  QTINC=/usr/lib64/qt-3.3/include
  QTLIB=/usr/lib64/qt-3.3/lib
  RBH_CFG_DEFAULT=/cluster/robinhood/conf/scratch.conf
  RHRK_MPI_HYBRID=1
  RHRK_NOTIFICATION=LOGS
  RM_CPUTASK10=3
  RM_CPUTASK11=5
  RM_CPUTASK12=7
  RM_CPUTASK13=9
  RM_CPUTASK14=11
  RM_CPUTASK15=13
  RM_CPUTASK16=15
  RM_CPUTASK1=0
  RM_CPUTASK2=2
  RM_CPUTASK3=4
  RM_CPUTASK4=6
  RM_CPUTASK5=8
  RM_CPUTASK6=10
  RM_CPUTASK7=12
  RM_CPUTASK8=14
  RM_CPUTASK9=1
  SBD_KRB5CCNAME_VAL=
  SCRATCH=/scratch/tbiedert
  SHELL=/bin/bash
  SHLVL=4
  SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
  SSH_CLIENT=131.246.17.22 35482 22
  SSH_CONNECTION=131.246.17.22 35482 131.246.113.228 22
  SSH_TTY=/dev/pts/5
  TBBROOT=/home/tbiedert/local/opt/tbb2017-update3
  TMOUT=3600
  TMPDIR=/tmp/5324709.tmpdir
  USER=tbiedert
  _=/home/tbiedert/local/bin/mpirun
_LMFILES_=/software/modulefiles/gcc/6.2.0:/software/modulefiles/binutils/latest
  __LSF_JOB_TMPDIR__=/tmp/5324709.tmpdir
{locality-id}: 6
{hostname}: [ (mpi:6) ]
{process-id}: 79756
{function}: input_container::load_binary_chunk
{file}: /tmp/hpx-build/hpx/hpx/runtime/serialization/input_container.hpp
{line}: 146
{os-thread}: worker-thread#11
{thread-description}: <unknown>
{state}: state_running
{auxinfo}:
{config}:
  HPX_HAVE_NATIVE_TLS=ON
  HPX_HAVE_STACKTRACES=ON
  HPX_HAVE_COMPRESSION_BZIP2=OFF
  HPX_HAVE_COMPRESSION_SNAPPY=OFF
  HPX_HAVE_COMPRESSION_ZLIB=OFF
  HPX_HAVE_PARCEL_COALESCING=ON
  HPX_HAVE_PARCELPORT_TCP=OFF
  HPX_HAVE_PARCELPORT_MPI=ON (OpenMPI V1.8.3, MPI V3.0)
  HPX_HAVE_VERIFY_LOCKS=OFF
  HPX_HAVE_HWLOC=ON
  HPX_HAVE_ITTNOTIFY=OFF
  HPX_HAVE_RUN_MAIN_EVERYWHERE=OFF
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=tcmalloc
  HPX_PREFIX (configured)=/home/tbiedert/local
  HPX_PREFIX=/home/tbiedert/local
{version}: V1.0.0-trunk (AGAS: V3.0), Git: 9ecdb73e07
{boost}: V1.62.0
{build-type}: release
{date}: Dec  7 2016 20:41:41
{platform}: linux
{compiler}: GNU C++ version 6.2.0
{stdlib}: GNU libstdc++ version 20160822
{what}: archive data bstream data chunk size mismatch: HPX(serialization_error)

