Hi Roger,

On Dec 8, 2010, at 11:41 AM, Roger Martin wrote:
> Hi Quincey, [got the 'e' this time]
>
> The problem cannot be reproduced in a C example program, because the problem
> is upstream use of vectors with a -= operation. The failure in this area did
> not show up except in this MPI application, even though the same library and
> code are used in a single-process application.
> ...............
> scores[qbChemicalShiftTask->getNMRAtomIndices()[index]] -= qbChemicalShiftTask->getNMRTrace()[index];
> .............
> where scores etc. are std::vector<double> and std::vector<int> typedefs. In
> certain runs the indices were incorrect, so this was badly constructed and
> needed to be rewritten to align the indices better with the earlier
> construction of the scores vector, and with better checking.
>
> These vectors were not used as input to any HDF5 interfaces; HDF5 looks
> clean and stable in the file closing. The problem was entirely in non-HDF5
> code, but it resulted in stepping on the H5SL skip list in my build and run
> of the integrated system.
>
> Thank you for being ready to look into it if it could be duplicated and
> shown to be in the H5SL remove area. The problem wasn't in HDF5 code.

Ah, that's good to hear, thanks!

	Quincey
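The rewrite Roger describes (aligning the indices with the earlier construction of the scores vector, plus better checking) might look something like the sketch below. The accumulateScores wrapper, the length guard, and the example values in main are illustrative assumptions; only the -= accumulation and the index/trace vectors come from his snippet.

#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

// Bounds-checked version of the accumulation; 'indices' and 'trace'
// stand in for qbChemicalShiftTask->getNMRAtomIndices() and
// qbChemicalShiftTask->getNMRTrace().
void accumulateScores(std::vector<double>& scores,
                      const std::vector<int>& indices,
                      const std::vector<double>& trace)
{
    // The two inputs must line up element for element.
    if (indices.size() != trace.size())
        throw std::length_error("NMR indices and trace differ in length");

    for (std::size_t i = 0; i < indices.size(); ++i) {
        // .at() range-checks the subscript that was silently out of
        // bounds before; a thrown std::out_of_range is far easier to
        // find than a stray write into another library's allocation.
        scores.at(static_cast<std::size_t>(indices[i])) -= trace[i];
    }
}

int main()
{
    std::vector<double> scores(4, 0.0);
    try {
        accumulateScores(scores, {0, 2, 7}, {1.0, 2.0, 3.0}); // 7 is out of range
    } catch (const std::exception& e) {
        std::cerr << "caught: " << e.what() << '\n';
    }
}

Using .at() turns an out-of-range subscript into an exception at the faulting statement instead of a silent write into a neighboring allocation, which is what let this bug surface far away, inside H5SL_remove.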
> On 12/08/2010 10:12 AM, Roger Martin wrote:
>> Hi Quincy,
>>
>> I'll be pulling pieces out of the large C++ project into a small test C
>> program to see if the seg fault can be duplicated in a manageable example
>> and, if accomplished, will send it to you.
>>
>> MemoryScape and gdb (NetBeans) don't show any memory issues from our
>> library code and HDF5. MemoryScape doesn't expand through the H5SL_REMOVE
>> macro, so in another working copy I'm trying to treat a copy of it as a
>> function.
>>
>> On 12/07/2010 05:20 PM, Quincey Koziol wrote:
>>> Hi Roger,
>>>
>>> On Dec 7, 2010, at 2:06 PM, Roger Martin wrote:
>>>
>>>> Further:
>>>>
>>>> Debugging with MemoryScape reveals a segfault in H5SL.c (1.8.5) at
>>>> line 1068:
>>>> ...1068...
>>>> H5SL_REMOVE(SCALAR, slist, x, const haddr_t, key, -)
>>>> // H5SL_TYPE_HADDR case
>>>> ....
>>>>
>>>> The stack trace is:
>>>> H5SL_remove 1068
>>>> H5C_flush_single_entry 7993
>>>> H5C_flush_cache 1395
>>>> H5AC_flush 941
>>>> H5F_flush 1673
>>>> H5F_dest 996
>>>> H5F_try_close 1900
>>>> H5F_close 1750
>>>> H5I_dec_ref 1490
>>>> H5F_close 1951
>>>>
>>>> I'll be adding printouts to see which variable/pointer is causing the
>>>> seg fault. The MemoryScape frame shows:
>>>> ..............
>>>> Stack Frame
>>>> Function "H5SL_remove":
>>>>   slist: 0x0b790fc0 (Allocated) -> (H5SL_t)
>>>>   key: 0x0b9853f8 (Allocated Interior) -> 0x000000000001affc (110588)
>>>>   Block "$b8":
>>>>     _last: 0x0b772270 (Allocated) -> (H5SL_node_t)
>>>>     _llast: 0x0001affc -> (H5SL_node_t)
>>>>     _next: 0x0b9855c0 (Allocated) -> (H5SL_node_t)
>>>>     _drop: 0x0b772270 (Allocated) -> (H5SL_node_t)
>>>>     _ldrop: 0x0b772270 (Allocated) -> (H5SL_node_t)
>>>>     _count: 0x00000000 (0)
>>>>     _i: <Bad address: 0x00000000>
>>>>   Local variables:
>>>>     x: <Bad address: 0x00000000>
>>>>     hashval: <Bad address: 0x00000000>
>>>>     ret_value: <Bad address: 0x00000000>
>>>>   FUNC: "H5SL_remove"
>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>> ................
>>>>
>>>> Some of the variables have bad addresses, such as x, which was set by
>>>> "x = slist->header;", the header of the skip list.
>>>>
>>>> These appear to be internal API functions, and I'm wondering how I
>>>> could be offending them from high-level API calls and file interfaces.
>>>> What could be in the H5C cache when
>>>> H5Fget_obj_count(fileID, H5F_OBJ_ALL) = 1
>>>> and
>>>> H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP | H5F_OBJ_DATATYPE | H5F_OBJ_ATTR) = 0
>>>> for the file the code is trying to close?
>>> Yes, you are correct, that shouldn't happen. :-/ Do you have a simple C
>>> program you can send to show this failure?
>>>
>>> Quincey
>>>
>>>> On 12/03/2010 11:33 AM, Roger Martin wrote:
>>>>> Hi,
>>>>>
>>>>> Using HDF5 1.8.5 and 1.8.6-pre2; OpenMPI 1.4.3 on Linux RHEL4 and
>>>>> RHEL5.
>>>>>
>>>>> In a case where the HDF5 operations aren't using MPI but build an
>>>>> .h5 file exclusive to each individual MPI job/process:
>>>>>
>>>>> The create:
>>>>> currentFileID = H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
>>>>>
>>>>> and many file operations using the high-level (HL) APIs, including
>>>>> packet tables, tables, and datasets, perform successfully.
>>>>>
>>>>> Then, near each individual process's end,
>>>>> H5Fclose(currentFileID);
>>>>> is called but doesn't return. A check for open objects says only one
>>>>> file object is open and no other objects (group, dataset, etc.). No
>>>>> other software or process is acting on this .h5 file; it is named
>>>>> exclusively for the one job it is associated with.
>>>>>
>>>>> This isn't a parallel-HDF5-in-MPI attempt. In another scenario,
>>>>> parallel HDF5 is working the collective way just fine. This current
>>>>> issue is for people who don't have or want a parallel file system,
>>>>> so I made a coarse-grained MPI setup to run independent jobs for
>>>>> these folks. Each job has its own .h5 file opened with
>>>>> H5Fcreate(filePath.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
>>>>>
>>>>> Where should I look?
>>>>>
>>>>> I'll try to make a small example test case for show and tell.
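The small test case Roger offers to build might look something like the sketch below: one serial HDF5 file per MPI rank (the coarse-grained pattern from the Dec 3 message), with the pre-close object-count check from the Dec 7 message. The job_%d.h5 naming, the token /scores dataset, and the omitted error checking are assumptions for illustration; the HDF5 and MPI calls are the standard C APIs of the 1.8.x era.

#include <cstdio>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // One serial HDF5 file per rank; no MPI-IO file driver involved.
    char path[64];
    std::snprintf(path, sizeof path, "job_%d.h5", rank);
    hid_t fileID = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    // A token dataset so the close path has metadata to flush.
    hsize_t dims[1] = {8};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(fileID, "/scores", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    double data[8] = {0.0};
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    H5Dclose(dset);
    H5Sclose(space);

    // The pre-close sanity check from the thread: only the file handle
    // itself should still be open at this point.
    ssize_t all  = H5Fget_obj_count(fileID, H5F_OBJ_ALL);
    ssize_t rest = H5Fget_obj_count(fileID, H5F_OBJ_DATASET | H5F_OBJ_GROUP |
                                            H5F_OBJ_DATATYPE | H5F_OBJ_ATTR);
    std::printf("rank %d: open objects %ld, non-file objects %ld\n",
                rank, (long)all, (long)rest);

    H5Fclose(fileID); // with only the file open, this should return promptly
    MPI_Finalize();
    return 0;
}

Built with an MPI compiler wrapper (for example, mpic++ testcase.cpp -lhdf5) and run with mpirun -np 4 ./a.out, each process exercises the same create/write/close path independently.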