> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Budi Mulyono
> Sent: Monday, August 24, 2009 3:38 AM
> To: r-help@r-project.org
> Subject: [R] hdf5 package segfault when processing large data
>
> Hi there,
>
> I am currently working on something that uses the hdf5 library. I think
> HDF5 is a great data format; I've used it fairly extensively in Python
> via PyTables, and I was looking for something similar in R. The closest
> I can find is the hdf5 package. While it does not work the same way as
> PyTables, it's good enough to let the two exchange data via HDF5 files.
>
> There is just one problem: I keep getting a segfault when trying to
> process large files (>10 MB), although that is by no means large by
> HDF5 standards. I have included the example code and data below. I have
> tried different OSes (WinXP and Ubuntu 8.04), architectures (32- and
> 64-bit) and R versions (2.7.1, 2.7.2, and 2.9.1), but all of them show
> the same problem. I was wondering if anyone has any clue as to what's
> going on here and could advise me on how to handle it.
This sort of problem should be sent to the package's maintainer.

   > packageDescription("hdf5")
   Package: hdf5
   Version: 1.6.9
   Title: HDF5
   Author: Marcus G. Daniels mdani...@lanl.gov
   Maintainer: Marcus G. Daniels <mdani...@lanl.gov>
   Description: Interface to the NCSA HDF5 library
   ...

This is probably due to the code in hdf5.c allocating a huge matrix, buf,
on the stack with

   883   unsigned char buf[rowcount][size];

It dies with the segmentation fault (a stack overflow, in particular) at
line 898, where it tries to access this buf.

   885   for (ri = 0; ri < rowcount; ri++)
   886     for (pos = 0; pos < colcount; pos++)
   887       {
   888         SEXP item = VECTOR_ELT (val, pos);
   889         SEXPTYPE type = TYPEOF (item);
   890         void *ptr = &buf[ri][offsets[pos]];
   891
   892         switch (type)
   893           {
   894           case REALSXP:
   895             memcpy (ptr, &REAL (item)[ri], sizeof (double));
   896             break;
   897           case INTSXP:
   898             memcpy (ptr, &INTEGER (item)[ri], sizeof (int));
   899             break;

The code should use one of the allocators in the R API instead of putting
the big memory block on the stack.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

>
> Thank you; I appreciate any help I can get.
>
> Cheers,
>
> Budi
>
> The example script
> ====================
> library(hdf5)
> fileName <- "sample.txt"
> myTable <- read.table(fileName,header=TRUE,sep="\t",as.is=TRUE)
> hdf5save("test.hdf", "myTable")
>
> ========
> The data example (the list continues for more than 250,000 rows):
> sample.txt
> ========
> Date     Time  f1     f2     f3     f4     f5
> 20070328 07:56 463    463.07 462.9  463.01 1100
> 20070328 07:57 463.01 463.01 463.01 463.01 200
> ....
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
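Bill's suggestion, replacing the stack array `unsigned char buf[rowcount][size]` in hdf5.c with a heap allocation, can be sketched in standalone C. This is a minimal sketch, not the package's actual fix: it assumes a hypothetical two-column row layout (one double like f1, one int like f5), uses plain malloc as a stand-in for the R API allocators so it compiles without R headers, and the names pack_rows and unpack_row are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical row layout for this sketch: one double column (like f1)
   followed by one int column (like f5). The real hdf5.c derives its
   `offsets` and `size` from the data frame's column types. */
#define NCOLS 2
static const size_t offsets[NCOLS] = {0, sizeof(double)};
static const size_t row_size = sizeof(double) + sizeof(int);

/* Pack column-major data (one array per column, as R stores a data frame)
   into one row-major block on the heap, instead of declaring
   `unsigned char buf[rowcount][size]` on the stack.
   Returns NULL if rowcount is 0, if rowcount * row_size would overflow,
   or if allocation fails. The caller frees the result. */
unsigned char *pack_rows(const double *real_col, const int *int_col,
                         size_t rowcount)
{
    if (rowcount == 0 || rowcount > SIZE_MAX / row_size)
        return NULL;

    /* malloc stands in for an R API allocator here. Inside an R extension,
       R_alloc() (reclaimed when the .Call returns) or Calloc()/Free()
       (R_Calloc()/R_Free() in current R) would be the natural choices. */
    unsigned char *buf = malloc(rowcount * row_size);
    if (buf == NULL)
        return NULL;

    for (size_t ri = 0; ri < rowcount; ri++) {
        /* buf + ri * row_size plays the role of buf[ri] in the 2-D array. */
        unsigned char *row = buf + ri * row_size;
        memcpy(row + offsets[0], &real_col[ri], sizeof(double));
        memcpy(row + offsets[1], &int_col[ri], sizeof(int));
    }
    return buf;
}

/* Copy row `ri` of a packed buffer back out into the two output values. */
void unpack_row(const unsigned char *buf, size_t rowcount, size_t ri,
                double *real_out, int *int_out)
{
    assert(ri < rowcount); /* bounds check in debug builds */
    const unsigned char *row = buf + ri * row_size;
    memcpy(real_out, row + offsets[0], sizeof(double));
    memcpy(int_out, row + offsets[1], sizeof(int));
}
```

With 250,000 rows this layout needs 250,000 × 12 = 3,000,000 bytes on typical platforms (8-byte double, 4-byte int), which a heap allocation handles easily. A stack array of the same shape instead competes for the thread's stack, which is commonly limited to 8 MB on Linux (see `ulimit -s`), and the real data frame has more columns per row than this sketch.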