Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
I've isolated the bug. When the seg fault was produced, there was an error that memory had not been mapped. Here is the odd part of the bug: if you comment out certain code and get a full run, then comment the offending code back in, it will actually run. So I think it is safe to assume something is going wrong with memory allocation. For example, while testing I have been able to get to a point where the code will run, but if I reboot the machine and try again, the code will not run.

The bug itself is happening somewhere in xts or zoo. I will gladly upload the data files. It is happening on the 10th data file, which is only 225k lines in size. Below is the simplified code. The call to either

  dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
  index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))

is what is causing R to hang or crash. I have been able to replicate this on Windows 7 64-bit and Ubuntu 64-bit. It seems easiest to replicate consistently from RStudio. The code below will consistently replicate the problem when the appropriate files are used.

parseTickDataFromDir = function(tickerDir, per, subper) {
  tickerAbsFilenames = list.files(tickerDir, full.names=T)
  tickerNames = list.files(tickerDir, full.names=F)
  tickerNames = gsub("_[a-zA-Z0-9].csv", "", tickerNames)
  pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames), style = 3)
  for (i in 1:length(tickerAbsFilenames)) {
    dat.i = parseTickData(tickerAbsFilenames[i])
    dates <- unique(substr(as.character(index(dat.i)), 1, 10))
    times <- rep("09:30:00", length(dates))
    openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
    templateTimes <- NULL
    for (j in 1:length(openDateTimes)) {
      if (is.null(templateTimes)) {
        templateTimes <- openDateTimes[j] + 0:23400
      } else {
        templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
      }
    }
    templateTimes <- as.xts(templateTimes)
    dat.i <- merge(dat.i, templateTimes, all=T)
    if (is.na(dat.i[1])) { dat.i[1] <- -1 }
    dat.i <- na.locf(dat.i)
    dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
    index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
    setTxtProgressBar(pb, i)
  }
  close(pb)
}

parseTickData <- function(inputFile) {
  DAT.list <- scan(file=inputFile, sep=",", skip=1,
                   what=list(Date="", Time="", Close=0, Volume=0), quiet=T)
  index <- as.POSIXct(paste(DAT.list$Date, DAT.list$Time), format="%m/%d/%Y %H:%M:%S")
  DAT.xts <- xts(DAT.list$Close, index)
  DAT.xts <- make.index.unique(DAT.xts)
  return(DAT.xts)
}

DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds", 10)

-Original Message-
From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
Sent: Sunday, July 22, 2012 4:48 PM
To: David Terk
Cc: r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

On 12-07-22 3:54 PM, David Terk wrote:
> I am reading several hundred files, anywhere from 50k-400k in size. It appears that when I read these files with R 2.15.1 the process will hang or seg fault on the scan() call. This does not happen on R 2.14.1.

The code below doesn't do anything other than define a couple of functions. Please simplify it to code that creates a file (or multiple files), reads it or them, and shows a bug. If you can't do that, then gradually add the rest of the stuff from these functions into the mix until you figure out what is really causing the bug. If you don't post code that allows us to reproduce the crash, it's really unlikely that we'll be able to fix it.

Duncan Murdoch

> This is happening on the Precise build of Ubuntu.
> I have included everything, but the issue appears to be when performing the scan in the method parseTickData. Below is the code. Hopefully this is the right place to post. [original code quoted here; same as the simplified version above] [...]
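A self-contained reproduction of the kind Duncan asks for could start from a synthetic tick file. A hedged sketch (file name and values are invented; the layout matches the Date,Time,Close,Volume columns parseTickData expects):

# write one synthetic trading day of 1-second ticks in the expected CSV layout
set.seed(42)
n <- 23401
tick <- data.frame(
  Date   = "07/23/2012",
  Time   = format(as.POSIXct("2012-07-23 09:30:00") + 0:(n - 1), "%H:%M:%S"),
  Close  = 100 + cumsum(rnorm(n, sd = 0.01)),
  Volume = sample(100:1000, n, replace = TRUE)
)
# quote = FALSE so the Date/Time fields match what scan() expects above
write.csv(tick, "SYN_0.csv", row.names = FALSE, quote = FALSE)

Writing several such files into a directory would give parseTickDataFromDir something to chew on without the original data.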
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
Looks like the call to:

  dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)

is what is causing the issue. If the variable name is not set, or is set to any value other than NULL, then no hang occurs.

-Original Message-
From: David Terk [mailto:david.t...@gmail.com]
Sent: Monday, July 23, 2012 1:25 AM
To: 'Duncan Murdoch'
Cc: 'r-devel@r-project.org'
Subject: RE: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

> I've isolated the bug. When the seg fault was produced, there was an error that memory had not been mapped. [full message quoted above] [...]
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
David,

You still haven't provided a reproducible example. As Duncan already said, if you don't post code that allows us to reproduce the crash, it's really unlikely that we'll be able to fix it. And R-devel is not the appropriate venue to discuss this if it's truly an issue with xts/zoo.

Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Mon, Jul 23, 2012 at 12:41 AM, David Terk <david.t...@gmail.com> wrote:
> Looks like the call to dat.i <- to.period(dat.i, period=per, k=subper, name=NULL) is what is causing the issue. [earlier messages quoted above] [...]
[Rd] duplicated() variation that goes both ways to capture all duplicates
Dear all,

The trouble with the current duplicated() function in R is that it can report duplicates while searching fromFirst _or_ fromLast, but not both ways. Often users will want to identify and extract all the copies of an item that has duplicates, not only the duplicates themselves. To take the example from the man page:

data(iris)
iris[duplicated(iris), ]  ## duplicates while searching fromFirst
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
143          5.8         2.7          5.1         1.9 virginica

iris[duplicated(iris, fromLast=T), ]  ## duplicates while searching fromLast
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica

To extract all the copies of the items concerned (originals and duplicates) one would need to do something like this:

iris[(duplicated(iris) | duplicated(iris, fromLast=T)), ]  ## duplicates while searching bothWays
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica
143          5.8         2.7          5.1         1.9 virginica

Unfortunately this is unnecessarily long and convoluted. Short of a 'bothWays' argument in duplicated(), I came up with a small wrapper that simplifies the above:

duplicated2 <- function(x, bothWays=TRUE, ...) {
  if (!bothWays) {
    return(duplicated(x, ...))
  } else if (bothWays) {
    return(duplicated(x, ...) | duplicated(x, fromLast=TRUE, ...))
  }
}

Now the above can be achieved simply via:

iris[duplicated2(iris), ]  ## duplicates while searching bothWays
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
102          5.8         2.7          5.1         1.9 virginica
143          5.8         2.7          5.1         1.9 virginica

So here's my inquiry: would R Core consider adding such functionality to 'base' R? Either the (suitably cleaned up) duplicated2() function above, or a bothWays argument in duplicated() itself? Either of the two would improve user convenience and reduce confusion. (In my case it took some time before I understood the correct approach to this problem.)

Regards,
Liviu
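The same idea as a tiny worked example on a plain vector (values chosen arbitrarily):

x <- c(1, 2, 2, 3, 1)
duplicated(x)                                   # FALSE FALSE  TRUE FALSE  TRUE
duplicated(x, fromLast = TRUE)                  #  TRUE  TRUE FALSE FALSE FALSE
duplicated(x) | duplicated(x, fromLast = TRUE)  #  TRUE  TRUE  TRUE FALSE  TRUE

The OR of the two passes flags every copy of 1 and 2, not just the later or earlier ones -- which is what duplicated2() returns by default.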
Re: [Rd] duplicated() variation that goes both ways to capture all duplicates
On 23/07/2012 8:49 AM, Liviu Andronic wrote:
> Dear all, The trouble with the current duplicated() function in R is that it can report duplicates while searching fromFirst _or_ fromLast, but not both ways. [full message quoted above] [...]

I can't speak for all of R core, but I don't see the need for this in base R -- your solution looks fine to me.

Duncan Murdoch
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
Well, you still haven't convinced anyone but yourself that it's definitely an xts problem, since you have not provided any reproducible example...
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Mon, Jul 23, 2012 at 8:14 AM, David Terk <david.t...@gmail.com> wrote:
> Where should this be discussed, since it is definitely xts related? I will gladly upload the simplified script + data files to whoever is maintaining this part of the code. [earlier messages quoted above] [...]
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
Where should this be discussed, since it is definitely xts related? I will gladly upload the simplified script + data files to whoever is maintaining this part of the code. Fortunately there is a workaround here.

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
Sent: Monday, July 23, 2012 8:15 AM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

> David, You still haven't provided a reproducible example. [message quoted above] [...]
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
I'm attaching a runnable script and corresponding data files. This will freeze at 83%. I'm not sure how much simpler to get than this.

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
Sent: Monday, July 23, 2012 9:17 AM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

> Well, you still haven't convinced anyone but yourself that it's definitely an xts problem, since you have not provided any reproducible example... [earlier messages quoted above] [...]
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
David,

Thank you for providing something reproducible. This line:

  templateTimes <- as.xts(templateTimes)

creates a zero-width xts object (i.e. the coredata is a zero-length vector, but there is a non-zero-length index). So the to.period(templateTimes) call returns OHLC data of random memory locations. This is the likely cause of the segfaults.

Since aggregating no data doesn't make sense, I have patched to.period to throw an error when run on zero-width/length objects (revision 690 on R-Forge). The attached file works with the CRAN version of xts because it avoids the issue entirely.

Your script will still hang on the BAC_0.csv file because as.character.POSIXt can take a long time. Better to just call format() directly (as I do in the attached file).

If you have any follow-up questions, please send them to R-SIG-Finance.

Best,
--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Mon, Jul 23, 2012 at 8:41 AM, David Terk <david.t...@gmail.com> wrote:
> I'm attaching a runnable script and corresponding data files. This will freeze at 83%. [earlier messages quoted above] [...]
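A minimal sketch of the zero-width object Joshua describes (object names are invented; the commented-out call is what misbehaved before the revision-690 patch):

library(xts)
idx <- as.POSIXct("2012-07-23 09:30:00", tz = "GMT") + 0:23400
zw  <- xts(x = NULL, order.by = idx)   # index only, no data columns
length(index(zw))      # 23401 -- the index is populated
length(coredata(zw))   # 0     -- nothing to aggregate
# to.period(zw, period = "seconds", k = 10)  # pre-patch: OHLC of random memory;
#                                            # patched xts throws an error instead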
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
Thank you for getting this done so quickly. This will process now.

One quick question regarding a call to as.character.POSIXt: since scan reads line by line, would it make sense to have the ability to perform a char -> POSIXct conversion on each line that is read, rather than after all lines have been read? Perhaps this already exists somewhere and I am not aware of it.

-Original Message-
From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
Sent: Monday, July 23, 2012 12:00 PM
To: David Terk
Cc: Duncan Murdoch; r-devel@r-project.org
Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu

> David, Thank you for providing something reproducible. [message quoted above] [...]
[Rd] large dataset - confused
I'm trying to load a dataset into R, but I'm completely lost. This is probably due mostly to the fact that I'm a complete R newb, but it's got me stuck in a research project. I've tried just opening the text file in WordPad and copying the data directly into R, but it's too big and causes the program to crash. Any suggestions or assistance? I'm kinda desperate and lost.
Re: [Rd] large dataset - confused
On 23/07/2012 18:32, walcotteric wrote:
> I'm trying to load a dataset into R, but I'm completely lost. [quoted above] [...]

Yes, you are lost. The R posting guide is at http://www.r-project.org/posting-guide.html and will point you to the right list and also the manuals (at e.g. http://cran.r-project.org/manuals.html, and one of them seems exactly what you need).

BTW, 'large dataset' is meaningless: when I asked a class of Statistics PhD students the answers differed by 7 orders of magnitude.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
Re: [Rd] large dataset - confused
On 07/23/2012 12:32 PM, walcotteric wrote:
> I'm trying to load a dataset into R, but I'm completely lost. [quoted above] [...]

Check the manual about loading data: http://cran.r-project.org/doc/manuals/R-data.html

If you're still having trouble, read the posting guide: http://www.r-project.org/posting-guide.html and follow its advice about reproducibility. Also, this question should have been directed to R-help, not R-devel.

Regards,
- Brian

--
Brian G. Peterson
http://braverock.com/brian/
Re: [Rd] large dataset - confused
1) Move this off R-devel to R-help.
2) Read the IO manual here: http://cran.r-project.org/manuals.html
3) You probably want to look at the read.table() function's help page by typing ?read.table

Michael

On Mon, Jul 23, 2012 at 12:32 PM, walcotteric <walco...@msu.edu> wrote:
> I'm trying to load a dataset into R, but I'm completely lost. [quoted above] [...]
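A minimal sketch of the read.table() route (file name and separator are illustrative, not from the original post):

# read a delimited text file into a data frame; adjust sep/header to the file's layout
dat <- read.table("mydata.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)   # inspect what was read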
Re: [Rd] large dataset - confused
Hi,

On Mon, Jul 23, 2012 at 1:32 PM, walcotteric <walco...@msu.edu> wrote:
> I'm trying to load a dataset into R, but I'm completely lost. [quoted above] [...]

Sure. First of all, you need to post to the R-help list, not the R-devel list. Then you need to read the Intro to R that came with R when you installed it. Then you need to read the posting guide for R-help and provide the requested information, including: how big is your dataset? What format is it in? ("Text file" isn't very informative.) What R commands have you used? (read.table() perhaps?) And so on. Also, what do you mean by "crash"? R stops working? You get an error message?

Sarah
--
Sarah Goslee
http://www.functionaldiversity.org
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
On 07/23/2012 11:49 AM, David Terk wrote:
> One quick question regarding a call to as.character.POSIXt: since scan reads line by line, would it make sense to have the ability to perform a char -> POSIXct conversion on each line that is read, rather than after all lines have been read? Perhaps this already exists somewhere and I am not aware of it.

It's actually much faster to load everything into memory and then convert it all to xts at once. as.POSIXct will work on a vector to create your index; this is better than calling it millions of times, once for each row.

-- Brian
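A small sketch of the difference (timestamp values invented): one vectorized as.POSIXct call over the whole column versus one call per element, as a per-line conversion would imply:

stamps <- rep("07/23/2012 09:30:00", 1e4)

# one vectorized call -- fast
system.time(idx1 <- as.POSIXct(stamps, format = "%m/%d/%Y %H:%M:%S"))

# one call per element -- far slower, same result
system.time(idx2 <- do.call(c, lapply(stamps, as.POSIXct,
                                      format = "%m/%d/%Y %H:%M:%S")))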
Re: [Rd] large dataset - confused
On Mon, Jul 23, 2012 at 06:42:17PM +0100, Prof Brian Ripley wrote:
> [...] BTW, 'large dataset' is meaningless: when I asked a class of Statistics PhD students the answers differed by 7 orders of magnitude. [...]

lol. But isn't 7 a small number? ;-)

Ciao, Oliver
Re: [Rd] Reading many large files causes R to crash - Possible Bug in R 2.15.1 64-bit Ubuntu
On Jul 23, 2012, at 12:49 PM, David Terk wrote:
> Thank you for getting this done so quickly. This will process now. One quick question regarding a call to as.character.POSIXt: since scan reads line by line, would it make sense to have the ability to perform a char -> POSIXct conversion on each line that is read, rather than after all lines have been read?

That's not the problem -- the problem is that converting through format specifications is very, very slow. If you have the standard YYYY-mm-dd hh:mm:ss format (or a subset thereof), you can use fastPOSIXct from http://rforge.net/fasttime -- it's many orders of magnitude faster than using format-based conversions, but it is also limited to the standard GMT format (hence the speed). If you have a more complex format and have to go through format(), you can use pvec from multicore/parallel to at least use all cores of your machine.

Cheers,
Simon
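A hedged usage sketch of the route Simon suggests (assuming the fasttime package from rforge.net/fasttime is installed; timestamps invented):

library(fasttime)
x <- rep("2012-07-23 09:30:00", 3)
# parses fixed "YYYY-mm-dd hh:mm:ss" GMT timestamps directly, skipping the
# slow format()-based machinery entirely
fastPOSIXct(x)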
[Rd] S4 objects in formulas
Hi,

I have very carefully developed several S4 classes that describe censored water-quality data. I have routines for them that will support their use in data.frames and so forth. I have run into a problem when I try to use an S4 class as the response variable in a formula and then extract the model frame. I get an error like:

  Error in model.frame.default(as.lcens(Y) ~ X) : object is not a matrix

In this case, as.lcens works much like the Surv function in the survival package, except that the object is an S4 class and not a matrix of class Surv. I would have expected the model.frame function to be able to manipulate any kind of object that can be subsetted and put into a data.frame, but that appears not to be the case. I'm using R 2.14.1, if that matters. I can supply the routines for the lcens data if needed.

Am I looking at needing to write a wrapper to convert all of my S4 classes into matrices and then extract the necessary data from the matrices according to rules for the particular kind of S4 class? Or am I missing a key piece of how model.frame works?

Thanks.

Dave
Re: [Rd] On RObjectTables
Luke,

Please keep me advised on this, because the Qt interfaces rely heavily on the ObjectTables (btw, it has worked great for my use cases).

Michael

On Fri, Jul 20, 2012 at 7:32 AM, luke-tier...@uiowa.edu wrote:
> I believe everyone who has worked on the relevant files has tried to maintain this functionality, but as it seems to get used and tested very little I can't be sure it is functional at this point. The facility in its current form does complicate the internal code and limit some experiments we might otherwise do, so I would not be surprised if it was at least substantially changed in the next year or two.
>
> Best,
> luke
>
> On Thu, 19 Jul 2012, Jeroen Ooms wrote:
>> I was wondering if anyone knows more about the state of RObjectTables. This largely undocumented functionality was introduced by Duncan around 2002 and enables you to create an environment whose contents are dynamically queried by R through a hook function. It is mentioned in R Internals and ?attach. This functionality is quite powerful and allows you to e.g. offload a big database of R objects to disk, yet use them as if they were in your workspace. The recent RProtoBuf package also uses some of this functionality to dynamically look up proto definitions.
>>
>> I would like to do something similar, but I am not sure if support for this functionality will be or has been discontinued. The RObjectTables package is no longer available on OmegaHat and nothing has been mentioned on the mailing lists for about 5 years. I found an old version of the package on github which seems to work, but as far as I understand, the package still needs the hooks from within R to work. So if this functionality is actually unsupported and might be removed at some point, I should probably not invest in it.
>
> --
> Luke Tierney
> Chair, Statistics and Actuarial Science
> University of Iowa
[Rd] Finding dynamic shared libraries loaded with a package
Is there a way to query a package to see what dynamic shared libraries are loaded with it?

The reason I ask is that during development, I want to unload libraries so that they can be reloaded without restarting R. I want to make it automatic, so that you can just pass in the name of the package and it will unload all the relevant shared libraries.

Typically the name of the shared library is the same as the package, so something like this usually works:

pkgname <- 'bitops'
pkgpath <- system.file(package = pkgname)
library.dynam.unload(pkgname, pkgpath)

Some R packages have shared libraries with names that differ from the package, and this strategy won't work for them. I'm aware that the NAMESPACE file will have an entry like this:

useDynLib(libname)

but I don't know how to access this information from within R. Is this possible?

Another strategy I've looked at is to get all the directories listed by .dynLibs() and pick out those that contain the path of the package, but I'd prefer not to do it this way if possible, since it seems like a bit of a hack. For example, this code will load bitops, then unload the shared library and unload the package:

library(bitops)

# Show what's loaded
.dynLibs()

pkgname <- 'bitops'
# Get installation path for the package
pkgpath <- system.file(package = pkgname)
# Get a vector of paths for all loaded libs
dynlib_paths <- vapply(.dynLibs(), function(x) x[["path"]], character(1))
# Find which of the lib paths start with pkgpath
pkgmatch <- pkgpath == substr(dynlib_paths, 1, nchar(pkgpath))
# Get matching lib paths and strip off leading path and extension (.so or .dll)
libnames <- sub("\\.[^\\.]*$", "", basename(dynlib_paths[pkgmatch]))
library.dynam.unload(libnames, pkgpath)

# Show what's loaded
.dynLibs()

# Finally, also unload the package
detach(paste("package", pkgname, sep = ":"), character.only = TRUE,
       force = TRUE, unload = TRUE)

Thanks for any help you can provide,
-Winston
Re: [Rd] Finding dynamic shared libraries loaded with a package
On Mon, Jul 23, 2012 at 8:29 PM, Winston Chang <winstoncha...@gmail.com> wrote:
> Is there a way to query a package to see what dynamic shared libraries are loaded with it?

This gives a "DLLInfoList" class object whose components are info associated with the loaded DLLs:

DLLInfoList <- library.dynam()

and this gives the components associated with package stats:

DLLInfoList[sapply(DLLInfoList, "[[", "name") == "stats"]

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
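The two answers combine into a small helper; a hedged sketch (the function name is invented, and the prefix test may need adjusting for arch-specific libs/ subdirectories on Windows):

# unload every DLL that was loaded from a given package's installation directory
unloadPkgDLLs <- function(pkgname) {
  pkgpath <- system.file(package = pkgname)
  for (dll in library.dynam()) {   # with no arguments, lists DLLs loaded by packages
    if (substr(dll[["path"]], 1, nchar(pkgpath)) == pkgpath)
      library.dynam.unload(dll[["name"]], pkgpath)
  }
}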
Re: [Rd] Finding dynamic shared libraries loaded with a package
On Mon, Jul 23, 2012 at 7:47 PM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> This gives a "DLLInfoList" class object whose components are info associated with the loaded DLLs:
>
> DLLInfoList <- library.dynam()
>
> [...]

Thanks - I think this does the trick! Although I decided to use .dynLibs() instead of library.dynam(); the latter just calls the former when no name is passed to it.

Another mailing list member sent me a message suggesting getLoadedDLLs(). This appears to be slightly different -- if I understand correctly, it returns all loaded DLLs, while .dynLibs() returns just the ones loaded by packages.

-Winston
Re: [Rd] S4 objects in formulas (really, model frames)
The help for model.frame says:

  Only variables whose type is raw, logical, integer, real, complex or
  character can be included in a model frame: this includes classed
  variables such as factors (whose underlying type is integer), but
  excludes lists.

Some S4 objects are of one of those types, but some are not. Some matrices are, some are not. Objects of class Surv are.

On 23/07/2012 21:33, David L Lorenz wrote:
> Hi, I have very carefully developed several S4 classes that describe censored water-quality data. [full message quoted above] [...]
> [[alternative HTML version deleted]]

The posting guide asked you not to do that. And to do your own homework.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
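An untested sketch of the distinction (class and slot names invented): an S4 class whose underlying type is a basic one passes the model-frame type check, while one made only of slots has type "S4" and is rejected:

# S4 class that *contains* a basic type: the object is still a double vector
setClass("lcensNum", contains = "numeric", representation(censored = "logical"))
y1 <- new("lcensNum", rnorm(10), censored = rep(FALSE, 10))
typeof(y1)   # "double" -- admissible in a model frame

# S4 class built only from slots: no underlying basic type
setClass("lcensRec", representation(values = "numeric", censored = "logical"))
y2 <- new("lcensRec", values = rnorm(10), censored = rep(FALSE, 10))
typeof(y2)   # "S4" -- excluded, hence errors like the one reported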