Well, you still haven't convinced anyone but yourself that it's
definitely an xts problem, since you have not provided any
reproducible example...

--
Joshua Ulrich | about.me/joshuaulrich
FOSS Trading | www.fosstrading.com

On Mon, Jul 23, 2012 at 8:14 AM, David Terk <david.t...@gmail.com> wrote:
> Where should this be discussed since it is definitely XTS related? I will
> gladly upload the simplified script + data files to whoever is maintaining
> this part of the code. Fortunately there is a workaround here.
>
> -----Original Message-----
> From: Joshua Ulrich [mailto:josh.m.ulr...@gmail.com]
> Sent: Monday, July 23, 2012 8:15 AM
> To: David Terk
> Cc: Duncan Murdoch; r-devel@r-project.org
> Subject: Re: [Rd] Reading many large files causes R to crash - Possible Bug
> in R 2.15.1 64-bit Ubuntu
>
> David,
>
> You still haven't provided a reproducible example. As Duncan already said,
> "if you don't post code that allows us to reproduce the crash, it's really
> unlikely that we'll be able to fix it."
>
> And R-devel is not the appropriate venue to discuss this if it's truly an
> issue with xts/zoo.
>
> Best,
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com
>
>
> On Mon, Jul 23, 2012 at 12:41 AM, David Terk <david.t...@gmail.com> wrote:
>> Looks like the call to:
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>
>> is what is causing the issue. If variable name is not set, or is set to
>> any value other than NULL, then no hang occurs.
>>
>> -----Original Message-----
>> From: David Terk [mailto:david.t...@gmail.com]
>> Sent: Monday, July 23, 2012 1:25 AM
>> To: 'Duncan Murdoch'
>> Cc: 'r-devel@r-project.org'
>> Subject: RE: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> I've isolated the bug. When the seg fault was produced there was an
>> error that memory had not been mapped. Here is the odd part of the
>> bug. If you comment out certain code and get a full run, then comment in
>> the code which is causing the problem, it will actually run. So I think
>> it is safe to assume something wrong is taking place with memory
>> allocation. Example: while testing, I have been able to get to a point
>> where the code will run. But if I reboot the machine and try again, the
>> code will not run.
>>
>> The bug itself is happening somewhere in XTS or ZOO. I will gladly
>> upload the data files. It is happening on the 10th data file which is
>> only 225k lines in size.
>>
>> Below is the simplified code. The call to either
>>
>> dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>> index(dat.i) <- index(to.period(templateTimes, period=per, k=subper))
>>
>> is what is causing R to hang or crash. I have been able to replicate
>> this on Windows 7 64 bit and Ubuntu 64 bit. Seems easiest to
>> consistently replicate from R Studio.
>>
>> The code below will consistently replicate when the appropriate files
>> are used.
>>
>> parseTickDataFromDir = function(tickerDir, per, subper) {
>>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>>   tickerNames = list.files(tickerDir,full.names=F)
>>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames),
>>                        style = 3)
>>
>>   for(i in 1:length(tickerAbsFilenames)) {
>>     dat.i = parseTickData(tickerAbsFilenames[i])
>>     dates <- unique(substr(as.character(index(dat.i)), 1,10))
>>     times <- rep("09:30:00", length(dates))
>>     openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>>     templateTimes <- NULL
>>
>>     for (j in 1:length(openDateTimes)) {
>>       if (is.null(templateTimes)) {
>>         templateTimes <- openDateTimes[j] + 0:23400
>>       } else {
>>         templateTimes <- c(templateTimes, openDateTimes[j] + 0:23400)
>>       }
>>     }
>>
>>     templateTimes <- as.xts(templateTimes)
>>     dat.i <- merge(dat.i, templateTimes, all=T)
>>     if (is.na(dat.i[1])) {
>>       dat.i[1] <- -1
>>     }
>>     dat.i <- na.locf(dat.i)
>>     dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>     index(dat.i) <- index(to.period(templateTimes, period=per,
>>                                     k=subper))
>>     setTxtProgressBar(pb, i)
>>   }
>>   close(pb)
>> }
>>
>> parseTickData <- function(inputFile) {
>>   DAT.list <- scan(file=inputFile,
>>     sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>>   index <-
>>     as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y %H:%M:%S")
>>   DAT.xts <- xts(DAT.list$Close,index)
>>   DAT.xts <- make.index.unique(DAT.xts)
>>   return(DAT.xts)
>> }
>>
>> DATTick <- parseTickDataFromDir(tickerDirSecond, "seconds",10)
>>
>> -----Original Message-----
>> From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com]
>> Sent: Sunday, July 22, 2012 4:48 PM
>> To: David Terk
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] Reading many large files causes R to crash -
>> Possible Bug in R 2.15.1 64-bit Ubuntu
>>
>> On 12-07-22 3:54 PM, David Terk wrote:
>>> I am reading several hundred files. Anywhere from 50k-400k in size.
>>> It appears that when I read these files with R 2.15.1 the process
>>> will hang or seg fault on the scan() call. This does not happen on
>>> R 2.14.1.
>>
>> The code below doesn't do anything other than define a couple of
>> functions. Please simplify it to code that creates a file (or multiple
>> files), reads it or them, and shows a bug.
>>
>> If you can't do that, then gradually add the rest of the stuff from
>> these functions into the mix until you figure out what is really
>> causing the bug.
>>
>> If you don't post code that allows us to reproduce the crash, it's
>> really unlikely that we'll be able to fix it.
>>
>> Duncan Murdoch
>>
>>> This is happening on the precise build of Ubuntu.
>>>
>>> I have included everything, but the issue appears to be when
>>> performing the scan in the method parseTickData.
>>>
>>> Below is the code. Hopefully this is the right place to post.
>>>
>>> parseTickDataFromDir = function(tickerDir, per, subper, fun) {
>>>   tickerAbsFilenames = list.files(tickerDir,full.names=T)
>>>   tickerNames = list.files(tickerDir,full.names=F)
>>>   tickerNames = gsub("_[a-zA-Z0-9].csv","",tickerNames)
>>>   pb <- txtProgressBar(min = 0, max = length(tickerAbsFilenames),
>>>                        style = 3)
>>>
>>>   for(i in 1:length(tickerAbsFilenames)) {
>>>
>>>     # Grab Raw Tick Data
>>>     dat.i = parseTickData(tickerAbsFilenames[i])
>>>     #Sys.sleep(1)
>>>
>>>     # Create Template
>>>     dates <- unique(substr(as.character(index(dat.i)), 1,10))
>>>     times <- rep("09:30:00", length(dates))
>>>     openDateTimes <- strptime(paste(dates, times), "%F %H:%M:%S")
>>>     templateTimes <- NULL
>>>
>>>     for (j in 1:length(openDateTimes)) {
>>>       if (is.null(templateTimes)) {
>>>         templateTimes <- openDateTimes[j] + 0:23400
>>>       } else {
>>>         templateTimes <- c(templateTimes, openDateTimes[j] +
>>>                            0:23400)
>>>       }
>>>     }
>>>
>>>     # Convert templateTimes to XTS, merge with data and convert NA's
>>>     templateTimes <- as.xts(templateTimes)
>>>     dat.i <- merge(dat.i, templateTimes, all=T)
>>>
>>>     # If there is no data in the first print, we will have leading
>>>     # NA's. So set them to -1.
>>>     # Since we do not want these values removed by to.period
>>>     if (is.na(dat.i[1])) {
>>>       dat.i[1] <- -1
>>>     }
>>>
>>>     # Fix remaining NA's
>>>     dat.i <- na.locf(dat.i)
>>>
>>>     # Convert to desired bucket size
>>>     dat.i <- to.period(dat.i, period=per, k=subper, name=NULL)
>>>
>>>     # Always use templated index, otherwise merge fails with other
>>>     # symbols
>>>     index(dat.i) <- index(to.period(templateTimes, period=per,
>>>                                     k=subper))
>>>
>>>     # If there was missing data at open, set close to NA
>>>     valsToChange <- which(dat.i[,"Open"] == -1)
>>>     if (length(valsToChange) != 0) {
>>>       dat.i[valsToChange, "Close"] <- NA
>>>     }
>>>
>>>     if(i == 1) {
>>>       DAT = fun(dat.i)
>>>     } else {
>>>       DAT = merge(DAT,fun(dat.i))
>>>     }
>>>     setTxtProgressBar(pb, i)
>>>   }
>>>   close(pb)
>>>   colnames(DAT) = tickerNames
>>>   return(DAT)
>>> }
>>>
>>> parseTickData <- function(inputFile) {
>>>   DAT.list <- scan(file=inputFile,
>>>     sep=",",skip=1,what=list(Date="",Time="",Close=0,Volume=0),quiet=T)
>>>   index <-
>>>     as.POSIXct(paste(DAT.list$Date,DAT.list$Time),format="%m/%d/%Y %H:%M:%S")
>>>   DAT.xts <- xts(DAT.list$Close,index)
>>>   DAT.xts <- make.index.unique(DAT.xts)
>>>   return(DAT.xts)
>>> }
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
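
For illustration only, here is a minimal sketch of the kind of self-contained script the replies in this thread ask for: code that creates its own input files and then reads them back with the same scan() call used in parseTickData. Everything named here (tick_dir, write_tick_file, read_tick_file, the SYM01_A.csv style file names, the synthetic prices) is a hypothetical stand-in rather than the poster's actual data, and the script only uses base R, so it does not by itself exercise xts.

```r
# Hypothetical reproducible-example skeleton: create tick files, then read them.
# The file layout (Date,Time,Close,Volume) mirrors what parseTickData() scans.
set.seed(1)                                  # make the synthetic data repeatable
tick_dir <- file.path(tempdir(), "ticks")    # scratch directory (assumption)
dir.create(tick_dir, showWarnings = FALSE)

# Write one synthetic tick file with n rows inside a 09:30:00 + 23400 s session.
write_tick_file <- function(path, n, day = "2012-07-20") {
  t0 <- as.POSIXct(paste(day, "09:30:00"), tz = "UTC")
  stamps <- t0 + sort(sample(0:23400, n, replace = TRUE))
  ticks <- data.frame(
    Date   = format(stamps, "%m/%d/%Y", tz = "UTC"),
    Time   = format(stamps, "%H:%M:%S", tz = "UTC"),
    Close  = round(100 + cumsum(rnorm(n, sd = 0.01)), 2),
    Volume = sample(1:10 * 100, n, replace = TRUE)
  )
  write.csv(ticks, path, row.names = FALSE, quote = FALSE)
  invisible(path)
}

# Read a file back with the same scan() arguments as in the thread.
read_tick_file <- function(path) {
  scan(file = path, sep = ",", skip = 1, quiet = TRUE,
       what = list(Date = "", Time = "", Close = 0, Volume = 0))
}

files <- file.path(tick_dir, sprintf("SYM%02d_A.csv", 1:3))
for (f in files) write_tick_file(f, n = 1000)

# Number of records read from each file.
rows <- vapply(files, function(f) length(read_tick_file(f)$Close), integer(1))
print(unname(rows))
```

To turn this into an actual report, one would presumably add the suspect xts call from the thread (to.period with name=NULL) on top of the data read here, increase n or the number of files until the hang appears, and attach the output of sessionInfo().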