Re: [R] Aggragating subsets of data in larger vector with sapply
Hi Chris,

This seems to work on the sample data you provided.

    library(xts)

    FUN <- function(x) {
      # Rebuild the series as numeric, keeping the original time index
      x <- xts(as.numeric(x), index(x))
      # Sum the values within each one-second period
      period.apply(x, endpoints(x, "secs"), sum)
    }
    # One summed series per Direction value ("buy", "sell", "unkn")
    lapply(split.default(xSym$Size, xSym$Direction), FUN)

Best,
--
Joshua Ulrich | FOSS Trading: www.fosstrading.com

On Sun, Jan 9, 2011 at 6:10 PM, rivercode aqua...@gmail.com wrote:
> Have 40,000 rows of buy/sell trade data and am trying to add up the buys
> for each second. The code works, but it is very slow. Any suggestions on
> how to improve the sapply function?
>
>     # vector of last second on an XTS timeseries object with multiple
>     # entries for each second
>     secEP = endpoints(xSym$Direction, "secs")
>     d = xSym$Direction
>     s = xSym$Size
>     buySize = sapply(1:(length(secEP)-1), function(y) {
>       i = (secEP[y] + 1):secEP[y+1]  # index of vectors between each secEP
>       return(sum(as.numeric(s[i][d[i] == "buy"])))
>     })
>
> Object details:
>
> secEP = numeric vector of one-second endpoints in xSym$Direction.
>
>     head(xSym$Direction)
>                         Direction
>     2011-01-05 09:30:00 unkn
>     2011-01-05 09:30:02 sell
>     2011-01-05 09:30:02 buy
>     2011-01-05 09:30:04 buy
>     2011-01-05 09:30:04 buy
>     2011-01-05 09:30:04 buy
>
>     head(xSym$Size)
>                         Size
>     2011-01-05 09:30:00 865
>     2011-01-05 09:30:02 100
>     2011-01-05 09:30:02 100
>     2011-01-05 09:30:04 100
>     2011-01-05 09:30:04 100
>     2011-01-05 09:30:04 41
>
> Thanks,
> Chris
> --
> View this message in context: http://r.789695.n4.nabble.com/Aggragating-subsets-of-data-in-larger-vector-with-sapply-tp3206445p3206445.html
> Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Aggragating subsets of data in larger vector with sapply
Split the data by the time truncated to the second, then process each group. This avoids the repeated index subsetting you are doing inside sapply. Also merge Direction and Size into the same data frame. It looks like you can subset to the "buy" rows to begin with.
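
The suggestion above can be sketched in base R roughly as follows. This is only an illustration under assumptions: it uses a plain data frame named trades (a hypothetical name) with columns time, Direction and Size, filled with the six sample rows from the question, rather than the poster's actual xSym object.

```r
# Sample rows copied from the question. The data frame layout and the
# name `trades` are assumptions made for this sketch.
trades <- data.frame(
  time = as.POSIXct(c("2011-01-05 09:30:00", "2011-01-05 09:30:02",
                      "2011-01-05 09:30:02", "2011-01-05 09:30:04",
                      "2011-01-05 09:30:04", "2011-01-05 09:30:04"),
                    tz = "UTC"),
  Direction = c("unkn", "sell", "buy", "buy", "buy", "buy"),
  Size = c(865, 100, 100, 100, 100, 41),
  stringsAsFactors = FALSE
)

# Subset to the buys first, so no per-second filtering is needed later.
buys <- trades[trades$Direction == "buy", ]

# Truncate each timestamp to the whole second and use it as the grouping key.
sec <- format(trunc(buys$time, units = "secs"), "%Y-%m-%d %H:%M:%S")

# Sum the sizes within each second in one pass over the groups.
buySize <- tapply(buys$Size, sec, sum)
print(buySize)
```

Note one difference from the original sapply loop: seconds that contain trades but no buys do not appear in buySize at all, whereas the loop returns 0 for them.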