>>>>> "Kevin" == Kevin B Hendricks <[EMAIL PROTECTED]>
>>>>>     on Fri, 28 Jul 2006 14:53:57 -0400 writes:
[.........]

    Kevin> The idea is to somehow make functions that work well
    Kevin> over small sub-sequences of a much longer vector
    Kevin> without resorting to splitting the vector into many
    Kevin> smaller vectors.

    Kevin> In my particular case, the problem was that my data frame
    Kevin> had over 1 million lines and probably over 500,000
    Kevin> unique sort keys (i.e., think of it as an R factor with
    Kevin> over 500,000 levels).  The implementation of "by"
    Kevin> uses "tapply", which in turn uses "split".  So "split"
    Kevin> simply ate up all the time trying to create 500,000
    Kevin> vectors, each of short length 1, 2, or 3; and the
    Kevin> associated garbage collection.

Not that I have spent enough time thinking about this thread's
topic, but I have seen more than one case where using tapply()
unnecessarily slowed down computations.

I don't remember the details, but I know that in one case,
replacing tapply() by a few lines of code {one of which used
lapply(), IIRC} sped up that computation by a factor (of 2? or
more?).

I also vaguely remember that I thought about making tapply()
faster, but came to the conclusion that it could not be sped up
quickly, because it works in a much more general context than
the one it was used in for that application (and maybe yours?).

    Kevin> A simple loop that walked the short sequence of
    Kevin> values (since the data frame was already sorted),
    Kevin> calculating what it needed, would work much faster
    Kevin> than splitting the original vector into so very many
    Kevin> smaller vectors (and the associated copying of data).

    Kevin> That problem is very similar to the
    Kevin> calculation of basic stats on a short moving window
    Kevin> over a very long vector.

    >> The author of that message ultimately wrote the caTools R
    >> package which contains some optimized versions.

    Kevin> I will look into that package and maybe use it as a
    Kevin> model for what I want to do.
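To make the idea concrete, here is a minimal sketch of computing
per-key sums on an already sorted key vector without ever calling
split().  It is not the code from the case mentioned above, only
an illustration: the names key, x, n and k and the simulated data
are hypothetical, and the trick shown (differencing one cumulative
sum at the run boundaries) only covers sums and means, not the
arbitrary functions that tapply() accepts.  The same cumulative
sum also gives a short moving-window mean.

```r
# Hypothetical data: sorted keys with many very short groups.
set.seed(1)
n   <- 20000L
key <- sort(sample.int(10000L, n, replace = TRUE))
x   <- rnorm(n)

# Baseline: tapply() builds one short vector per key via split().
sum_tapply <- tapply(x, key, sum)

# Alternative: locate the last index of each run of equal keys, then
# difference the cumulative sum there.  No per-group vectors are made.
run_ends <- which(c(key[-1L] != key[-n], TRUE))
cs       <- cumsum(x)
sum_runs <- diff(c(0, cs[run_ends]))
names(sum_runs) <- key[run_ends]

# Both approaches agree up to floating-point rounding.
stopifnot(isTRUE(all.equal(as.vector(sum_tapply), as.vector(sum_runs))))

# Moving-window mean of width k from the same kind of cumulative sum:
# the window starting at j has sum cs0[j + k] - cs0[j].
k        <- 3L
cs0      <- c(0, cumsum(x))
run_mean <- (cs0[(k + 1L):(n + 1L)] - cs0[1L:(n - k + 1L)]) / k
```

Differencing a long cumulative sum can lose some precision when
the values vary a lot in magnitude, so for serious use a compiled
loop over the runs may be the safer choice.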
    Kevin> Thanks,
    Kevin> Kevin

    Kevin> ______________________________________________
    Kevin> R-devel@r-project.org mailing list
    Kevin> https://stat.ethz.ch/mailman/listinfo/r-devel