RE: [R] fast mkChar
Thank you for the lead, Peter. It may be useful for other packages I
write.

As to the strings, I think I have to take what is already there. I agree
that strings would be better managed in malloc-style fashion (probably
with a reference counter) and not by gc(). However, I don't want to have
a system with two different string classes; such close relatives seldom
coexist peacefully.

BTW, the slowness of mkChar explains why R is so slow when it needs to
compute names for long vectors.

Thank you for an interesting discussion,
Vadim

> -----Original Message-----
> From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 08, 2004 3:35 PM
> To: Vadim Ogranovich
> Cc: R-Help
> Subject: Re: [R] fast mkChar
>
> "Vadim Ogranovich" <[EMAIL PROTECTED]> writes:
>
> > I am no expert in memory management in R so it's hard for me to
> > tell what is and what is not doable. From reading the code of
> > allocVector() in memory.c I think that the critical part is to
> > vectorize CLASS_GET_FREE_NODE and use the vectorized version along
> > the lines of the code fragment below (taken from memory.c).
> >
> >     if (node_class < NUM_SMALL_NODE_CLASSES) {
> >         CLASS_GET_FREE_NODE(node_class, s);
> >
> > If this is possible then the rest is just a matter of code
> > refactoring.
> >
> > By vectorizing I mean writing a macro
> > CLASS_GET_FREE_NODE2(node_class, s, n) which in one go allocates n
> > little objects of class node_class and "inscribes" them into the
> > elements of vector s, which is assumed to be long enough to hold
> > these objects.
> >
> > If this is doable then the only missing piece would be a new
> > function setChar(CHARSXP rstr, const char * cstr) which copies
> > 'cstr' into 'rstr' and (re)allocates the heap memory if necessary.
> > Here the setChar() macro is safe since s[i]-s are all brand new and
> > thus are not shared with any other object.
>
> I had a similar idea initially, but I don't think it can fly:
>
> First, allocating n objects at once is not likely to be much faster
> than allocating them one-by-one, especially when you consider the
> implications of having to deal with near-out-of-memory conditions.
>
> Second, you have to know the string lengths when allocating, since
> the structure of a vector object (CHARSXP) is a header immediately
> followed by the data.
>
> A more interesting line to pursue is that - depending on what it
> really is that you need - you might be able to create a different
> kind of object that could "walk and quack" like a character vector,
> but is stored differently internally. E.g. you could set up a
> representation that is just a block of pointers, pointing to strings
> that are being maintained in malloc-style.
>
> Have a look at External pointers and finalization.
>
> --
>    O__  Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~ - ([EMAIL PROTECTED])                FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Wrong question [Wasn't: Re: [R] fast mkChar]
On Wednesday 09 of June 2004 09:52, you wrote:
> This is my first message to the list and I believe the question
> I am including is a simple one.

http://www.r-project.org/posting-guide.html

--
Matej Cepl, http://www.ceplovi.cz/matej
GPG Finger: 89EF 4BC6 288A BF43 1BAB 25C3 E09F EF25 D964 84AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488

Of course I'm respectable. I'm old. Politicians, ugly buildings,
and whores all get respectable if they last long enough.
    --John Huston in "Chinatown."
RE: [R] fast mkChar
Hello everyone,

This is my first message to the list and I believe the question I am
including is a simple one.

I have a matrix where I need to calculate ANOVA for the rows, as each
column represents a different treatment. I would like to know if there
is a command or a series of commands that I can enter to do that. At the
moment I have an external script that extracts each row from the matrix,
transforms it into a column, adds another factor column, and passes the
resulting text file to Rterm --vanilla.

Any help is appreciated.

Thanks a lot,
Paulo Nuin
Re: [R] fast mkChar
"Vadim Ogranovich" <[EMAIL PROTECTED]> writes: > I am no expert in memory management in R so it's hard for me to tell > what is and what is not doable. From reading the code of allocVector() > in memory.c I think that the critical part is to vectorize > CLASS_GET_FREE_NODE and use the vectorized version along the lines of > the code fragment below (taken from memory.c). > > if (node_class < NUM_SMALL_NODE_CLASSES) { > CLASS_GET_FREE_NODE(node_class, s); > > If this is possible than the rest is just a matter of code refactoring. > > By vectorizing I mean writing a macro CLASS_GET_FREE_NODE2(node_class, > s, n) which in one go allocates n little objects of class node_class and > "inscribes" them into the elements of vector s, which is assumed to be > long enough to hold these objects. > > If this is doable than the only missing piece would be a new function > setChar(CHARSXP rstr, const char * cstr) which copies 'cstr' into 'rstr' > and (re)allocates the heap memory if necessary. Here the setChar() macro > is safe since s[i]-s are all brand new and thus are not shared with any > other object. I had a similar idea initially, but I don't think it can fly: First, allocating n objects at once is not likely to be much faster than allocating them one-by-one, especially when you consider the implications of having to deal with near-out-of-memory conditions. Second, you have to know the string lengths when allocating, since the structure of a vector object (CHARSXP) is a header immediately followed by the data. A more interesting line to pursue is that - depending on what it really is that you need - you might be able to create a different kind of object that could "walk and quack" like a character vector, but is stored differently internally. E.g. you could set up a representation that is just a block of pointers, pointing to strings that are being maintained in malloc-style. Have a look at External pointers and finalization. 
--
   O__  Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED])                FAX: (+45) 35327907
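A minimal sketch of the "block of pointers" representation suggested above, written in plain C so that it compiles without R's headers. The names `str_block`, `str_block_new`, `str_block_get` and `str_block_free` are hypothetical and not part of R. In an actual package, a pointer to such a block would presumably be wrapped with `R_MakeExternalPtr()` and the free function registered through `R_RegisterCFinalizerEx()`; that wiring is assumed here, not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: one block of pointers, each pointing to a
 * string that lives on the ordinary C heap (malloc-style). */
typedef struct {
    size_t n;
    char **items;
} str_block;

/* Release every string, the pointer block and the container. Safe on
 * NULL. This is the function that would play the finalizer's role. */
void str_block_free(str_block *b)
{
    if (b == NULL)
        return;
    for (size_t i = 0; i < b->n; i++)
        free(b->items[i]);          /* free(NULL) is a no-op */
    free(b->items);
    free(b);
}

/* Copy n C strings into a new block. Returns NULL if any allocation
 * fails, after releasing whatever had been allocated so far. */
str_block *str_block_new(const char *const *src, size_t n)
{
    str_block *b = malloc(sizeof *b);
    if (b == NULL)
        return NULL;
    b->n = n;
    b->items = malloc((n ? n : 1) * sizeof *b->items);
    if (b->items == NULL) {
        free(b);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        b->items[i] = NULL;         /* so a partial block can be freed */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]);
        b->items[i] = malloc(len + 1);
        if (b->items[i] == NULL) {
            str_block_free(b);
            return NULL;
        }
        memcpy(b->items[i], src[i], len + 1);
    }
    return b;
}

/* Bounds-checked read access; returns NULL when i is out of range. */
const char *str_block_get(const str_block *b, size_t i)
{
    assert(b != NULL);
    return i < b->n ? b->items[i] : NULL;
}
```

Only one object would then be visible to the garbage collector (the external pointer), at the price that the result is not a real character vector and every R function that touches it needs an accessor.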
RE: [R] fast mkChar
I am no expert in memory management in R so it's hard for me to tell
what is and what is not doable. From reading the code of allocVector()
in memory.c I think that the critical part is to vectorize
CLASS_GET_FREE_NODE and use the vectorized version along the lines of
the code fragment below (taken from memory.c).

    if (node_class < NUM_SMALL_NODE_CLASSES) {
        CLASS_GET_FREE_NODE(node_class, s);

If this is possible then the rest is just a matter of code refactoring.

By vectorizing I mean writing a macro CLASS_GET_FREE_NODE2(node_class,
s, n) which in one go allocates n little objects of class node_class and
"inscribes" them into the elements of vector s, which is assumed to be
long enough to hold these objects.

If this is doable then the only missing piece would be a new function
setChar(CHARSXP rstr, const char * cstr) which copies 'cstr' into 'rstr'
and (re)allocates the heap memory if necessary. Here the setChar() macro
is safe since s[i]-s are all brand new and thus are not shared with any
other object.

> -----Original Message-----
> From: Peter Dalgaard [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 08, 2004 1:23 PM
> To: Vadim Ogranovich
> Cc: R-Help
> Subject: Re: [R] fast mkChar
>
> "Vadim Ogranovich" <[EMAIL PROTECTED]> writes:
>
> > Hi,
> >
> > To speed up reading of large (few million lines) CSV files I am
> > writing custom read functions (in C). By timing various approaches I
> > figured out that one of the bottlenecks in reading character fields
> > is the mkChar() function which on each call incurs a lot of
> > garbage-collection-related overhead.
> >
> > I wonder if there is a "vectorized" version of mkChar, say
> > mkChar2(char **, int length) that converts an array of C strings to
> > a string vector, which somehow amortizes the gc overhead over the
> > entire array?
> >
> > If no such function exists, I'd appreciate any hint as to how to
> > write it.
>
> The real issue here is that character vectors are implemented as
> generic vectors of little R objects (CHARSXP type) that each hold one
> string. Allocating all those objects is probably what does you in.
>
> The reason behind the implementation is probably that doing it that
> way allows the mechanics of the garbage collector to be applied
> directly (CHARSXPs are just vectors of bytes), but it is obviously
> wasteful in terms of total allocation. If you can think up something
> better, please say so (but remember that the memory management issues
> are nontrivial).
>
> --
>    O__  Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> ~~ - ([EMAIL PROTECTED])                FAX: (+45) 35327907
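The batched allocation proposed above can be sketched on a toy free list. This is plain C, not R's memory.c: `node`, `node_pool`, `pool_init`, `pool_get` and `pool_get_n` are hypothetical names, with `pool_get` standing in for CLASS_GET_FREE_NODE and `pool_get_n` for the proposed CLASS_GET_FREE_NODE2. What should happen when the pool runs short is an assumption of this sketch (all-or-nothing), since the proposal does not say.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node; R's real node classes are more involved. */
typedef struct node {
    struct node *next;
} node;

typedef struct {
    node  *free_list;   /* singly linked list of unused nodes */
    size_t n_free;
} node_pool;

/* Thread an array of n nodes onto the pool's free list. */
void pool_init(node_pool *p, node *storage, size_t n)
{
    assert(p != NULL);
    p->free_list = NULL;
    p->n_free = 0;
    for (size_t i = 0; i < n; i++) {
        storage[i].next = p->free_list;
        p->free_list = &storage[i];
        p->n_free++;
    }
}

/* One node at a time; returns NULL when the pool is empty. */
node *pool_get(node_pool *p)
{
    node *s = p->free_list;
    if (s != NULL) {
        p->free_list = s->next;
        p->n_free--;
    }
    return s;
}

/* Batched version: fill out[0..n-1] in one call. All-or-nothing:
 * returns 0 and leaves the pool untouched if fewer than n are free. */
int pool_get_n(node_pool *p, node **out, size_t n)
{
    if (n > p->n_free)
        return 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = p->free_list;
        p->free_list = out[i]->next;
    }
    p->n_free -= n;
    return 1;
}
```

Note that the batched call still unlinks the nodes one by one; in this toy version it saves the per-call overhead and the repeated emptiness check, not the per-node work.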
Re: [R] fast mkChar
On Tue, 8 Jun 2004 12:23:58 -0700, "Vadim Ogranovich"
<[EMAIL PROTECTED]> wrote:

>Hi,
>
>To speed up reading of large (few million lines) CSV files I am writing
>custom read functions (in C). By timing various approaches I figured out
>that one of the bottlenecks in reading character fields is the mkChar()
>function which on each call incurs a lot of garbage-collection-related
>overhead.
>
>I wonder if there is a "vectorized" version of mkChar, say mkChar2(char
>**, int length) that converts an array of C strings to a string vector,
>which somehow amortizes the gc overhead over the entire array?
>
>If no such function exists, I'd appreciate any hint as to how to write
>it.

It's not easy. Internally R strings always have a header at the front,
so you need to allocate memory and move C strings to get R to understand
them.

Duncan Murdoch
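To illustrate the point about the header, here is a small sketch in plain C. `toy_string` and `toy_mkchar` are hypothetical names and the layout is not R's actual one; it only has the same shape, a header followed immediately by the bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an R string object: a fixed header with
 * the character data stored directly behind it. */
typedef struct {
    size_t length;   /* number of bytes, excluding the trailing NUL */
    char   data[];   /* payload starts right after the header */
} toy_string;

/* Build a toy_string from a C string. Because the header must sit
 * directly in front of the bytes, the caller's buffer cannot be
 * adopted in place: a new block is allocated and the bytes copied. */
toy_string *toy_mkchar(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    toy_string *s = malloc(sizeof *s + n + 1);
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->data, src, n + 1);   /* includes the trailing NUL */
    return s;
}
```

The length has to be known before the block is allocated, and every string costs one allocation plus one copy, whatever buffer the parser already holds it in.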
Re: [R] fast mkChar
"Vadim Ogranovich" <[EMAIL PROTECTED]> writes: > Hi, > > To speed up reading of large (few million lines) CSV files I am writing > custom read functions (in C). By timing various approaches I figured out > that one of the bottlenecks in reading character fields is the mkChar() > function which on each call incurs a lot of garbage-collection-related > overhead. > > I wonder if there is a "vectorized" version of mkChar, say mkChar2(char > **, int length) that converts an array of C strings to a string vector, > which somehow amortizes the gc overhead over the entire array? > > If no such function exists, I'd appreciate any hint as to how to write > it. The real issue here is that character vectors are implemented as generic vectors of little R objects (CHARSXP type) that each hold one string. Allocating all those objects is probably what does you in. The reason behind the implementation is probably that doing it that way allows the mechanics of the garbage collector to be applied directly (CHARSXPs are just vectors of bytes), but it is obviously wasteful in terms of total allocation. If you can think up something better, please say so (but remember that the memory management issues are nontrivial). -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html