Re: [Rd] [SPAM?] [R] read.csv behaviour
The consensus is that it is a very bad idea for a file nominally described as a CSV file to have a varying number of fields from record to record. If you need such a file and go to whatever lengths are required to create it, please mark it as something other than a CSV file when you are done. With this in mind, you may be in a better position to understand why the pre-built facilities do not provide much support in your endeavor.

---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---
Sent from my phone. Please excuse my brevity.

Mehmet Suzen msu...@mango-solutions.com wrote:

> This might be obvious but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally from initial data read from a CSV file which has the same format. See example below.
>
> I found it quite strange that R cannot write it in one go, so one must append blocks or post-process the file, is this true? (even Ruby can do it!!)
>
> Otherwise it puts ",," or similar for missing column values in the shorter rows, and the fill=FALSE option does not work!
>
> I don't want to post-process if possible.
>
> See this post:
> http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html
>
> Example that generated Error!
>
>     writeLines(c("A,B,C,D",
>                  "1,a,b,c",
>                  "2,f,g,c",
>                  "3,a,i,j",
>                  "4,a,b,c",
>                  "5,d,e,f",
>                  "6,g,h,i,j,k,l,m,n"),
>                con=file("test.csv"))
>
>     read.csv("test.csv")
>     try(read.csv("test.csv",fill=FALSE))

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
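A file can be checked for a varying number of fields before it is handed to read.csv. The following is only a minimal sketch, assuming the seven-line example from the quoted message; the temporary path and the variable names are illustrative and not part of the original post.

```r
# Write the example from the quoted message to a temporary file
# (the path is an assumption; the original post used "test.csv").
path <- tempfile(fileext = ".csv")
writeLines(c("A,B,C,D",
             "1,a,b,c",
             "2,f,g,c",
             "3,a,i,j",
             "4,a,b,c",
             "5,d,e,f",
             "6,g,h,i,j,k,l,m,n"),
           con = path)

# count.fields() reports the number of fields found on each line.
n_fields <- count.fields(path, sep = ",")

# A rectangular file has the same field count on every line.
is_rectangular <- length(unique(n_fields)) == 1L

# Lines whose field count differs from the header line.
ragged_lines <- which(n_fields != n_fields[1])
```

If is_rectangular is FALSE, the file is probably better treated as plain lines than as a CSV table.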
Re: [Rd] How to set values
I'm not sure if I have understood your question, so I will give two possibilities:

1 - You have a variable called X = "0" and a variable called Y = "12" and want them to be numeric. You can do this:

    X <- as.numeric(X)
    Y <- as.numeric(Y)

2 - You have a variable called XY = "0,12" and want to turn it into a numeric vector. Then you can do something like this:

    XY <- as.numeric(unlist(strsplit(XY, ",")))

I hope this helps,
Best regards
Jorge Aikes Junior

--
View this message in context: http://r.789695.n4.nabble.com/How-to-set-values-tp3823290p3848526.html
Sent from the R devel mailing list archive at Nabble.com.
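Both possibilities can be tried end to end. This is a small sketch only: the input strings are assumed, since the original question is not shown, and the result names (x_num, y_num, xy_num) are illustrative.

```r
# Assumed inputs: character values as they might arrive from text input.
X  <- "0"
Y  <- "12"
XY <- "0,12"

# Case 1: convert each scalar string to a number.
x_num <- as.numeric(X)
y_num <- as.numeric(Y)

# Case 2: split on the literal comma, flatten the list, then convert.
# fixed = TRUE treats "," as plain text rather than a regular expression.
xy_num <- as.numeric(unlist(strsplit(XY, ",", fixed = TRUE)))
```

Note that as.numeric() returns NA with a warning for any piece that cannot be parsed as a number.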
Re: [Rd] load.R patch suggestion
Ben Bolker bbol...@gmail.com on Thu, 15 Sep 2011 11:03:26 +0200 writes:

> Inspired by
> http://stackoverflow.com/questions/7487778/could-you-tell-me-what-this-error-means
>
> I wrote the following very small (one-line) patch which returns an *informative* error message when R tries to load a zero-byte file rather than
>
>     Error in if (!grepl("RD[AX]2\n", magic)) { :
>       argument is of length zero
>
> I would guess that error messages with the word "magic" in them would be disturbing to new users, who are probably worried already that R is magic ... :-)

indeed...

While it would not be a good idea to program around such error messages in general, as each extra if(...) is executed every time the function is called, i.e. has an (albeit *very small*) penalty for every correct call just for the sake of that message in the erroneous call case, I do agree that it is worth it here and so have added it (for R-devel only).

Thank you, Ben.

> Ben Bolker
>
> --
> Index: load.R
> ===================================================================
> --- load.R	(revision 56743)
> +++ load.R	(working copy)
> @@ -25,6 +25,7 @@
>          ## Since the connection is not open this opens it in binary mode
>          ## and closes it again.
>          magic <- readChar(con, 5L, useBytes = TRUE)
> +        if (length(magic)==0) stop("empty (zero-byte) file")
>          if (!grepl("RD[AX]2\n", magic)) {
>              ## a check while we still know the call to load()
>              if(grepl("RD[ABX][12]\r", magic))
> --
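On R versions without this patch, a similar guard can be placed on the caller's side. The sketch below is an assumption-laden illustration, not the patched load(): safe_load is a hypothetical helper name, and it checks the file size with file.size() instead of inspecting the magic bytes.

```r
# Hypothetical wrapper (not part of base R): refuse missing or
# zero-byte files with a clear message before calling load().
safe_load <- function(path, envir = parent.frame()) {
  size <- file.size(path)          # NA when the file does not exist
  if (is.na(size)) stop("file does not exist: ", path)
  if (size == 0)   stop("empty (zero-byte) file: ", path)
  load(path, envir = envir)        # returns the names of the loaded objects
}

# A zero-byte file triggers the informative error.
empty <- tempfile(fileext = ".RData")
file.create(empty)
msg <- tryCatch(safe_load(empty), error = function(e) conditionMessage(e))

# A valid file loads as usual.
ok <- tempfile(fileext = ".RData")
value <- 42
save(value, file = ok)
env <- new.env()
loaded <- safe_load(ok, envir = env)
```

Unlike the one-line patch, this costs an extra file-system query per call, which is the kind of overhead the reply above warns about.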
Re: [Rd] [R] read.csv behaviour
On 09/28/2011 09:23 AM, Mehmet Suzen wrote:

> This might be obvious but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally from initial data read from a CSV file which has the same format. See example below.
>
> I found it quite strange that R cannot write it in one go, so one must append blocks or post-process the file, is this true? (even Ruby can do it!!)
>
> Otherwise it puts ",," or similar for missing column values in the shorter rows, and the fill=FALSE option does not work!
>
> I don't want to post-process if possible.
>
> See this post:
> http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html
>
> Example that generated Error!
>
>     writeLines(c("A,B,C,D",
>                  "1,a,b,c",
>                  "2,f,g,c",
>                  "3,a,i,j",
>                  "4,a,b,c",
>                  "5,d,e,f",
>                  "6,g,h,i,j,k,l,m,n"),
>                con=file("test.csv"))
>
>     read.csv("test.csv")
>     try(read.csv("test.csv",fill=FALSE))

Hi Mehmet,
The example doesn't need to call file(); writeLines() opens the file for you. It worked for me:

    writeLines(c("A,B,C,D",
                 "1,a,b,c",
                 "2,f,g,c",
                 "3,a,i,j",
                 "4,a,b,c",
                 "5,d,e,f",
                 "6,g,h,i,j,k,l,m,n"),
               con="test.csv")

and to get the original object back, use:

    readLines("test.csv")

The reason you can't use read.csv is that it returns a data frame, and that object can't have elements of unequal length. If you want an object with elements of unequal length, try:

    as.list(readLines("test.csv"))

Jim
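If the individual fields are needed rather than whole lines, each line can be split on the separator. The following is a sketch under assumptions: it uses a temporary file instead of "test.csv", the variable names are illustrative, and the naive split does not handle quoted fields that contain commas.

```r
# Recreate the example file in a temporary location (assumed path).
path <- tempfile(fileext = ".csv")
writeLines(c("A,B,C,D",
             "1,a,b,c",
             "2,f,g,c",
             "3,a,i,j",
             "4,a,b,c",
             "5,d,e,f",
             "6,g,h,i,j,k,l,m,n"),
           con = path)

# Read the raw lines, then split each one on the literal comma.
# The result is a list of character vectors of unequal length.
lines  <- readLines(path)
fields <- strsplit(lines, ",", fixed = TRUE)

# Number of fields per record.
n_fields <- vapply(fields, length, integer(1))

# Writing the records back reproduces the original lines.
round_trip <- vapply(fields, paste, character(1), collapse = ",")
```

Because a list places no constraint on element length, the ragged last record survives the round trip unchanged.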
Re: [Rd] read.csv behaviour
Mehmet Suzen msuzen at mango-solutions.com writes:

> This might be obvious but I was wondering if anyone knows a quick and easy way of writing out a CSV file with varying row lengths, ideally from initial data read from a CSV file which has the same format. See example below.
>
>     writeLines(c("A,B,C,D",
>                  "1,a,b,c",
>                  "2,f,g,c",
>                  "3,a,i,j",
>                  "4,a,b,c",
>                  "5,d,e,f",
>                  "6,g,h,i,j,k,l,m,n"),
>                con=file("test.csv"))

    X <- read.csv("test.csv")

It's not that pretty, but something like

    tmpf <- function(x) paste(x[nzchar(x)],collapse=",")
    writeLines(apply(as.matrix(X),1,tmpf),con="outfile.csv")

might work
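The same idea can be tried on a small padded data frame. This is only a sketch: the data frame, the helper name drop_padding, and the output path are assumptions for illustration. It differs from tmpf above in that it also drops NA values and writes the header line, which apply() over the rows does not produce.

```r
# Hypothetical data frame in which short rows are padded with "" or NA.
X <- data.frame(A = c("1", "2"),
                B = c("a", "f"),
                C = c("b", ""),
                D = c("c", NA),
                stringsAsFactors = FALSE)

# Keep only the non-missing, non-empty cells of a row and join them.
drop_padding <- function(x) {
  keep <- !is.na(x) & nzchar(x)
  paste(x[keep], collapse = ",")
}

# Apply row-wise; unname() discards any row names carried along.
rows <- unname(apply(as.matrix(X), 1, drop_padding))

# Write the header followed by the trimmed rows.
out <- tempfile(fileext = ".csv")
writeLines(c(paste(names(X), collapse = ","), rows), con = out)
written <- readLines(out)
```

One caveat of this approach: an empty cell in the middle of a row is dropped as well, so the remaining values shift left. It is only safe when the padding sits at the end of each row.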
[Rd] serialize/unserialize vector improvement
Hi folks,

I've attached a patch to the svn trunk that improves the performance of the serialize/unserialize interface for vector types. The current implementation:

a) invokes the R_XDREncode operation for each element of the vector type, and
b) uses a switch statement to determine the stream type for each element of the vector type.

I've added R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements at a time, and I've reorganized the implementation so that the stream type is queried once per vector instead of once per element.

In the microbenchmark below, I've observed performance improvements of about 2.4x. In a real benchmark that uses the serialization interface to make MPI calls, I see about a 10% improvement in performance.

Cheers,
--Michael

microbenchmark:

    input <- matrix(1:1, 1, 1)
    output <- serialize(input, NULL)

    for(i in 1:10) {
      print(system.time(serialize(input, NULL)))
    }

    for(i in 1:10) {
      print(system.time(unserialize(output)))
    }

Index: src/include/Rinternals.h
===================================================================
--- src/include/Rinternals.h	(revision 57107)
+++ src/include/Rinternals.h	(working copy)
@@ -749,6 +749,7 @@
 void Rf_warningcall_immediate(SEXP, const char *, ...);
 
 /* Save/Load Interface */
+#define R_XDR_COMPLEX_SIZE 16
 #define R_XDR_DOUBLE_SIZE 8
 #define R_XDR_INTEGER_SIZE 4
 
@@ -757,6 +758,13 @@
 void R_XDREncodeInteger(int i, void *buf);
 int R_XDRDecodeInteger(void *buf);
 
+void R_XDREncodeDoubleVector(double *d, void *buf, int len);
+void R_XDRDecodeDoubleVector(void *input, double *output, int len);
+void R_XDREncodeComplexVector(Rcomplex *c, void *buf, int len);
+void R_XDRDecodeComplexVector(void *input, Rcomplex *output, int len);
+void R_XDREncodeIntegerVector(int *i, void *buf, int len);
+void R_XDRDecodeIntegerVector(void *input, int *output, int len);
+
 typedef void *R_pstream_data_t;
 
 typedef enum {
Index: src/main/serialize.c
===================================================================
--- src/main/serialize.c	(revision 57107)
+++ src/main/serialize.c	(working copy)
@@ -792,20 +792,62 @@
         WriteItem(STRING_ELT(s, i), ref_table, stream);
 }
 
-/* e.g., OutVec(fp, obj, INTEGER, OutInteger) */
-#define OutVec(fp, obj, accessor, outfunc) \
-    do { \
-        int cnt; \
-        for (cnt = 0; cnt < LENGTH(obj); ++cnt) \
-            outfunc(fp, accessor(obj, cnt)); \
-    } while (0)
-
-#define LOGICAL_ELT(x,__i__) LOGICAL(x)[__i__]
 #define INTEGER_ELT(x,__i__) INTEGER(x)[__i__]
 #define REAL_ELT(x,__i__) REAL(x)[__i__]
 #define COMPLEX_ELT(x,__i__) COMPLEX(x)[__i__]
 #define RAW_ELT(x,__i__) RAW(x)[__i__]
 
+#define OutVec(NAME, CAPNAME, XDR, CAPXDR, TYPE) \
+static R_INLINE void Out ## NAME ## Vec(R_outpstream_t stream, SEXP s, int length) \
+{ \
+    OutInteger(stream, length); \
+    switch (stream->type) { \
+    case R_pstream_xdr_format: \
+        if (length > (128 / R_XDR_## CAPXDR ##_SIZE)) \
+        { \
+            char *buf = Calloc( R_XDR_ ## CAPXDR ## _SIZE * length, char); \
+            R_XDREncode ## XDR ## Vector(CAPNAME(s), buf, length); \
+            stream->OutBytes(stream, buf, R_XDR_ ## CAPXDR ## _SIZE * length); \
+            Free(buf); \
+        } else { \
+            char buf[128]; \
+            R_XDREncode ## XDR ## Vector(CAPNAME(s), buf, length); \
+            stream->OutBytes(stream, buf, R_XDR_ ## CAPXDR ## _SIZE * length); \
+        } \
+        break; \
+    case R_pstream_binary_format: \
+        stream->OutBytes(stream, CAPNAME(s), sizeof(TYPE) * length); \
+        break; \
+    default: \
+        { \
+            int cnt; \
+            for (cnt = 0; cnt < length; ++cnt) \
+                Out ## NAME(stream, CAPNAME ## _ELT(s, cnt)); \
+        } \
+    } \
+}
+
+OutVec(Integer, INTEGER, Integer, INTEGER, int)
+OutVec(Real, REAL, Double, DOUBLE, double)
+OutVec(Complex, COMPLEX, Complex, COMPLEX, Rcomplex)
+
+static R_INLINE void OutByteVec(R_outpstream_t stream, SEXP s, int length)
+{
+    OutInteger(stream, length);
+    switch (stream->type) {
+    case R_pstream_xdr_format:
+    case R_pstream_binary_format:
+        stream->OutBytes(stream, RAW(s), length);
+        break;
+    default:
+        {
+            int cnt;
+            for (cnt = 0; cnt < length; ++cnt)
+                OutByte(stream, RAW_ELT(s, cnt));
+        }
+    }
+}
+
 static void WriteItem (SEXP s, SEXP ref_table, R_outpstream_t stream)
 {
     int i;
@@ -932,16 +974,13 @@
         break;
     case LGLSXP:
     case INTSXP:
-        OutInteger(stream, LENGTH(s));
-        OutVec(stream, s, INTEGER_ELT, OutInteger);
+        OutIntegerVec(stream, s, LENGTH(s));
         break;
     case REALSXP:
-        OutInteger(stream, LENGTH(s));
-        OutVec(stream, s, REAL_ELT, OutReal);
+        OutRealVec(stream, s, LENGTH(s));
         break;
     case CPLXSXP:
-        OutInteger(stream, LENGTH(s));
-        OutVec(stream, s, COMPLEX_ELT, OutComplex);
+        OutComplexVec(stream, s, LENGTH(s));
         break;
     case