Re: [Rd] [R] read.csv behaviour

2011-09-28 Thread Jeff Newmiller
The consensus is that it is a very bad idea for a file nominally described as a
CSV file to have a varying number of fields from record to record. If you need
such a file and go to whatever lengths are necessary to create it, please label
it as something other than a CSV file when you are done.
With this in mind, you may be in a better position to understand why the
pre-built facilities offer little support for your endeavor.
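To make "a varying number of fields" concrete, here is a minimal C sketch (not part of R) that counts the comma-separated fields in each record and reports the first record whose count differs from the header's. It assumes unquoted fields, so a quoted comma would be miscounted; `count_fields` and `first_ragged_record` are hypothetical names. In R itself, `count.fields()` performs a similar per-line count.

```c
#include <stddef.h>

/* Count the comma-separated fields in one record, stopping at the end of
   the string or at a newline. No quoting is understood: every comma is a
   separator, so "a,b,c" has 3 fields and an empty line has 1. */
size_t count_fields(const char *line)
{
    size_t n = 1;
    for (; *line != '\0' && *line != '\n'; ++line)
        if (*line == ',')
            ++n;
    return n;
}

/* Return the index of the first record whose field count differs from
   that of records[0] (the header line), or nrec if every record agrees. */
size_t first_ragged_record(const char *const *records, size_t nrec)
{
    if (nrec == 0)
        return 0;
    size_t expected = count_fields(records[0]);
    for (size_t i = 1; i < nrec; ++i)
        if (count_fields(records[i]) != expected)
            return i;
    return nrec;
}
```

Applied to the seven lines of Mehmet's example, it flags index 6, the `6,g,h,i,j,k,l,m,n` record.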
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

Mehmet Suzen msu...@mango-solutions.com wrote:


This might be obvious, but does anyone know a quick and easy way of
writing out a CSV file with varying row lengths, ideally from initial
data read from a CSV file that has the same format? See the example
below.


I found it quite strange that R cannot write it in one go, so one must
append blocks or post-process the file. Is this true? (Even Ruby can do
it!!)

Otherwise it puts ,, or similar for missing column values in the
shorter rows, and the fill=FALSE option does not work!

I don't want to post-process if possible.

See this post:
http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html

Example that generated an error:

writeLines(c("A,B,C,D",
"1,a,b,c",
"2,f,g,c",
"3,a,i,j",
"4,a,b,c",
"5,d,e,f",
"6,g,h,i,j,k,l,m,n"),
con=file("test.csv"))

read.csv("test.csv")
try(read.csv("test.csv",fill=FALSE))

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] How to set values

2011-09-28 Thread jorgeA
I'm not sure I have understood your question, so I will offer two
possibilities: 

1 - You have a variable X = "0" and a variable Y = "12" (character
strings) and want them to be numeric. You can do this: 

X <- as.numeric(X) 
Y <- as.numeric(Y) 

2 - You have a variable XY = "0,12" and want to turn it into a numeric
vector. Then you can do something like this: 

XY <- as.numeric(unlist(strsplit(XY, ","))) 
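For comparison, the split-then-convert step of option 2 can be sketched in C with `strtod()`. `parse_numbers` is a hypothetical helper; it assumes plain comma-separated numbers and rejects empty or malformed fields, whereas `as.numeric()` in R would turn them into NA.

```c
#include <stdlib.h>

/* Parse a comma-separated list such as "0,12" into out[], storing at
   most max values. Returns the number of values stored, or -1 if a field
   is empty or is not followed by a comma or the end of the string. */
int parse_numbers(const char *s, double *out, int max)
{
    int n = 0;
    while (n < max) {
        char *end;
        double v = strtod(s, &end);
        if (end == s)            /* no number here: empty or malformed */
            return -1;
        out[n++] = v;
        if (*end == '\0')        /* consumed the whole string */
            return n;
        if (*end != ',')         /* junk after the number, e.g. "12x" */
            return -1;
        s = end + 1;             /* step past the comma */
    }
    return n;                    /* out[] is full; remaining input ignored */
}
```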

I hope this helps, 

Best regards 
Jorge Aikes Junior

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-set-values-tp3823290p3848526.html
Sent from the R devel mailing list archive at Nabble.com.



Re: [Rd] load.R patch suggestion

2011-09-28 Thread Martin Maechler
 Ben Bolker bbol...@gmail.com
 on Thu, 15 Sep 2011 11:03:26 +0200 writes:

 Inspired by

 
http://stackoverflow.com/questions/7487778/could-you-tell-me-what-this-error-means

 I wrote the following very small (one-line) patch which
 returns an *informative* error message when R tries to
 load a zero-byte file rather than

 Error in if (!grepl("RD[AX]2\n", magic)) { : argument is
 of length zero

   I would guess that error messages with the word "magic"
 in them would be disturbing to new users, who are probably
 already worried that R is magic ...

:-)  indeed...

While it would not be a good idea to program around such error
messages in general, as each extra if(...) is executed every time
the function is called, i.e., it imposes an (albeit *very small*) penalty
on every correct call just for the sake of that message in the
erroneous-call case,
I do agree that it is worth it here and so have added it (for
R-devel only).

Thank you, Ben.

   Ben Bolker


 --
 Index: load.R
 ===
 --- load.R(revision 56743)
 +++ load.R(working copy)
 @@ -25,6 +25,7 @@
  ## Since the connection is not open this opens it in binary mode
  ## and closes it again.
  magic <- readChar(con, 5L, useBytes = TRUE)
 +if (length(magic)==0) stop("empty (zero-byte) file")
  if (!grepl("RD[AX]2\n", magic)) {
  ## a check while we still know the call to load()
  if(grepl("RD[ABX][12]\r", magic))
 
 --
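The guard amounts to "test for an empty read before matching the magic number". A C sketch of the same check using only stdio; this is not R's C-level loader, `check_magic` and its return codes are hypothetical, and it accepts only the `RDA2\n` and `RDX2\n` headers that the pattern `RD[AX]2\n` matches.

```c
#include <stdio.h>

/* Hypothetical return codes for check_magic(). */
enum magic_status { MAGIC_OK, MAGIC_EMPTY, MAGIC_BAD };

/* Read up to 5 bytes from fp and classify them. A zero-byte file is
   reported separately instead of falling through to the pattern test. */
enum magic_status check_magic(FILE *fp)
{
    char magic[5];
    size_t n = fread(magic, 1, sizeof magic, fp);
    if (n == 0)
        return MAGIC_EMPTY;                 /* nothing to match against */
    if (n == 5 && magic[0] == 'R' && magic[1] == 'D' &&
        (magic[2] == 'A' || magic[2] == 'X') &&
        magic[3] == '2' && magic[4] == '\n')
        return MAGIC_OK;                    /* matches RD[AX]2\n */
    return MAGIC_BAD;
}
```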



Re: [Rd] [R] read.csv behaviour

2011-09-28 Thread Jim Lemon

On 09/28/2011 09:23 AM, Mehmet Suzen wrote:


This might be obvious, but does anyone know a quick and easy way of
writing out a CSV file with varying row lengths, ideally from initial
data read from a CSV file that has the same format? See the example
below.


I found it quite strange that R cannot write it in one go, so one must
append blocks or post-process the file. Is this true? (Even Ruby can do
it!!)

Otherwise it puts ,, or similar for missing column values in the
shorter rows, and the fill=FALSE option does not work!

I don't want to post-process if possible.

See this post:
http://r.789695.n4.nabble.com/Re-read-csv-trap-td3301924.html

Example that generated an error:

writeLines(c("A,B,C,D",
  "1,a,b,c",
  "2,f,g,c",
  "3,a,i,j",
  "4,a,b,c",
  "5,d,e,f",
  "6,g,h,i,j,k,l,m,n"),
con=file("test.csv"))

read.csv("test.csv")
try(read.csv("test.csv",fill=FALSE))


Hi Mehmet,
The example doesn't need to call file(); writeLines() opens the file for
you. This worked for me:


 writeLines(c("A,B,C,D",
  "1,a,b,c",
  "2,f,g,c",
  "3,a,i,j",
  "4,a,b,c",
  "5,d,e,f",
  "6,g,h,i,j,k,l,m,n"),
con="test.csv")

and to get the original object back, use:

readLines("test.csv")

The reason you can't use read.csv is that it returns a data frame, and
the columns of a data frame can't have unequal lengths. If you want an
object with elements of unequal length, try:


as.list(readLines("test.csv"))
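In C terms, a structure whose rows have different lengths is just an array of records that each carry their own field count; in R, `strsplit(readLines("test.csv"), ",")` produces the analogous list of character vectors. A minimal sketch, assuming unquoted fields and lines without a trailing newline; `struct row`, `split_row` and `free_row` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* One record of a ragged file, with its own number of fields. */
struct row {
    size_t nfields;
    char **fields;   /* each field is a separately allocated string */
};

/* Split line on commas (no quoting). Returns 0 on success, -1 on
   allocation failure; in either case free_row() releases what was stored. */
int split_row(const char *line, struct row *r)
{
    size_t n = 1;
    for (const char *p = line; *p; ++p)   /* fields = commas + 1 */
        if (*p == ',')
            ++n;
    r->nfields = 0;
    r->fields = malloc(n * sizeof *r->fields);
    if (r->fields == NULL)
        return -1;
    const char *start = line;
    for (;;) {
        size_t len = strcspn(start, ",");  /* length up to next comma */
        char *f = malloc(len + 1);
        if (f == NULL)
            return -1;
        memcpy(f, start, len);
        f[len] = '\0';
        r->fields[r->nfields++] = f;
        if (start[len] == '\0')            /* last field on the line */
            break;
        start += len + 1;                  /* skip the comma */
    }
    return 0;
}

void free_row(struct row *r)
{
    for (size_t i = 0; i < r->nfields; ++i)
        free(r->fields[i]);
    free(r->fields);
    r->fields = NULL;
    r->nfields = 0;
}
```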

Jim



Re: [Rd] read.csv behaviour

2011-09-28 Thread Ben Bolker
Mehmet Suzen msuzen at mango-solutions.com writes:

 This might be obvious, but does anyone know a quick and easy way of
 writing out a CSV file with varying row lengths, ideally from initial
 data read from a CSV file that has the same format? See the example
 below.
 
 writeLines(c("A,B,C,D",
  "1,a,b,c",
  "2,f,g,c",
  "3,a,i,j",
  "4,a,b,c",
  "5,d,e,f",
  "6,g,h,i,j,k,l,m,n"),
con=file("test.csv"))
 

X <- read.csv("test.csv")


  It's not that pretty, but something like


tmpf <- function(x) paste(x[nzchar(x)],collapse=",")
writeLines(apply(as.matrix(X),1,tmpf),con="outfile.csv")

  might work
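`tmpf()` keeps only the non-empty fields of a row (`x[nzchar(x)]`) and pastes them back together with commas, which drops the empty padding fields from shorter rows. The same filter-and-join step, sketched in C with a hypothetical `join_nonempty` helper and a caller-supplied buffer. Like `nzchar()`, it also drops empty fields in the middle of a row, not only at the end.

```c
#include <stddef.h>
#include <string.h>

/* Join the non-empty strings in fields[0..n-1] with commas into out
   (capacity outsz bytes, including the terminating NUL). Returns the
   length written, or -1 if out is too small. Analogous to
   paste(x[nzchar(x)], collapse = ",") in R. */
int join_nonempty(const char *const *fields, size_t n, char *out, size_t outsz)
{
    size_t used = 0;
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        size_t len = strlen(fields[i]);
        if (len == 0)
            continue;                     /* skip empty fields */
        size_t need = len + (used > 0);   /* comma before all but the first */
        if (used + need + 1 > outsz)
            return -1;
        if (used > 0)
            out[used++] = ',';
        memcpy(out + used, fields[i], len);
        used += len;
        out[used] = '\0';
    }
    return (int)used;
}
```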



[Rd] serialize/unserialize vector improvement

2011-09-28 Thread Michael Spiegel
Hi folks,

I've attached a patch to the svn trunk that improves the performance
of the serialize/unserialize interface for vector types. The current
implementation (a) invokes the R_XDREncode operation for each element
of the vector, and (b) uses a switch statement to determine the
stream type for each element. I've added
R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements
at a time, and I've reorganized the implementation so that the stream
type is no longer queried once per element.
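To illustrate what encoding N elements per call means, here is a self-contained C sketch of a vector encoder and decoder for 32-bit integers in XDR byte order (4 bytes each, big-endian). It is not the patch's `R_XDREncodeIntegerVector` implementation; the names are hypothetical, and the point is only that one call walks a contiguous buffer with no per-element function call or switch on the stream type.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n 32-bit integers into buf as XDR (big-endian, 4 bytes each).
   buf must hold at least 4 * n bytes. */
void xdr_encode_int_vector(const int32_t *x, unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t u = (uint32_t)x[i];   /* two's-complement bit pattern */
        buf[4 * i + 0] = (unsigned char)(u >> 24);
        buf[4 * i + 1] = (unsigned char)(u >> 16);
        buf[4 * i + 2] = (unsigned char)(u >> 8);
        buf[4 * i + 3] = (unsigned char)u;
    }
}

/* Inverse of xdr_encode_int_vector. */
void xdr_decode_int_vector(const unsigned char *buf, int32_t *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t u = ((uint32_t)buf[4 * i] << 24) |
                     ((uint32_t)buf[4 * i + 1] << 16) |
                     ((uint32_t)buf[4 * i + 2] << 8) |
                      (uint32_t)buf[4 * i + 3];
        /* map back to signed without an implementation-defined cast */
        x[i] = u <= INT32_MAX ? (int32_t)u
                              : (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
    }
}
```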

In the microbenchmark below, I've observed performance improvements of
about 2.4x. In a real benchmark that uses the serialization interface
to make MPI calls, I see about a 10% improvement in performance.

Cheers,
--Michael

microbenchmark:

input <- matrix(1:1, 1, 1)
output <- serialize(input, NULL)
for(i in 1:10) { print(system.time(serialize(input, NULL))) }
for(i in 1:10) { print(system.time(unserialize(output))) }
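The `OutVec` macro in the patch below encodes small vectors into a 128-byte stack buffer, falls back to `Calloc()` only when the encoded size exceeds that, and then hands the whole buffer to the stream's `OutBytes` in one call. A plain-C sketch of that pattern for doubles, assuming `double` is a 64-bit IEEE 754 value with the same byte order as `uint64_t`; `out_double_vec` is a hypothetical name, `malloc`/`free` stand in for R's `Calloc`/`Free`, and `fwrite` stands in for `OutBytes`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_BYTES 128   /* stack-buffer size, as in the patch */

/* Encode n doubles as 8-byte big-endian values (XDR layout) and write
   them to fp with a single fwrite. Uses a stack buffer when the encoded
   size fits in SCRATCH_BYTES, otherwise a heap buffer. Returns 0 on
   success, -1 on allocation or write failure. */
int out_double_vec(FILE *fp, const double *x, size_t n)
{
    unsigned char small[SCRATCH_BYTES];
    size_t nbytes = 8 * n;
    unsigned char *buf = nbytes > sizeof small ? malloc(nbytes) : small;
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        uint64_t u;
        memcpy(&u, &x[i], sizeof u);      /* reinterpret the bits */
        for (int k = 0; k < 8; ++k)       /* most significant byte first */
            buf[8 * i + k] = (unsigned char)(u >> (56 - 8 * k));
    }
    size_t written = fwrite(buf, 1, nbytes, fp);
    if (buf != small)
        free(buf);
    return written == nbytes ? 0 : -1;
}
```

With a 128-byte scratch buffer, vectors of up to 16 doubles stay on the stack, which matches the patch's threshold of 128 / R_XDR_DOUBLE_SIZE elements.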
Index: src/include/Rinternals.h
===
--- src/include/Rinternals.h	(revision 57107)
+++ src/include/Rinternals.h	(working copy)
@@ -749,6 +749,7 @@
 void Rf_warningcall_immediate(SEXP, const char *, ...);
 
 /* Save/Load Interface */
+#define R_XDR_COMPLEX_SIZE 16
 #define R_XDR_DOUBLE_SIZE 8
 #define R_XDR_INTEGER_SIZE 4
 
@@ -757,6 +758,13 @@
 void R_XDREncodeInteger(int i, void *buf);
 int R_XDRDecodeInteger(void *buf);
 
+void R_XDREncodeDoubleVector(double *d, void *buf, int len);
+void R_XDRDecodeDoubleVector(void *input, double *output, int len);
+void R_XDREncodeComplexVector(Rcomplex *c, void *buf, int len);
+void R_XDRDecodeComplexVector(void *input, Rcomplex *output, int len);
+void R_XDREncodeIntegerVector(int *i, void *buf, int len);
+void R_XDRDecodeIntegerVector(void *input, int *output, int len);
+
 typedef void *R_pstream_data_t;
 
 typedef enum {
Index: src/main/serialize.c
===
--- src/main/serialize.c	(revision 57107)
+++ src/main/serialize.c	(working copy)
@@ -792,20 +792,62 @@
 	WriteItem(STRING_ELT(s, i), ref_table, stream);
 }
 
-/* e.g., OutVec(fp, obj, INTEGER, OutInteger) */
-#define OutVec(fp, obj, accessor, outfunc)\
-	do {\
-		int cnt;		\
-		for (cnt = 0; cnt < LENGTH(obj); ++cnt)		\
-			outfunc(fp, accessor(obj, cnt));		\
-	} while (0)
-
-#define LOGICAL_ELT(x,__i__)	LOGICAL(x)[__i__]
 #define INTEGER_ELT(x,__i__)	INTEGER(x)[__i__]
 #define REAL_ELT(x,__i__)	REAL(x)[__i__]
 #define COMPLEX_ELT(x,__i__)	COMPLEX(x)[__i__]
 #define RAW_ELT(x,__i__)	RAW(x)[__i__]
 
+#define OutVec(NAME, CAPNAME, XDR, CAPXDR, TYPE) \
+static R_INLINE void Out ## NAME ## Vec(R_outpstream_t stream, SEXP s, int length) \
+{\
+	OutInteger(stream, length);	\
+	switch (stream->type) {		\
+	case R_pstream_xdr_format:	\
+		if (length > (128 / R_XDR_## CAPXDR ##_SIZE))			\
+		{		\
+			char *buf = Calloc( R_XDR_ ## CAPXDR ## _SIZE * length, char); 		\
+			R_XDREncode ## XDR ## Vector(CAPNAME(s), buf, length);\
+			stream->OutBytes(stream, buf, R_XDR_ ## CAPXDR ## _SIZE * length);	\
+			Free(buf);			\
+		} else {\
+			char buf[128];		\
+			R_XDREncode ## XDR ## Vector(CAPNAME(s), buf, length);\
+			stream->OutBytes(stream, buf, R_XDR_ ## CAPXDR ## _SIZE * length);	\
+		}		\
+		break;	\
+	case R_pstream_binary_format:\
+		stream->OutBytes(stream, CAPNAME(s), sizeof(TYPE) * length);			\
+		break;	\
+	default:	\
+	{			\
+		int cnt;\
+		for (cnt = 0; cnt < length; ++cnt)		\
+			Out ## NAME(stream, CAPNAME ## _ELT(s, cnt));		\
+	}			\
+	}			\
+}
+
+OutVec(Integer, INTEGER, Integer, INTEGER, int)
+OutVec(Real, REAL, Double, DOUBLE, double)
+OutVec(Complex, COMPLEX, Complex, COMPLEX, Rcomplex)
+
+static R_INLINE void OutByteVec(R_outpstream_t stream, SEXP s, int length)
+{
+	OutInteger(stream, length);
+	switch (stream->type) {
+	case R_pstream_xdr_format:
+	case R_pstream_binary_format:
+		stream->OutBytes(stream, RAW(s), length);
+		break;
+	default:
+	{
+		int cnt;
+		for (cnt = 0; cnt < length; ++cnt)
+			OutByte(stream, RAW_ELT(s, cnt));
+	}
+	}
+}
+
 static void WriteItem (SEXP s, SEXP ref_table, R_outpstream_t stream)
 {
 int i;
@@ -932,16 +974,13 @@
 	break;
 	case LGLSXP:
 	case INTSXP:
-	OutInteger(stream, LENGTH(s));
-	OutVec(stream, s, INTEGER_ELT, OutInteger);
+	OutIntegerVec(stream, s, LENGTH(s));
 	break;
 	case REALSXP:
-	OutInteger(stream, LENGTH(s));
-	OutVec(stream, s, REAL_ELT, OutReal);
+	OutRealVec(stream, s, LENGTH(s));
 	break;
 	case CPLXSXP:
-	OutInteger(stream, LENGTH(s));
-	OutVec(stream, s, COMPLEX_ELT, OutComplex);
+	OutComplexVec(stream, s, LENGTH(s));
 	break;
 	case