Re: [R] How to split a data.frame into its columns?
> On Aug 28, 2016, at 11:14 PM, Marius Hofert wrote:
>
> Hi,
>
> I need a fast way to split a data.frame (and matrix) into a list of
> columns.

This is a bit of a puzzle, since data.frame objects are by definition "lists of columns". If you want a data.frame object (say its name is dat) to _only_ be a list of columns, then

dat <- unclass(dat)

The split.data.frame function splits by rows, since that is the most desired and expected behavior, and because the authors of S/R probably thought there was no point in making the split "by columns" when it already was.

--
David.

> For matrices, split(x, col(x)) works (which can then be done
> in C for speed-up, if necessary), but for a data.frame? split(iris,
> col(iris)) does not work as expected (?).
> The outcome should be lapply(seq_len(ncol(iris)), function(j)
> iris[,j]) and not require additional packages (if possible).
>
> Thanks & cheers,
> Marius
>
> PS: Below is the C code for matrices. Not sure how easy it would be to
> extend that to data.frames (?)
> [...]

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to split a data.frame into its columns?
Need to re-read the "Introduction to R". Data frames ARE lists of columns. So to convert a matrix to a list of vectors use

as.data.frame( m )

--
Sent from my phone. Please excuse my brevity.

On August 28, 2016 11:14:20 PM PDT, Marius Hofert wrote:
>Hi,
>
>I need a fast way to split a data.frame (and matrix) into a list of
>columns. For matrices, split(x, col(x)) works (which can then be done
>in C for speed-up, if necessary), but for a data.frame? split(iris,
>col(iris)) does not work as expected (?).
>The outcome should be lapply(seq_len(ncol(iris)), function(j)
>iris[,j]) and not require additional packages (if possible).
>
>Thanks & cheers,
>Marius
>
>PS: Below is the C code for matrices. Not sure how easy it would be to
>extend that to data.frames (?)
>[...]
[R] How to split a data.frame into its columns?
Hi,

I need a fast way to split a data.frame (and matrix) into a list of columns. For matrices, split(x, col(x)) works (which can then be done in C for speed-up, if necessary), but for a data.frame? split(iris, col(iris)) does not work as expected (?).
The outcome should be lapply(seq_len(ncol(iris)), function(j) iris[,j]) and not require additional packages (if possible).

Thanks & cheers,
Marius

PS: Below is the C code for matrices. Not sure how easy it would be to extend that to data.frames (?)

#include <R.h>
#include <Rinternals.h>

SEXP col_split(SEXP x)
{
    /* Setup */
    int *dims = INTEGER(getAttrib(x, R_DimSymbol));
    int n = dims[0], d = dims[1];
    SEXP res = PROTECT(allocVector(VECSXP, d));
    SEXP ref;
    int i = 0, j, k;

    /* Distinguish integer/real/logical/character matrices */
    switch (TYPEOF(x)) {
    case INTSXP:
        for (j = 0; j < d; j++) {
            SET_VECTOR_ELT(res, j, allocVector(INTSXP, n));
            int *e = INTEGER(VECTOR_ELT(res, j));
            for (k = 0; k < n; i++, k++) {
                e[k] = INTEGER(x)[i];
            }
        }
        break;
    case REALSXP:
        for (j = 0; j < d; j++) {
            SET_VECTOR_ELT(res, j, allocVector(REALSXP, n));
            double *e = REAL(VECTOR_ELT(res, j));
            for (k = 0; k < n; i++, k++) {
                e[k] = REAL(x)[i];
            }
        }
        break;
    case LGLSXP:
        for (j = 0; j < d; j++) {
            SET_VECTOR_ELT(res, j, allocVector(LGLSXP, n));
            int *e = LOGICAL(VECTOR_ELT(res, j));
            for (k = 0; k < n; i++, k++) {
                e[k] = LOGICAL(x)[i];
            }
        }
        break;
    case STRSXP:
        for (j = 0; j < d; j++) {
            ref = allocVector(STRSXP, n);
            SET_VECTOR_ELT(res, j, ref);
            ref = VECTOR_ELT(res, j);
            for (k = 0; k < n; i++, k++) {
                SET_STRING_ELT(ref, k, STRING_ELT(x, i));
            }
        }
        break;
    default:
        error("Wrong type of 'x': %s", CHAR(type2str_nowarn(TYPEOF(x))));
    }

    /* Return */
    UNPROTECT(1);
    return(res);
}
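Because R stores a matrix in column-major order, column j of an n-by-d matrix occupies the contiguous elements j*n through j*n + n - 1. That means the element-by-element inner loop in col_split could, for numeric types, be replaced by one block copy per column. The sketch below shows that idea in plain C on a double buffer, without the R API; split_columns and free_columns are hypothetical names, and the malloc-based result is an assumption made so the sketch runs on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an R API function): copy each column of a
 * column-major n x d matrix of doubles into its own malloc'd array.
 * Returns an array of d column pointers, or NULL if allocation fails
 * (for d == 0 the result may also be NULL). */
double **split_columns(const double *x, size_t n, size_t d)
{
    assert(x != NULL || n == 0 || d == 0);
    double **cols = malloc(d * sizeof *cols);
    if (cols == NULL && d > 0)
        return NULL;
    for (size_t j = 0; j < d; j++) {
        cols[j] = malloc(n * sizeof **cols);
        if (cols[j] == NULL && n > 0) {
            /* roll back the columns allocated so far */
            for (size_t k = 0; k < j; k++)
                free(cols[k]);
            free(cols);
            return NULL;
        }
        /* column j starts at offset j * n in column-major storage */
        if (n > 0)
            memcpy(cols[j], x + j * n, n * sizeof *x);
    }
    return cols;
}

/* Release everything returned by split_columns. */
void free_columns(double **cols, size_t d)
{
    if (cols == NULL)
        return;
    for (size_t j = 0; j < d; j++)
        free(cols[j]);
    free(cols);
}
```

Inside an R extension the same pattern would copy from REAL(x) + j * n into REAL(VECTOR_ELT(res, j)); character matrices still need the element-wise SET_STRING_ELT loop, because string elements must be written through R's write barrier rather than with memcpy.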
Re: [R] parsing the file
Based on the discussion in [1] of ORing values with characters, which may generate "unusual" characters, I suspect a botched conversion from EBCDIC may have messed with some of the data. If there are signed data fields, then the OP may need to read the original file, treat it as if it were binary data, and do any needed translation themselves to retrieve it. Not a task for the faint of heart.

[1] http://www.3480-3590-data-conversion.com/article-reading-cobol-layouts-4.html

--
Sent from my phone. Please excuse my brevity.

On August 28, 2016 7:26:24 AM PDT, jim holtman wrote:
>Here is an attempt at parsing the data. It is fixed field so the
>regular expression will extract the data. Some does not seem to make
>sense since it has curly brackets in the data.
>
>
>Jim Holtman
>Data Munger Guru
>
>What is the problem that you are trying to solve?
>Tell me what you want to do, not how you want to do it.
>
>On Sun, Aug 28, 2016 at 8:49 AM, Glenn Schultz
>wrote:
>
>> Hi Jim,
>>
>> Attached is the layout of the file I would like to parse with dput sample
>> of the data. From the layout it seems to me there are two sets in the data
>> Header and Details. I would like to either parse such that
>>
>>- I have either 1 comma delimited file of all data or
>>- 2 comma delimited files one of header the other of details
>>
>> I have never seen a file layout described in the manner before.
>> Consequently, I am a little confused as to how to work with the file.
>>
>> Best,
>> Glenn
>> [...]
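Another possible reading of the curly brackets Jim noticed, offered as an assumption rather than something the attached layout confirms: fields such as 08875{ or 22D look like COBOL zoned-decimal numbers with a signed "overpunch" in their common ASCII form, where the last character carries both the final digit and the sign ({ = +0, A to I = +1 to +9, } = -0, J to R = -1 to -9). Under that assumption, each fixed-width field can be sliced out by offset and width and decoded directly. The C sketch below does both; decode_overpunch is a hypothetical name, and the offsets used with it must come from the real file layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder for one fixed-width zoned-decimal field whose last
 * character is a signed overpunch, assuming the common ASCII mapping:
 *   '{' -> +0, 'A'..'I' -> +1..+9, '}' -> -0, 'J'..'R' -> -1..-9
 * (a plain final digit is treated as unsigned/positive).
 * Reads `width` characters of `rec` starting at `offset`. On success stores
 * the signed integer in *out and returns true; returns false if the field
 * runs past the end of the record or contains unexpected characters.
 * Implied decimal places must be applied by the caller from the layout. */
bool decode_overpunch(const char *rec, size_t offset, size_t width,
                      long long *out)
{
    assert(rec != NULL && out != NULL);
    if (width == 0 || strlen(rec) < offset + width)
        return false;
    const char *f = rec + offset;
    long long value = 0;
    /* every character except the last must be a plain digit */
    for (size_t i = 0; i + 1 < width; i++) {
        if (f[i] < '0' || f[i] > '9')
            return false;
        value = value * 10 + (f[i] - '0');
    }
    char last = f[width - 1];
    int digit;
    bool negative = false;
    if (last >= '0' && last <= '9') {
        digit = last - '0';
    } else if (last == '{') {
        digit = 0;
    } else if (last >= 'A' && last <= 'I') {
        digit = last - 'A' + 1;
    } else if (last == '}') {
        digit = 0;
        negative = true;
    } else if (last >= 'J' && last <= 'R') {
        digit = last - 'J' + 1;
        negative = true;
    } else {
        return false;
    }
    value = value * 10 + digit;
    *out = negative ? -value : value;
    return true;
}
```

With this mapping, the 6-character field 08875{ decodes to 88750 (a rate of 8.875 if the layout implies four decimal places) and 22D decodes to 224. If the decoded values never make sense against the layout, that would point back toward the damaged-conversion explanation above.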
Re: [R] package.skeleton, environment argument causes error
It would be helpful for us if you provided a reproducible example in which the current package.skeleton fails.

Best,
Uwe Ligges

On 19.08.2016 00:12, Jacob Strunk wrote:
Hello,

I have been using package.skeleton from within an lapply statement successfully (assuming good source code) with the following setup in the past:

x=try(package.skeleton(package_name,path=pathi,code_files=file_i))

but now it fails with the error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

I am working in RStudio Version 0.99.896, with 64-bit R version 3.3.1 (2016-06-21).

I have been probing the code for package.skeleton a bit and noticed that the default arguments for 'list' and 'environment' are supplied in the function definition, thus making it impossible to achieve the conditions

envIsMissing=TRUE
missing(list) = TRUE

As a result of the fact that missing(list) cannot be true, the classesList argument is empty and the call

classes0 <- .fixPackageFileNames(classesList)

then fails with the error

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

If I remove the default arguments I get further, but get the same error I had before (Error: evaluation nested too deeply: infinite recursion / options(expressions=)?) after executing the following code:

methods0 <- .fixPackageFileNames(methodsList)

and the contents of methodsList look like

An object of class "ObjectsWithPackage":
Object:
Package:

The function .fixPackageFileNames fails when it reaches

list <- as.character(list)

where in this case the contents of 'list' look like

str(list)
Formal class 'ObjectsWithPackage' [package "methods"] with 2 slots
  ..@ .Data  : chr(0)
  ..@ package: chr(0)

I am not sure if the problem arose from changes to package.skeleton or methods::getClasses and methods::getGenerics, or if there is something peculiar about my environment.
My current ugly fix is to define the function .fixPackageFileNames in the global environment, add a try statement, and exit when it results in an object of class "try-error":

.fixPackageFileNames = function(list) {
  list <- try(as.character(list))
  if (inherits(list, "try-error")) return(list)
  if (length(list) == 0L) return(list)
  list0 <- gsub("[[:cntrl:]\"*/:<>?\\|]", "_", list)
  wrong <- grep("^(con|prn|aux|clock\\$|nul|lpt[1-3]|com[1-4])(\\..*|)$", list0)
  if (length(wrong)) list0[wrong] <- paste0("zz", list0[wrong])
  ok <- grepl("^[[:alnum:]]", list0)
  if (any(!ok)) list0[!ok] <- paste0("z", list0[!ok])
  list1 <- tolower(list0)
  list2 <- make.unique(list1, sep = "_")
  changed <- (list2 != list1)
  list0[changed] <- list2[changed]
  list0
}

Any assistance with this error would be greatly appreciated!

Thank you,
Re: [R] package.skeleton fails
Your code works for me, and I do not see any lapply in the example you provide below.

Best,
Uwe Ligges

On 24.08.2016 21:21, Strunk, Jacob (DNR) wrote:
Hello,

I have been using package.skeleton from within an lapply statement successfully (assuming good source code) with the following setup in the past:

writeLines("testfun=function(){}", "c:\\temp\\testfun.r")
x=try(package.skeleton("test_pack",path="c:\\temp\\tests\\",code_files="c:\\temp\\testfun.r"))

but it now fails with the error:

Error: evaluation nested too deeply: infinite recursion / options(expressions=)?

I am working in RStudio Version 0.99.896, with 64-bit R version 3.3.1 (2016-06-21).

I have been poking in the code, and the error appears to happen within the subfunction '.fixPackageFileNames'.

Thanks for any assistance you might be able to provide.

Jacob
Re: [R] parsing the file
Here is an attempt at parsing the data. It is fixed-field, so the regular expression will extract the data. Some of it does not seem to make sense, since it has curly brackets in the data.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Aug 28, 2016 at 8:49 AM, Glenn Schultz wrote:
> Hi Jim,
>
> Attached is the layout of the file I would like to parse with dput sample
> of the data. From the layout it seems to me there are two sets in the data
> Header and Details. I would like to either parse such that
>
>- I have either 1 comma delimited file of all data or
>- 2 comma delimited files one of header the other of details
>
> I have never seen a file layout described in the manner before.
> Consequently, I am a little confused as to how to work with the file.
>
> Best,
> Glenn
>
> "1176552 CL20031031367RBV319920901
> 217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D2
> 2D13C13C13C13C13C13C604000{604000{604000{604
> 000{604000{604000{36{36{36{36{36{36{08500{08500{08500{08500{08500{
> 08500{1254240 CL20031031371KLV120020201
> 225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{3
> 4A02A01I02{02{02A03B0001121957C123500{92{0001280
> 000{0001741000{0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1254253 CL20031031371KMA620020301
> 225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{3
> 4A02{01I02{02{02A02C946646A35{85{0001030
> 000{0001205000{000130{35H30{36{36{36{36{06000{06000{06000{06000{06000{
> 06000{1259455 CL20031031371RE4420020501
> 225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B3
> 4C01H01G01H01H01H02C93E36{765000{995
> 000{0001384000{0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{
> 06500{1261060 CI20031031371S5V219940101
> 226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B0
> 6B11H11G11G11H11H11I0001169090I65{95{0001250
> 000{0001328000{000190{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{1335271 CI20031031375HMU519960101
> 233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F0
> 8F09D09D09D09D09E09E717375{464000{55{770
> 000{0001085500{0001085500{18{18{18{18{18{18{07000{07000{07000{07000{07000{
> 07000{1440840 CL20031031380HV9519981101
> 244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{3
> 0A06{05I06{06{06{06A615172I25{621000{
> 673000{75{791000{36{36{36{36{36{36{06000{
> 06000{06000{06000{06000{06000{1521993 CI20031031384E3A62101
> 252199306937{06875{06875{06875{07000{07000{12H02H12H13{13D1
> 3E04E04E04E04E04F04F0001129428F70{955000{0001000
> 000{0002087000{0002087000{18{18{18{18{18{18{06500{06500{0650
> 0{06500{06500{06500{1538080 CL20031031384YXH42501
> 253808008875{08875{08875{08875{08875{08875{31I31I31I31I31I3
> 1I04A04A04A04A04A04A0001419300{0001419300{0001419300{0001419
> 300{0001419300{0001419300{36{36{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1659123 CI20031031390XG8720020801
> 265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F1
> 6F01E01D01D01E01E01G998541G162000{792000{0001156
> 500{000160{000199{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{"

require(stringr)
# input data
data2 <- c("1176552 CL20031031367RBV319920901 217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D22D13C13C13C13C13C13C604000{604000{604000{604000{604000{604000{36{36{36{36{36{36{08500{08500{08500{08500{08500{08500{",
    "1254240 CL20031031371KLV120020201 225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{34A02A01I02{02{02A03B0001121957C123500{92{000128{0001741000{0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{07000{",
    "1254253 CL20031031371KMA620020301 225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{34A02{01I02{02{02A02C946646A35{85{000103{0001205000{000130{35H30{36{36{36{36{06000{06000{06000{06000{06000{06000{",
    "1259455 CL20031031371RE4420020501 225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B34C01H01G01H01H01H02C93E36{765000{995000{0001384000{0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{06500{",
    "1261060 CI20031031371S5V219940101 226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B06B11H11G11G11H11H11I0001169090I65{95{000125{0001328000{000190{18{18{18{18{18{18{06000{06000{06000{06000{06000{06000{",
    "1335271 CI20031031375HMU519960101 233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F08F09D09D09D09D09E09E717375{464000{55{77{0001085500{0001085500{18{18{18{18{18{18{07000{07000{07000{07000{07000{07000{",
    "1440840 CL20031031380HV9519981101 244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{30A06{05I06{06{06{06A615172I0