Re: [R] Unexpected behaviour as.data.frame
Santosh, Ivan,

This is also what I was looking for. Thanks. Looking at the source of dataFrame.default, it seems that it uses the same approach as I did: first create a list, then a data.frame from that list. I think I'll stick with the code I already had, as I don't want another dependency (multiple, actually, for R.utils). But thanks again for pointing it out.

Jan

On 05/16/2011 10:42 AM, Santosh Srinivas wrote:
> Hi Ivan,
>
> Take a look at dataFrame in R.utils ... is that what you want?
>
> From the help file:
>
> Examples
> df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
> df[,1] <- sample(1:nrow(df))
> df[,2] <- rnorm(nrow(df))
> print(df)
>
> Thanks,
> Santosh
>
> On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra wrote:
>> I feel like I'm always asking this type of question, but is it possible to add a base function that allows creating an empty data.frame, as matrix() does?
>>
>> What I mean would be something like:
>> create.data.frame(number_of_columns, mode_of_columns)
>> I think it would make things easier than creating one or several matrices and then combining them.
>>
>> Is it possible; does it make sense?
>>
>> Ivan
>>
>> On 5/15/2011 22:17, Bert Gunter wrote:
>>> Inline below.
>>>
>>> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan wrote:
>>>> Thanks. I also noticed it myself minutes after sending my message to the list. My 'please ignore my question, it was just a stupid typo' message was sent with the wrong account and is now awaiting moderation.
>>>>
>>>> However, my other question still stands: what is the preferred/fastest/simplest way to create a data.frame with given column types and dimensions?
>>>
>>> I do not know, but why is simply
>>>
>>> data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)
>>>
>>> not acceptable? Note that if you had, say, 500 numeric (= double) and 100 character columns to add, you might do something like:
>>>
>>> z <- matrix(numeric(5000),nr=10)
>>> u <- matrix(character(1000),nr=10)
>>> frm <- data.frame(z,u, stringsAsFactors = FALSE) ## 600 columns
>>>
>>> While this might save some typing, it may not be much more efficient than typing it all out -- maybe just some parsing time is saved. You can experiment and see.
>>>
>>> However, since a data.frame **is** a list with added attributes and a great deal of the work of the constructor is in constructing and checking these attributes (e.g. row and column names), I see nothing terribly inefficient with what you did. It's just a bit obscure. But maybe someone with greater expertise will set us both straight.
>>>
>>> Cheers,
>>> Bert
>>>
>>>> Regards,
>>>> Jan
>>>>
>>>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>>>> In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected.
>>>>>
>>>>> -- Bert
>>>>>
>>>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan wrote:
>>>>>> I use the following code to create two data.frames d1 and d2 from a list:
>>>>>>
>>>>>> types <- c("integer", "character", "double")
>>>>>> nlines <- 10
>>>>>> d1 <- as.data.frame(lapply(types, do.call, list(nlines)), stringsAsFactor=FALSE)
>>>>>> l2 <- lapply(types, do.call, list(nlines))
>>>>>> d2 <- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>>>
>>>>>> I would expect d1 and d2 to be the same; however, in d1 the second column is a factor, while in d2 it is a character (which is what I would expect):
>>>>>>
>>>>>> str(d1)
>>>>>> 'data.frame': 10 obs. of 3 variables:
>>>>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>>>>>>  $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>>>  $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>>>>>>
>>>>>> str(d2)
>>>>>> 'data.frame': 10 obs. of 3 variables:
>>>>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>>>>>>  $ c: chr "" "" "" "" ...
>>>>>>  $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>>>>>>
>>>>>> A different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a simpler or more elegant way of creating this data.frame?
>>>>>>
>>>>>> Regards,
>>>>>> Jan
>>>>>>
>>>>>> PS: I am running R on 64 bit Ubuntu 11.04:
>>>>>>
>>>>>> sessionInfo()
>>>>>> R version 2.12.1 (2010-12-16)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>
>>>>>> locale:
>>>>>>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>  [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>  [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> __
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Ivan CALANDRA
>> PhD Student
>> University of Hamburg
>> Biozentrum Grindel und Zoologisches Museum
>> Abt. Säugetiere
>> Martin-Luther-King-Platz 3
>> D-20146 Hamburg, GERMANY
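The remark quoted in this message, that a data.frame is a list with added attributes, and the list-then-data.frame approach Jan describes, can be illustrated with a small sketch. It is not code from the thread: the column names `id`, `label` and `value` are made up for illustration, and setting the attributes by hand skips the consistency checks the data.frame() constructor performs, so it only makes sense when the columns are known to have equal length.

```r
n <- 10L

# Start from a plain named list with one equal-length vector per column.
cols <- list(id = integer(n), label = character(n), value = numeric(n))

# Add the attributes that turn the list into a data.frame.
attr(cols, "row.names") <- seq_len(n)
class(cols) <- "data.frame"

# Character columns stay character because no conversion step is involved.
str(cols)
```

Naming the list elements also avoids the deparsed column names (c.0L..0L...) seen in the str() output of the original post.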
Re: [R] Unexpected behaviour as.data.frame
Forget this last email; I overlooked the implementation in the examples...

Ivan
Re: [R] Unexpected behaviour as.data.frame
Actually, what would be even better would be an extra argument to specify the column names. I don't think it's very difficult to implement and it would make things even easier.

Ivan
Re: [R] Unexpected behaviour as.data.frame
Thanks Santosh! The more I learn about R.utils, the more I think that many of its functions should be included in the base distribution.

Ivan
Re: [R] Unexpected behaviour as.data.frame
Hi Ivan,

Take a look at dataFrame in R.utils ... is that what you want?

From the help file:

Examples
df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
df[,1] <- sample(1:nrow(df))
df[,2] <- rnorm(nrow(df))
print(df)

Thanks,
Santosh
Re: [R] Unexpected behaviour as.data.frame
I feel like I'm always asking this type of question, but is it possible to add a base function that allows creating an empty data.frame, as matrix() does?

What I mean would be something like:
create.data.frame(number_of_columns, mode_of_columns)
I think it would make things easier than creating one or several matrices and then combining them.

Is it possible; does it make sense?

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
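A helper along the lines of the create.data.frame(number_of_columns, mode_of_columns) idea in this message could be sketched in base R as follows. This is only an illustration under assumptions: `create_data_frame`, its arguments `col_classes` and `nrow`, and the fallback names V1, V2, ... are hypothetical and not part of base R or R.utils.

```r
# Hypothetical helper, not part of base R; the name echoes the proposal above.
create_data_frame <- function(col_classes, nrow = 0L) {
  # One zero-filled (or empty-string) vector per requested type.
  cols <- lapply(col_classes, function(cls) vector(mode = cls, length = nrow))

  # Use the names of col_classes as column names; fall back to V1, V2, ...
  col_names <- names(col_classes)
  if (is.null(col_names)) col_names <- paste0("V", seq_along(col_classes))
  names(cols) <- col_names

  # stringsAsFactors is spelled out so character columns are never converted.
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- create_data_frame(c(id = "integer", label = "character", value = "double"),
                        nrow = 10)
str(df)
```

Taking a named character vector covers both requests made in the thread, column types and column names, with a single argument.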
Re: [R] Unexpected behaviour as.data.frame
Inline below.

On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan wrote:
> Thanks. I also noticed it myself minutes after sending my message to the list. My 'please ignore my question, it was just a stupid typo' message was sent with the wrong account and is now awaiting moderation.
>
> However, my other question still stands: what is the preferred/fastest/simplest way to create a data.frame with given column types and dimensions?

I do not know, but why is simply

data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500 numeric (= double) and 100 character columns to add, you might do something like:

z <- matrix(numeric(5000),nr=10)
u <- matrix(character(1000),nr=10)
frm <- data.frame(z,u, stringsAsFactors = FALSE) ## 600 columns

While this might save some typing, it may not be much more efficient than typing it all out -- maybe just some parsing time is saved. You can experiment and see.

However, since a data.frame **is** a list with added attributes and a great deal of the work of the constructor is in constructing and checking these attributes (e.g. row and column names), I see nothing terribly inefficient with what you did. It's just a bit obscure. But maybe someone with greater expertise will set us both straight.

Cheers,
Bert

> Regards,
> Jan

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
Re: [R] Unexpected behaviour as.data.frame
Thanks. I also noticed that myself minutes after sending my message to the list. My 'please ignore my question, it was just a stupid typo' message was sent with the wrong account and is now awaiting moderation.

However, my other question still stands: what is the preferred/fastest/simplest way to create a data.frame with given column types and dimensions?

Regards,
Jan

On 05/15/2011 04:43 PM, Bert Gunter wrote:

In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan wrote:

I use the following code to create two data.frames d1 and d2 from a list:

types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):

str(d1)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

str(d2)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: chr "" "" "" "" ...
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

A different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a more simple/elegant way of creating this data.frame?
Regards,
Jan

PS:
I am running R on 64 bit Ubuntu 11.04:

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
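On the open question of how to create a data.frame with given column types and dimensions, one possible shape for a small helper is sketched below. It is only an illustration under stated assumptions: the function name make_typed_df and the default column names V1, V2, ... are invented here and do not come from the thread or from base R. It follows the same list-then-data.frame route as the original code, but names the columns explicitly.

```r
# Hypothetical helper (name and defaults are assumptions of this sketch):
# build a data.frame with `n_rows` rows and one column per entry of `types`.
# Each entry of `types` must be a mode accepted by vector(), for example
# "integer", "character", "double" or "logical".
make_typed_df <- function(types, n_rows,
                          col_names = paste0("V", seq_along(types))) {
  # One zero-filled (or empty-string) vector per requested type.
  cols <- lapply(types, function(type) vector(mode = type, length = n_rows))
  # Naming the list gives the data.frame readable column names.
  names(cols) <- col_names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

d <- make_typed_df(c("integer", "character", "double"), n_rows = 10)
str(d)
```

Spelling out stringsAsFactors = FALSE keeps the character column a character vector regardless of the R version's default.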
Re: [R] Unexpected behaviour as.data.frame
In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan wrote:
> I use the following code to create two data.frames d1 and d2 from a list:
>
> types <- c("integer", "character", "double")
> nlines <- 10
> d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
>                     stringsAsFactor=FALSE)
> l2 <- lapply(types, do.call, list(nlines))
> d2 <- as.data.frame(l2, stringsAsFactors=FALSE)
>
> I would expect d1 and d2 to be the same, however, in d1 the second column is
> a factor while in d2 it is a character (which I would expect):
>
> > str(d1)
> 'data.frame': 10 obs. of 3 variables:
>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>  $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>  $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>
> > str(d2)
> 'data.frame': 10 obs. of 3 variables:
>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>  $ c: chr "" "" "" "" ...
>  $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>
> A different but related question: I use the commands above to create an
> 'empty' data.frame with specified column types and dimensions. I need this
> data.frame to pass on to my C++ routines. Is there a more simple/elegant way
> of creating this data.frame?
>
> Regards,
>
> Jan
>
> PS:
> I am running R on 64 bit Ubuntu 11.04:
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>  [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
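The diagnosis above can be reproduced with a small experiment. The sketch below rests on one observation: the list method of as.data.frame() does not complain about an unknown argument, so a misspelled stringsAsFactor is silently ignored and the default for stringsAsFactors applies. In R versions before 4.0.0 that default was TRUE (hence the factor column in d1); from 4.0.0 on it is FALSE. The object names are made up for this illustration.

```r
cols <- list(a = integer(3), b = character(3))

# Correct spelling: the argument is honoured.
as_factor <- as.data.frame(cols, stringsAsFactors = TRUE)
as_char   <- as.data.frame(cols, stringsAsFactors = FALSE)

# Misspelled argument (missing final "s"): no error, no warning, and the
# result is the same as if the argument had not been given at all.
typo    <- as.data.frame(cols, stringsAsFactor = TRUE)
default <- as.data.frame(cols)

class(as_factor$b)                          # "factor"
class(as_char$b)                            # "character"
identical(class(typo$b), class(default$b))  # typo behaves like the default
```

Because the misspelling raises no error, comparing against the default call is a quick way to check whether an argument actually took effect.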
[R] Unexpected behaviour as.data.frame
I use the following code to create two data.frames d1 and d2 from a list:

types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):

str(d1)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

str(d2)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: chr "" "" "" "" ...
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

A different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a more simple/elegant way of creating this data.frame?

Regards,
Jan

PS:
I am running R on 64 bit Ubuntu 11.04:

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
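A side observation on the str() output in this post: the column names such as c.0L..0L... arise because the list produced by lapply() has no names. A minimal sketch of the same construction with a named list follows; the column names "x", "y" and "z" are placeholders chosen for this illustration, not names from the original code.

```r
types  <- c("integer", "character", "double")
nlines <- 10

# Same construction as in the post: call integer(10), character(10), double(10).
cols <- lapply(types, do.call, list(nlines))

# Naming the list elements gives the data.frame its column names.
cols <- setNames(cols, c("x", "y", "z"))

d <- as.data.frame(cols, stringsAsFactors = FALSE)
names(d)  # "x" "y" "z"
```

Readable names also make it easier to refer to the columns from other code that receives the data.frame.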
Re: [R] Unexpected behaviour as.data.frame
Forget I asked. There was a typo in my example (stringsAsFactor instead of stringsAsFactors), which explained the difference. My apologies.

My second question however still stands: How does one create a data.frame with given column types and given dimensions?

Thanks.

Regards,
Jan

Quoting Jan van der Laan:

I use the following code to create two data.frames d1 and d2 from a list:

types <- c("integer", "character", "double")
nlines <- 10
d1 <- as.data.frame(lapply(types, do.call, list(nlines)),
                    stringsAsFactor=FALSE)
l2 <- lapply(types, do.call, list(nlines))
d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):

str(d1)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

str(d2)
'data.frame': 10 obs. of 3 variables:
 $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
 $ c: chr "" "" "" "" ...
 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

A different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a more simple/elegant way of creating this data.frame?
Regards,
Jan

PS:
I am running R on 64 bit Ubuntu 11.04:

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base
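Since the data.frame is meant to be handed to compiled routines, where a column of the wrong storage type can be hard to debug, it may be worth checking the column types on the R side first. The sketch below is one way to do that; the helper name check_column_types and the column names id, label and value are assumptions made for this illustration and do not appear in the thread.

```r
# Hypothetical helper: stop with an informative error unless every column of
# `df` has the storage type named in `expected` (a named character vector).
check_column_types <- function(df, expected) {
  actual <- vapply(df, typeof, character(1))
  if (!identical(actual, expected)) {
    stop("unexpected column types: ",
         paste(names(actual), actual, sep = "=", collapse = ", "))
  }
  invisible(TRUE)
}

expected <- c(id = "integer", label = "character", value = "double")

d <- data.frame(id = integer(10), label = character(10), value = double(10),
                stringsAsFactors = FALSE)
check_column_types(d, expected)  # passes silently

# A factor is stored as integer codes, so an accidental factor column is caught.
d$label <- factor(d$label)
result <- tryCatch(check_column_types(d, expected),
                   error = function(e) conditionMessage(e))
result
```

Using typeof() rather than class() is deliberate here: it reports the underlying storage, which is what matters once the columns are read from compiled code.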