Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread David Winsemius

> On November 16, 2016 7:20:38 AM PST, g.maub...@weinwolf.de wrote:
>> Hi All,
>> 
>> I built an empty data frame to fill it with values later. I did the 
>> following:
>> 
>> -- cut --
>> > matrix(NA, 2, 2)
>>      [,1] [,2]
>> [1,]   NA   NA
>> [2,]   NA   NA
>> > data.frame(matrix(NA, 2, 2))
>>   X1 X2
>> 1 NA NA
>> 2 NA NA
>> > as.data.frame(matrix(NA, 2, 2))
>>   V1 V2
>> 1 NA NA
>> 2 NA NA
>> -- cut --
>> 
>> Why does data.frame deliver different results than as.data.frame with 
>> regard to the variable names (V instead of X)?

They are two different functions, and the difference is fairly easy to see by 
looking at the code.

as.data.frame.matrix uses names(value) <- paste0("V", ic) when there are no 
column names, whereas data.frame (with its default check.names = TRUE) calls 
make.names, which prepends an "X" to invalid or missing names.
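
The behaviour described above can be checked directly. This is a minimal sketch, assuming a base R installation in which check.names defaults to TRUE in data.frame; the variable names (m, v_names, x_names, made, raw_names) are only for illustration.

```r
m <- matrix(NA, 2, 2)   # a matrix without column names

# as.data.frame() dispatches to as.data.frame.matrix(), which generates
# "V1", "V2", ... when the matrix has no column names.
v_names <- names(as.data.frame(m))

# data.frame() passes its column names through make.names() because
# check.names = TRUE is the default; make.names() prepends "X" to
# strings that are not syntactically valid names, such as "1" or "".
x_names <- names(data.frame(m))
made    <- make.names(c("1", ""))

# With the check switched off, the names are left as data.frame()
# first generated them, without the "X" prefix.
raw_names <- names(data.frame(m, check.names = FALSE))

print(v_names)
print(x_names)
print(made)
print(raw_names)
```

So the "V" comes from the matrix method itself, while the "X" is a side effect of name checking in data.frame.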


As to why the authors did it this way, I'm unable to comment.

>> 
>> Kind regards
>> 
>> Georg


David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread Jeff Newmiller
I will start by admitting I don't know the answer to your question.

However, I am responding because I think this should not be an issue in 
real-life use of R. Data frames are lists of distinct vectors, each of which has 
its own reason for being present in the data, and normally each has its own 
storage mode. Using a matrix as a shortcut to create many columns at once 
does not change this fundamental difference between data frames and matrices. 
You should not be surprised that putting the finishing touches on this 
transformation takes some personal attention. 
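
To make the difference concrete, here is a small sketch with made-up columns (id, label, and flag are hypothetical, not from the original question). It shows that each column of a data frame keeps its own storage mode, while a matrix has a single one.

```r
# Hypothetical example data with three differently typed columns.
DF <- data.frame(
  id    = 1:2,
  label = c("a", "b"),
  flag  = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# A data frame is a list, and each column keeps its own type.
is.list(DF)
col_types <- sapply(DF, typeof)
print(col_types)

# A matrix has one storage mode, so converting coerces every
# column to a common type (character here).
M <- as.matrix(DF)
print(typeof(M))
```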

Normally you should give each column an explicit name by naming the 
corresponding argument in the data.frame call (ColA = ..., ColB = ...). When 
using a matrix as a shortcut, you should either immediately follow the creation 
of the data frame with a names(DF) <- assignment, or wrap it in a setNames 
function call. 

setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )
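
For comparison, the other ways of naming the columns can be sketched as follows. DF1, DF2, and DF3 are illustrative names, and the third variant (setting dimnames on the matrix) is an extra option not mentioned above.

```r
# Option 1: name each column as an argument of data.frame().
DF1 <- data.frame(ColA = rep(NA, 2), ColB = rep(NA, 2))

# Option 2: build from the matrix, then assign the names right away.
DF2 <- data.frame(matrix(NA, 2, 2))
names(DF2) <- c("ColA", "ColB")

# Option 3 (an addition): give the matrix column names before
# converting, so that no default names are generated at all.
DF3 <- data.frame(
  matrix(NA, 2, 2, dimnames = list(NULL, c("ColA", "ColB")))
)

print(names(DF1))
print(names(DF2))
print(names(DF3))
```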

Note that using a matrix to create many columns is memory inefficient, because 
you start by setting aside a single block of memory (the matrix) and then copy 
that data one column at a time into separate vectors for use in the data frame. 
If working with large data you might want to consider allocating each column 
separately from the beginning. 

N <- 2
nms <- c( "A", "B" )
# one NA vector of length N per column name
as.data.frame( setNames( lapply( nms, function(n){ rep( NA, N ) } ), nms ) )

which is not as convenient, but illustrates that data frames are truly 
different from matrices.
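
If the goal is an empty data frame to be filled in later, one possible refinement is to allocate each column with a typed NA, since a plain NA is logical. This is only a sketch; the column types chosen for A and B here are assumptions made for illustration.

```r
N <- 2   # number of rows to reserve

# Typed NA constants give each column its intended storage mode up front.
DF <- data.frame(
  A = rep(NA_real_, N),
  B = rep(NA_character_, N),
  stringsAsFactors = FALSE
)

# Fill in values later; neither column has to change type.
DF$A[1] <- 3.14
DF$B[1] <- "first"

col_types <- sapply(DF, typeof)
print(col_types)
```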
-- 
Sent from my phone. Please excuse my brevity.



[R] Different results when converting a matrix to a data.frame

2016-11-16 Thread G . Maubach
Hi All,

I built an empty data frame to fill it with values later. I did the 
following:

-- cut --
> matrix(NA, 2, 2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA
-- cut --

Why does data.frame deliver different results than as.data.frame with 
regard to the variable names (V instead of X)?

Kind regards

Georg

