Re: [R] Extract row form a dataframe by row names in another vector and factor . Need explanation

2016-03-03 Thread Mohammad Tanvir Ahamed via R-help
Dear Dennis

Thank you very much for your detailed reply. It was really helpful.

 
Tanvir Ahamed 
Göteborg, Sweden  |  mashra...@yahoo.com 




From: Dennis Murphy 

Sent: Thursday, 3 March 2016, 4:38
Subject: Re: [R] Extract row form a dataframe by row names in another vector 
and factor . Need explanation


Welcome to the wonderful world of factors. In your second case, v2,
the vector is character, so R matches the input character string to
the lookup table of row names. OTOH, v1 is a factor - it behaves
differently when used for subsetting, and this example illustrates why
you shouldn't use them for this purpose. Let's look at it:

> v1
[1] f g h i j
Levels: f g h i j
> str(v1)
Factor w/ 5 levels "f","g","h","i",..: 1 2 3 4 5
> levels(v1)
[1] "f" "g" "h" "i" "j"
> as.integer(v1)
[1] 1 2 3 4 5
> str(levels(v1))
chr [1:5] "f" "g" "h" "i" "j"

When you used v1 to subset rows, R did not match the labels of the
factor against the row names. A factor is stored as integer codes, and
indexing with a factor is equivalent to indexing with those codes (see
?Extract). This is why res1 selected the first five observations.
These alternatives do what you want:

dat[as.character(v1), ]  # behaves like v2 (a character vector)
dat[levels(v1), ]        # works here only because each level occurs
                         # once, in the same order as in v1

# Redefining the factor with explicit labels does not help, because
# indexing still goes through the integer codes:

x <- as.character(dat1[, "BB"])
v3 <- factor(x, levels = unique(x), labels = unique(x))
dat[v3, ]                # still rows a-e, the same as dat[v1, ]
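
For anyone who wants to reproduce the difference, here is a minimal sketch that rebuilds the data from the original post. One assumption is labeled in the code: stringsAsFactors = TRUE is passed explicitly so that the column becomes a factor even on R versions where that is no longer the default. The names by_code and by_label are only illustrative.

```r
# Rebuild the data from the original post.
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]
colnames(dat) <- paste0("SA", 1:5)

# Assumption: stringsAsFactors = TRUE is set explicitly to reproduce the
# factor conversion discussed in this thread (R >= 4.0.0 defaults to FALSE).
dat1 <- data.frame(matrix(letters[1:20], ncol = 4), stringsAsFactors = TRUE)
colnames(dat1) <- c("AA", "BB", "CC", "DD")

v1 <- dat1[, "BB"]                   # factor with levels f, g, h, i, j

by_code  <- dat[v1, ]                # indexes with the integer codes 1:5
by_label <- dat[as.character(v1), ]  # matches labels against row names

rownames(by_code)
rownames(by_label)
```

Comparing the two sets of row names shows which rows each form of indexing picked.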

There are a couple of alternative avenues you could have chosen (e.g.,
match() or which()), but they are overkill for this simple case.
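
For completeness, the match() and which() routes might look roughly like the sketch below. The vector wanted is a hypothetical stand-in for v1, and idx_match and idx_which are illustrative names, not anything from the original post.

```r
dat <- data.frame(matrix(1:50, ncol = 5))
rownames(dat) <- letters[1:10]

# Hypothetical stand-in for v1: a factor holding the desired row names.
wanted <- factor(c("f", "g", "h", "i", "j"))

# match() returns the position of each label among the row names,
# in the order given by 'wanted'.
idx_match <- match(as.character(wanted), rownames(dat))

# which() returns the positions of rows whose names occur in 'wanted',
# in row order rather than in the order of 'wanted'.
idx_which <- which(rownames(dat) %in% as.character(wanted))

res_match <- dat[idx_match, ]
res_which <- dat[idx_which, ]
```

Both give the same rows here because the labels already appear in row order; with a shuffled 'wanted', only the match() version keeps the requested order.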


Your real problem was converting a character matrix to a data frame in
the first place - this converted all of the columns to factors with
different sets of levels:

str(dat1)

This illustrates one of the important differences between data frames
and matrices. In a matrix, every element must be of the same class.
Specifically, a matrix is an atomic vector with a 'dim' attribute. In
contrast, each _column_ of a data frame must have elements of the same
class, but they do not have to be the same class from one column to
the next.
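
A small sketch of that difference, using made-up objects (m, v and df are illustrative names only):

```r
# Binding integers and strings into a matrix coerces everything to character.
m <- cbind(1:2, c("a", "b"))
typeof(m)        # one type for the whole matrix
attributes(m)    # the 'dim' attribute is what makes it a matrix

# An atomic vector becomes a matrix once it is given a 'dim' attribute.
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)

# In a data frame, each column keeps its own class.
df <- data.frame(n = 1:2, s = c("a", "b"), stringsAsFactors = FALSE)
sapply(df, class)
```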

One way to have avoided the conversion to factor would have been to
use the argument stringsAsFactors = FALSE in the data.frame() call -
by default, it is TRUE (in R versions before 4.0.0; later versions
default to FALSE). More importantly, the conversion to data frame
for dat1 was unnecessary - observe:

> dat1<-matrix(letters[1:20],ncol=4)
> colnames(dat1)<-c("AA","BB","CC","DD")
> dat[dat1[, "BB"], ]
  SA1 SA2 SA3 SA4 SA5
f   6  16  26  36  46
g   7  17  27  37  47
h   8  18  28  38  48
i   9  19  29  39  49
j  10  20  30  40  50

For the same reason, it was unnecessary to convert dat to a data
frame. Let's look at a matrix version instead:

dat2 <- matrix(seq(50), nrow = 10)
rownames(dat2) <- letters[1:10]
colnames(dat2) <- paste0("SA", 1:5)

dat2[dat1[, "BB"], ] # desired result

Hint: You might want to spend some time to carefully learn the
different major data types in R and the various modes of indexing. In
general, it is not a good default practice to convert matrices to data
frames.

Dennis


On Wed, Mar 2, 2016 at 6:05 PM, Mohammad Tanvir Ahamed via R-help
 wrote:
> Hi, here I have written an example to explain my problem
> ## Data Generation
> dat<-data.frame(matrix(1:50,ncol=5))
> rownames(dat)<-letters[1:10]
> colnames(dat)<- c("SA1","SA2","SA3","SA4","SA5")
>
> dat1<-data.frame(matrix(letters[1:20],ncol=4))
> colnames(dat1)<-c("AA","BB","CC","DD")
>
> ## Row names
> v1<-dat1[,"BB"]   # Factor
> v2<-as.vector(dat1[,"BB"])  # Vector
>
> is(v1) # Factor
> is(v2) # Vector
>
> # Result
> res1<-dat[v1,]
> res2<-dat[v2,]
> ## I assumed res1 and res2 would be the same, but they are not.
> ## Can anybody please explain why?
>
>
> Tanvir Ahamed

> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extract row form a dataframe by row names in another vector and factor . Need explanation

2016-03-02 Thread Jeff Newmiller
Posting in HTML makes a mess of your code... learn to post in plain text. 

Not sure what you thought you would accomplish by using as.vector... perhaps 
you should read the help file ?as.vector.

Did you look at str( dat1 )?

Factors are more closely akin to integers than to characters. Indexing with 
factors is the same as indexing with the integer codes that factors are 
implemented with (see the discussion of factors in the Introduction to R 
document that comes with R), not the same as indexing with character strings. 
If you want character indexing, use character vectors. 
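
A quick illustration with a named vector, as a sketch only (x and f are made-up objects, not from the original post):

```r
x <- c(a = 10, b = 20, c = 30)

# The levels are sorted as "b", "c", so the codes for c("c", "b") are 2, 1.
f <- factor(c("c", "b"))
as.integer(f)

x[f]                # indexes positions 2 and 1: elements "b" and "a"
x[as.character(f)]  # indexes by name: elements "c" and "b"
```

The two results differ because the factor's codes refer to its own levels, not to the names of x.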

Advice: 98% of the time, making a data frame from a matrix does more harm 
than good. A data frame is a list of columns, each of which potentially has its 
own storage mode. A matrix is all one type, implemented as a folded vector. 
Use named arguments to the data.frame function instead. Also use the 
stringsAsFactors = FALSE argument to data.frame unless you know you want 
factors rather than character strings. 
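
Something along these lines, as a sketch that rebuilds the poster's two objects with named arguments (the column contents are taken from the original example; res is an illustrative name):

```r
# Build dat1 column by column and keep the strings as character vectors.
dat1 <- data.frame(
  AA = letters[1:5],
  BB = letters[6:10],
  CC = letters[11:15],
  DD = letters[16:20],
  stringsAsFactors = FALSE
)

# Build dat the same way, setting the row names in the same call.
dat <- data.frame(
  SA1 = 1:10, SA2 = 11:20, SA3 = 21:30, SA4 = 31:40, SA5 = 41:50,
  row.names = letters[1:10]
)

# dat1$BB is a character vector, so this indexes by row name.
res <- dat[dat1$BB, ]
```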
-- 
Sent from my phone. Please excuse my brevity.
