Re: [R] combining data.frames with is.na & match (), two questions
Hi

Please keep posts on r-help as well; others could give you different or better solutions.

Regarding ordering, see ?order or ?sort. However, this is mainly needed only for plotting or exporting data.

Cheers
Petr

From: Drake Gossi
Sent: Thursday, April 18, 2019 9:27 PM
To: PIKAL Petr
Subject: Re: [R] combining data.frames with is.na & match (), two questions

Thanks Pikal,

Your answer was super helpful. I just learned a lot from you.

The only thing I have to figure out now is how to rearrange the numbers, say, so that 200 is on top, and NA is on bottom, or so that the two 100 calories are together. Something like that. Perhaps I'll try an ascending/descending function.

Thank you again.

D

On Thu, Apr 18, 2019 at 1:31 AM PIKAL Petr <petr.pi...@precheza.cz> wrote:

Hi

I wonder why such a combination is so complicated in your textbook.

Having data frames fr1 and fr2

> dput(fr1)
structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
"mango", "pear"), class = "factor"), Calories = c(100L, 100L,
200L)), class = "data.frame", row.names = c("1", "2", "3"))
> dput(fr2)
structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
"banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = c("oblong",
"pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
0)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
>
> fr1
   Fruit Calories
1 banana      100
2   pear      100
3  mango      200
>

you can use merge to combine those 2 data frames to get either all values from both

> merge(fr2, fr1, all=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
6  mango   <NA>   <NA>    NA      200

just values from data frame with calories

> merge(fr2, fr1, all.y=T)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
3  mango   <NA>   <NA>    NA      200

or just values from data frame with colours

> merge(fr2, fr1, all.x=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-boun...@r-project.org> On Behalf Of Drake Gossi
> Sent: Thursday, April 18, 2019 1:24 AM
> To: r-help@r-project.org
> Subject: [R] combining data.frames with is.na & match (), two questions
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      N/A
> 3   pear  green   pear   0.5      N/A
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Then:
>
> > index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories [!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> I get what the first
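The pointer to ?order can be sketched roughly as follows. This is only an illustration, assuming the two data frames are rebuilt with data.frame() from the values posted in the thread; the names both and by_cal are made up for this example.

```r
# Rebuild the example data frames from the values posted in the thread.
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L))
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Color = c("red", "yellow", "green", "orange", "green"),
                  Shape = c("round", "oblong", "pear", "round", "round"),
                  Juice = c(1, 0, 0.5, 1, 0))

# Keep all rows from both data frames, as in merge(fr2, fr1, all=T).
both <- merge(fr2, fr1, all = TRUE)

# order() returns the row positions that sort Calories.
# decreasing = TRUE puts 200 on top; na.last = TRUE pushes NA rows to the bottom.
by_cal <- both[order(both$Calories, decreasing = TRUE, na.last = TRUE), ]
print(by_cal)
```

Dropping decreasing = TRUE gives ascending order instead; the two 100-calorie rows end up next to each other either way.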
Re: [R] combining data.frames with is.na & match (), two questions
Hi Drake,

Petr's suggestion to use the merge() function is good. Another (possibly overkill) approach is to use functions from the dplyr package, which is a fantastic package to get familiar with.

For example, the last alternative that Petr suggests is an example of what is called a "left join" (meaning, when joining structures x and y, keep all the x rows, even if there is no corresponding row for y). You can do this via dplyr as follows:

dplyr::left_join(fr2, fr1, by="Fruit")

HTH,
Eric

On Thu, Apr 18, 2019 at 11:40 AM PIKAL Petr wrote:

> Hi
>
> I wonder why such a combination is so complicated in your textbook.
>
> Having data frames fr1 and fr2
>
> > dput(fr1)
> structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
> "mango", "pear"), class = "factor"), Calories = c(100L, 100L,
> 200L)), class = "data.frame", row.names = c("1", "2", "3"))
> > dput(fr2)
> structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
> "banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
> 4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
> ), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = c("oblong",
> "pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
> 0)), class = "data.frame", row.names = c("1", "2", "3", "4",
> "5"))
> >
> > fr1
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
> >
>
> you can use merge to combine those 2 data frames to get either all values
> from both
>
> > merge(fr2, fr1, all=T)
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   kiwi  green  round   0.0       NA
> 4 orange orange  round   1.0       NA
> 5   pear  green   pear   0.5      100
> 6  mango   <NA>   <NA>    NA      200
>
> just values from data frame with calories
>
> > merge(fr2, fr1, all.y=T)
>    Fruit  Color  Shape Juice Calories
> 1 banana yellow oblong   0.0      100
> 2   pear  green   pear   0.5      100
> 3  mango   <NA>   <NA>    NA      200
>
> or just values from data frame with colours
>
> > merge(fr2, fr1, all.x=T)
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round   1.0       NA
> 2 banana yellow oblong   0.0      100
> 3   kiwi  green  round   0.0       NA
> 4 orange orange  round   1.0       NA
> 5   pear  green   pear   0.5      100
>
> Cheers
> Petr
>
> > -----Original Message-----
> > From: R-help On Behalf Of Drake Gossi
> > Sent: Thursday, April 18, 2019 1:24 AM
> > To: r-help@r-project.org
> > Subject: [R] combining data.frames with is.na & match (), two questions
> >
> > Hello everyone,
> >
> > I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> > I'm just having trouble understanding this maneuver.
> >
> > In sum, I'm trying to combine data in two different data.frames.
> >
> > This data.frame is called fruitNutr
> >
> >    Fruit Calories
> > 1 banana      100
> > 2   pear      100
> > 3  mango      200
> >
> > And this data.frame is called fruitData
> >
> >    Fruit  Color  Shape Juice
> > 1  apple    red  round     1
> > 2 banana yellow oblong     0
> > 3   pear  green   pear   0.5
> > 4 orange orange  round     1
> > 5   kiwi  green  round     0
> >
> > So, as you can see, these two data.frames overlap insofar as they both have
> > banana and pear. So, what happens next is the book suggests this:
> >
> > fruitData$Calories <- NA
> >
> > As a result, I've created a new column for the fruitData data.frame:
> >
> >    Fruit  Color  Shape Juice Calories
> > 1  apple    red  round     1      N/A
> > 2 banana yellow oblong     0      N/A
> > 3   pear  green   pear   0.5      N/A
> > 4 orange orange  round     1      N/A
> > 5   kiwi  green  round     0      N/A
> >
> > Then:
> >
> > > index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
> > > index
> > [1] NA  1  2 NA NA
> > > is.na(index)
> > [1]  TRUE FALSE FALSE  TRUE  TRUE
> > > fruitData$Calories [!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > > fruitData
> >
> >    Fruit  Color  Shape Juice Calories
> > 1  apple    red  round     1      N/A
> > 2 banana yellow oblong     0      100
> > 3   pear  green   pear   0.5      100
> > 4 orange orange  round     1      N/A
> > 5   kiwi  green  round     0      N/A
> >
> > I get what the first part means, that first part being this:
> > fruitData$Calories [!is.na(index)]
> > go into the fruitData data.frame, specifically into the calories column,
> > and only for what's true according to is.na(index). But I just literally
> > can't understand this last part. fruitNutr$Calories[index[!is.na(index)]]
> >
> > Two questions.
> >
> >    1. I just literally don't understand how this code works. It does work,
> >    of course, but I don't know what it's doing, specifically this
> >    [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
> >    I'm new at this...
> >    2. And then: is there any other way to combine these two data.frames so
> >    that we get this same
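To make the "left join" idea concrete, here is a small sketch that puts the base R and dplyr versions side by side. It assumes fr1 and fr2 are rebuilt with data.frame() from the posted values, and it runs the dplyr call only if dplyr happens to be installed; left_base and left_dplyr are names invented for this example.

```r
# Rebuild the example data frames from the values posted in the thread.
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L))
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Color = c("red", "yellow", "green", "orange", "green"),
                  Shape = c("round", "oblong", "pear", "round", "round"),
                  Juice = c(1, 0, 0.5, 1, 0))

# Base R left join: keep every row of fr2 (the "x" side) and fill
# Calories with NA where fr1 has no matching Fruit.
left_base <- merge(fr2, fr1, by = "Fruit", all.x = TRUE)
print(left_base)    # rows come back sorted by Fruit

# The dplyr call from the message, guarded so the sketch still runs
# on a machine without dplyr.
if (requireNamespace("dplyr", quietly = TRUE)) {
  left_dplyr <- dplyr::left_join(fr2, fr1, by = "Fruit")
  print(left_dplyr) # rows stay in the original order of fr2
}
```

The visible difference is row order: merge() sorts on the key by default, while left_join() keeps the order of its first argument. mango is dropped in both because it appears only on the "y" side.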
Re: [R] combining data.frames with is.na & match (), two questions
Hi

I wonder why such a combination is so complicated in your textbook.

Having data frames fr1 and fr2

> dput(fr1)
structure(list(Fruit = structure(c(1L, 3L, 2L), .Label = c("banana",
"mango", "pear"), class = "factor"), Calories = c(100L, 100L,
200L)), class = "data.frame", row.names = c("1", "2", "3"))
> dput(fr2)
structure(list(Fruit = structure(c(1L, 2L, 5L, 4L, 3L), .Label = c("apple",
"banana", "kiwi", "orange", "pear"), class = "factor"), Color = structure(c(3L,
4L, 1L, 2L, 1L), .Label = c("green", "orange", "red", "yellow"
), class = "factor"), Shape = structure(c(3L, 1L, 2L, 3L, 3L), .Label = c("oblong",
"pear", "round"), class = "factor"), Juice = c(1, 0, 0.5, 1,
0)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5"))
>
> fr1
   Fruit Calories
1 banana      100
2   pear      100
3  mango      200
>

you can use merge to combine those 2 data frames to get either all values from both

> merge(fr2, fr1, all=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100
6  mango   <NA>   <NA>    NA      200

just values from data frame with calories

> merge(fr2, fr1, all.y=T)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
3  mango   <NA>   <NA>    NA      200

or just values from data frame with colours

> merge(fr2, fr1, all.x=T)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Cheers
Petr

> -----Original Message-----
> From: R-help On Behalf Of Drake Gossi
> Sent: Thursday, April 18, 2019 1:24 AM
> To: r-help@r-project.org
> Subject: [R] combining data.frames with is.na & match (), two questions
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton), and
> I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      N/A
> 3   pear  green   pear   0.5      N/A
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Then:
>
> > index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories [!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> I get what the first part means, that first part being this:
> fruitData$Calories [!is.na(index)]
> go into the fruitData data.frame, specifically into the calories column, and
> only for what's true according to is.na(index). But I just literally can't
> understand this last part. fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>    1. I just literally don't understand how this code works. It does work,
>    of course, but I don't know what it's doing, specifically this
>    [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
>    I'm new at this...
>    2. And then: is there any other way to combine these two data.frames so
>    that we get this same result? maybe an easier to understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Drake
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
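One detail in the merge() output above is that the rows come back sorted by Fruit, while the result asked for in the question keeps the original row order of the fruit table. A possible way to get that order back is sketched below. It assumes the data frames are rebuilt from the posted values; row_id is a hypothetical helper column introduced only for this illustration.

```r
# Rebuild the example data frames from the values posted in the thread.
fr1 <- data.frame(Fruit = c("banana", "pear", "mango"),
                  Calories = c(100L, 100L, 200L))
fr2 <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                  Color = c("red", "yellow", "green", "orange", "green"),
                  Shape = c("round", "oblong", "pear", "round", "round"),
                  Juice = c(1, 0, 0.5, 1, 0))

# Remember the original row positions before merging (hypothetical helper column).
fr2$row_id <- seq_len(nrow(fr2))

# Left join as in merge(fr2, fr1, all.x=T); merge() sorts the result by Fruit.
m <- merge(fr2, fr1, by = "Fruit", all.x = TRUE)

# Put the rows back in their original order and drop the helper column.
m <- m[order(m$row_id), ]
m$row_id <- NULL
rownames(m) <- NULL
print(m)
```

The printed rows should then follow apple, banana, pear, orange, kiwi, which is the layout shown in the question.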
Re: [R] combining data.frames with is.na & match (), two questions
The whole thing is a merge operation, i.e.

> FruitNutr <- read.table(text="
+    Fruit Calories
+ 1 banana      100
+ 2   pear      100
+ 3  mango      200
+ ")
> FruitData <- read.table(text="
+    Fruit  Color  Shape Juice
+ 1  apple    red  round     1
+ 2 banana yellow oblong     0
+ 3   pear  green   pear   0.5
+ 4 orange orange  round     1
+ 5   kiwi  green  round     0
+ ")
> merge(FruitData, FruitNutr)
   Fruit  Color  Shape Juice Calories
1 banana yellow oblong   0.0      100
2   pear  green   pear   0.5      100
> merge(FruitData, FruitNutr, all.x=TRUE)
   Fruit  Color  Shape Juice Calories
1  apple    red  round   1.0       NA
2 banana yellow oblong   0.0      100
3   kiwi  green  round   0.0       NA
4 orange orange  round   1.0       NA
5   pear  green   pear   0.5      100

Mind you, merge() comes with its own set of confusing options in the more complex cases, which may be why the authors have chosen a more elementary approach.

-pd

> On 18 Apr 2019, at 01:24 , Drake Gossi wrote:
>
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      N/A
> 3   pear  green   pear   0.5      N/A
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Then:
>
> > index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories [!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> I get what the first part means, that first part being this:
> fruitData$Calories [!is.na(index)]
> go into the fruitData data.frame, specifically into the calories column,
> and only for what's true according to is.na(index). But I just literally
> can't understand this last part. fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>    1. I just literally don't understand how this code works. It does work,
>    of course, but I don't know what it's doing, specifically this
>    [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
>    I'm new at this...
>    2. And then: is there any other way to combine these two data.frames so
>    that we get this same result? maybe an easier to understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Drake

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
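As one possible illustration of the "confusing options" remark, the sketch below shows what the default by argument does when the two data frames share more than one column name. The Notes column and its values are hypothetical and are not part of the data posted in the thread; implicit and explicit are names made up for this example.

```r
# Hypothetical variant of the thread's data: both frames carry a "Notes"
# column (an assumption for illustration, not part of the posted data).
FruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100L, 100L, 200L),
                        Notes = c("per fruit", "per fruit", "per fruit"))
FruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Juice = c(1, 0, 0.5, 1, 0),
                        Notes = c("ripe", "ripe", "ripe", "ripe", "ripe"))

# Default: by = intersect(names(x), names(y)), here both Fruit and Notes.
# No row agrees on both columns, so the inner join comes back empty.
implicit <- merge(FruitData, FruitNutr)

# Explicit key: join on Fruit only. The clashing column is kept twice,
# renamed with the default suffixes .x and .y.
explicit <- merge(FruitData, FruitNutr, by = "Fruit", all.x = TRUE)

print(nrow(implicit))
print(names(explicit))
```

Spelling out by = "Fruit" in every call is a cheap way to avoid this kind of surprise.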
Re: [R] combining data.frames with is.na & match (), two questions
Dear Drake

See in-line comments

On 18/04/2019 00:24, Drake Gossi wrote:
> Hello everyone,
>
> I'm working through this book, *Humanities Data in R* (Arnold & Tilton),
> and I'm just having trouble understanding this maneuver.
>
> In sum, I'm trying to combine data in two different data.frames.
>
> This data.frame is called fruitNutr
>
>    Fruit Calories
> 1 banana      100
> 2   pear      100
> 3  mango      200
>
> And this data.frame is called fruitData
>
>    Fruit  Color  Shape Juice
> 1  apple    red  round     1
> 2 banana yellow oblong     0
> 3   pear  green   pear   0.5
> 4 orange orange  round     1
> 5   kiwi  green  round     0
>
> So, as you can see, these two data.frames overlap insofar as they both have
> banana and pear. So, what happens next is the book suggests this:
>
> fruitData$Calories <- NA
>
> As a result, I've created a new column for the fruitData data.frame:
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      N/A
> 3   pear  green   pear   0.5      N/A
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Then:
>
> > index <- match (x=fruitData$Fruit, table=fruitNutr$Fruit)
> > index
> [1] NA  1  2 NA NA
> > is.na(index)
> [1]  TRUE FALSE FALSE  TRUE  TRUE
> > fruitData$Calories [!is.na(index)] <- fruitNutr$Calories[index[!is.na(index)]]
> > fruitData
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> I get what the first part means, that first part being this:
> fruitData$Calories [!is.na(index)]
> go into the fruitData data.frame, specifically into the calories column,
> and only for what's true according to is.na(index). But I just literally
> can't understand this last part. fruitNutr$Calories[index[!is.na(index)]]
>
> Two questions.
>
>    1. I just literally don't understand how this code works. It does work,
>    of course, but I don't know what it's doing, specifically this
>    [index[!is.na(index)]] part. Could someone explain it to me like I'm five?
>    I'm new at this...

Decompose it from the inside out.

So !is.na(index) gives you a vector the same length as index which is TRUE where index has a value and FALSE where it is NA.

index[ something ] gives you a vector of all the values of index corresponding to something being TRUE (in this case). Note this vector may be shorter than something if that contains FALSE.

That should help you get started. My personal opinion is that it is much clearer with these things to do it in separate stages.

keep <- !is.na(index)
index[keep]

and check the value of keep if it seems to have gone wrong.

>    2. And then: is there any other way to combine these two data.frames so
>    that we get this same result? maybe an easier to understand method?
>
> That same result, again, is
>
>    Fruit  Color  Shape Juice Calories
> 1  apple    red  round     1      N/A
> 2 banana yellow oblong     0      100
> 3   pear  green   pear   0.5      100
> 4 orange orange  round     1      N/A
> 5   kiwi  green  round     0      N/A
>
> Drake

--
Michael
http://www.dewey.myzen.co.uk/home.html
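The inside-out decomposition can be written out one stage at a time. The sketch below assumes the two data frames are rebuilt with data.frame() from the values in the question; keep follows the name used in the reply, while rows and shortcut are names made up for this example. The one-line shortcut at the end is an aside, not the book's method.

```r
# Rebuild the data frames from the values posted in the question.
fruitNutr <- data.frame(Fruit = c("banana", "pear", "mango"),
                        Calories = c(100L, 100L, 200L))
fruitData <- data.frame(Fruit = c("apple", "banana", "pear", "orange", "kiwi"),
                        Color = c("red", "yellow", "green", "orange", "green"),
                        Shape = c("round", "oblong", "pear", "round", "round"),
                        Juice = c(1, 0, 0.5, 1, 0))

# Stage 1: for each fruit in fruitData, its row number in fruitNutr (NA if absent).
index <- match(fruitData$Fruit, fruitNutr$Fruit)   # NA 1 2 NA NA

# Stage 2: which fruitData rows found a match at all.
keep <- !is.na(index)                              # FALSE TRUE TRUE FALSE FALSE

# Stage 3: the fruitNutr row numbers for just those matched rows.
rows <- index[keep]                                # 1 2

# Stage 4: start with an all-NA column, then fill only the matched rows.
fruitData$Calories <- NA
fruitData$Calories[keep] <- fruitNutr$Calories[rows]
print(fruitData)

# Aside: indexing a vector with NA returns NA, so the NA filtering can be
# skipped when the goal is simply "value if matched, NA otherwise".
shortcut <- fruitNutr$Calories[index]
print(shortcut)
```

Printing index, keep and rows one after another is usually enough to see where a result went wrong.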