subject:"\[R\] Merge Data"

Re: [R] Merge data by coordinates

2016-10-18 Thread Michal Kubista

Dear Milu,
If your objective is to match each place in one table to the nearest
place in the second table, you can use the k-nearest-neighbour (kNN)
algorithm with k = 1.
But please, check what David suggests first.
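
A minimal sketch of the 1-nearest-neighbour idea in base R. It assumes the two
data frames from the original post are stored as d1 and d2 (the names used in
David's plot call), and that plain Euclidean distance in degrees is a good
enough approximation over this small region. The names dist2, nearest and
merged are only illustrative.

```r
# The two data frames from the original post
d1 <- data.frame(
  lat = c(54, 55, 51, 54, 53, 50, 47, 51, 49, 54),
  lon = c(14, 8, 15, 7, 6, 5, 13, 5, 13, 11),
  PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
                 12226, 13544, 14526))
d2 <- data.frame(
  lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48),
  lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10),
  GDP = c(19.09982, 13.31977, 14.95925, 6.8575635, 23.334565, 6.485748,
          24.01197, 14.30393075, 21.33759675, 9.71803675))

# Squared Euclidean distance (in degrees) between every d1 row and every d2 row
dist2 <- outer(d1$lat, d2$lat, "-")^2 + outer(d1$lon, d2$lon, "-")^2

# For each d1 row, the index of the closest d2 row (k = 1)
nearest <- apply(dist2, 1, which.min)

# Attach the matched d2 values; keep the matched coordinates for inspection
merged <- data.frame(d1,
                     lat2 = d2$lat[nearest],
                     lon2 = d2$lon[nearest],
                     GDP  = d2$GDP[nearest])
```

Several rows of d1 may end up matched to the same row of d2, which is exactly
why it is worth looking at the plot first.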

Best regards,
Michal

2016-10-16 19:24 GMT+02:00 David Winsemius :

>
> > On Oct 16, 2016, at 6:32 AM, Miluji Sb  wrote:
> >
> > Dear all,
> >
> > I have two data frames keyed by latitude and longitude, but the
> > coordinates do not always match. Is it possible to merge them (e.g. by
> > nearest distance)?
> >
> > # Dataframe 1
> > structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L,
> > 49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L
> > ), PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
> > 12226, 13544, 14526)), .Names = c("lat", "lon", "PPP2000_40"), row.names =
> > c(6764L,
> > 8796L, 8901L, 9611L, 11649L, 12819L, 13763L, 14389L, 15641L,
> > 16571L), class = "data.frame")
> >
> > # Dataframe 2
> > structure(list(lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48
> > ), lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), GDP = c(19.09982,
> > 13.31977, 14.95925, 6.8575635, 23.334565, 6.485748, 24.01197,
> > 14.30393075, 21.33759675, 9.71803675)), .Names = c("lat", "lon",
> > "GDP"), row.names = c(NA, 10L), class = "data.frame")
>
> I think you should first do this:
>
> plot(d1$lat,d1$lon)
> points(d2$lat,d2$lon, col="red")
>
> And then respond to my suggestion that this is not a well-posed computing
> problem. Explain why the red dots should have a 1-1 relationship with the
> black dots.
>
>
> --
> David.
>
> >
> > Thank you so much!
> >
> > Sincerely,
> >
> > Milu
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>


Re: [R] Merge data by coordinates

2016-10-16 Thread David Winsemius


> On Oct 16, 2016, at 6:32 AM, Miluji Sb  wrote:
> 
> Dear all,
> 
> I have two data frames keyed by latitude and longitude, but the
> coordinates do not always match. Is it possible to merge them (e.g. by
> nearest distance)?
> 
> # Dataframe 1
> structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L,
> 49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L
> ), PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
> 12226, 13544, 14526)), .Names = c("lat", "lon", "PPP2000_40"), row.names =
> c(6764L,
> 8796L, 8901L, 9611L, 11649L, 12819L, 13763L, 14389L, 15641L,
> 16571L), class = "data.frame")
> 
> # Dataframe 2
> structure(list(lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48
> ), lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), GDP = c(19.09982,
> 13.31977, 14.95925, 6.8575635, 23.334565, 6.485748, 24.01197,
> 14.30393075, 21.33759675, 9.71803675)), .Names = c("lat", "lon",
> "GDP"), row.names = c(NA, 10L), class = "data.frame")

I think you should first do this:

plot(d1$lat,d1$lon)
points(d2$lat,d2$lon, col="red")

And then respond to my suggestion that this is not a well-posed computing 
problem. Explain why the red dots should have a 1-1 relationship with the black 
dots.


-- 
David.

> 
> Thank you so much!
> 
> Sincerely,
> 
> Milu
> 

David Winsemius
Alameda, CA, USA


[R] Merge data by coordinates

2016-10-16 Thread Miluji Sb

Dear all,

I have two data frames keyed by latitude and longitude, but the
coordinates do not always match. Is it possible to merge them (e.g. by
nearest distance)?

# Dataframe 1
structure(list(lat = c(54L, 55L, 51L, 54L, 53L, 50L, 47L, 51L,
49L, 54L), lon = c(14L, 8L, 15L, 7L, 6L, 5L, 13L, 5L, 13L, 11L
), PPP2000_40 = c(4606, 6575, 6593, 7431, 9393, 10773, 11716,
12226, 13544, 14526)), .Names = c("lat", "lon", "PPP2000_40"), row.names =
c(6764L,
8796L, 8901L, 9611L, 11649L, 12819L, 13763L, 14389L, 15641L,
16571L), class = "data.frame")

# Dataframe 2
structure(list(lat = c(47, 47, 47, 47, 47, 47, 48, 48, 48, 48
), lon = c(7, 8, 9, 10, 11, 12, 7, 8, 9, 10), GDP = c(19.09982,
13.31977, 14.95925, 6.8575635, 23.334565, 6.485748, 24.01197,
14.30393075, 21.33759675, 9.71803675)), .Names = c("lat", "lon",
"GDP"), row.names = c(NA, 10L), class = "data.frame")

Thank you so much!

Sincerely,

Milu


Re: [R] Merge data frame with mispelling characters

2012-11-02 Thread VictorDelgado

David Winsemius wrote
> On Nov 2, 2012, at 11:20 AM, VictorDelgado wrote:
> 
>> Hello dear R-helpers,
>> 
>> I'm working with R-2.15.2 on Windows 7 OS. I'm stucked with a merge of
>> two
>> data frames by characters. 
>> In each data frame I got two different list of names, that is my main-key
>> to
>> be merged.
>> 
>> To figure out what I'm saying, I build up a modified "?merge" example,
>> with
>> errors by purpose:
>> 
>> # Data for authors:
>> 
>> authors <- data.frame(
>>surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
>>nationality = c("US", "Australia", "US", "UK", "Australia"),
>>deceased = c("yes", rep("no", 4)))
>> 
>> "Venables" is without  the final 's', and "Tierney, without "i".
>> 
>> # Data for books:
>> 
>> books <- data.frame(
>>surname = I(c("Tukey", "Venables", "Tierney",
>> "Ripley", "Rippley", "McNeil", "R Core")),
>>title = c("Exploratory Data Analysis",
>>  "Modern Applied Statistics ...",
>>  "LISP-STAT",
>>  "Spatial Statistics", "Stochastic Simulation",
>>  "Interactive Data Analysis",
>>  "An Introduction to R"),
>>other.author = c(NA, "Ripley", NA, NA, NA, NA,
>> "Venables & Smith"))
> 
> In your example the authors list has better spelling. By default, 'agrep'
> returns matches that are roughly 90% similar (more precisely, matches within a
> generalized Levenshtein distance of at most 0.1 times the pattern length):
> 
> 
> books$altname <- NA
> altidx <- unlist( sapply(books$surname, agrep, authors$surname) )
> books$altname[seq(altidx)] <- authors$surname[altidx]
> books
> #---
>    surname                         title     other.author altname
> 1    Tukey     Exploratory Data Analysis             <NA>   Tukey
> 2 Venables Modern Applied Statistics ...           Ripley Venable
> 3  Tierney                     LISP-STAT             <NA>  Terney
> 4   Ripley            Spatial Statistics             <NA>  Ripley
> 5  Rippley         Stochastic Simulation             <NA>  Ripley
> 6   McNeil     Interactive Data Analysis             <NA>  McNeil
> 7   R Core          An Introduction to R Venables & Smith    <NA>
> 
> If you then match 'books' to 'authors' with a merge on authors$surname and
> books$altname, you should get closer to your goals
> 
> -- 
> David. 
>> 
>> With "surname" column instead of "name" (differs from original example
>> for
>> more easy going merge). And the second "Ripley" with double "p".
>> 
>> So, if I ask for:
>> 
>> merge(authors, books, all=TRUE)
>> 
>> I got:
>> 
>> 
>> But we know that "Rippley" corresponds to "Ripley", "Terney" to "Tierney"
>> and "Venable" to "Venables". I was wondering if there was any way to work
>> around this problem. My orginal data have around 27,000 name entries, and
>> if
>> I take "all=FALSE", this database drops out to around 17,000, most
>> because
>> misspelling (or truncated expressions). If I take "all=TRUE", I get many of
>> these cases like the example above.
>> 
>> Has anyone experienced this? Any idea how I can get out? I'm thinking to
>> take the longest match possible to each entry. For example, in
>> "Venable"/"Venables" there is a 87.5% match. As I have name and surname,
>> and
>> also auxiliary keys to this match, I think this could work.
>> 
>> Thank you in advance.
>> 
>> 
>> 
>> -
>> Victor Delgado
>> cedeplar.ufmg.br P.H.D. student
>> www.fjp.mg.gov.br reseacher
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255.html
>> Sent from the R help mailing list archive at Nabble.com.
>> 
> 
> David Winsemius, MD
> Alameda, CA, USA
> 

It's amazing to get such useful answers so fast. I did not know the
RecordLinkage package; it looks very sophisticated and useful for this kind
of task. I just ran some tests and I think it could be very useful.
I'm working with Portuguese names, so I will also test agrep and
see which function gives better results with less data loss.
Thanks a lot, Jim Holtman and also David Winsemius.



-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
UFOP assistant professor
--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255p4648266.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merge data frame with mispelling characters

2012-11-02 Thread David Winsemius


On Nov 2, 2012, at 11:20 AM, VictorDelgado wrote:

> Hello dear R-helpers,
> 
> I'm working with R-2.15.2 on Windows 7 OS. I'm stucked with a merge of two
> data frames by characters. 
> In each data frame I got two different list of names, that is my main-key to
> be merged.
> 
> To figure out what I'm saying, I build up a modified "?merge" example, with
> errors by purpose:
> 
> # Data for authors:
> 
> authors <- data.frame(
>surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
>nationality = c("US", "Australia", "US", "UK", "Australia"),
>deceased = c("yes", rep("no", 4)))
> 
> "Venables" is without  the final 's', and "Tierney, without "i".
> 
> # Data for books:
> 
> books <- data.frame(
>surname = I(c("Tukey", "Venables", "Tierney",
> "Ripley", "Rippley", "McNeil", "R Core")),
>title = c("Exploratory Data Analysis",
>  "Modern Applied Statistics ...",
>  "LISP-STAT",
>  "Spatial Statistics", "Stochastic Simulation",
>  "Interactive Data Analysis",
>  "An Introduction to R"),
>other.author = c(NA, "Ripley", NA, NA, NA, NA,
> "Venables & Smith"))

In your example the authors list has better spelling. By default, 'agrep'
returns matches that are roughly 90% similar (more precisely, matches within a
generalized Levenshtein distance of at most 0.1 times the pattern length):


books$altname <- NA
altidx <- unlist( sapply(books$surname, agrep, authors$surname) )
books$altname[seq(altidx)] <- authors$surname[altidx]
books
#---
   surname                         title     other.author altname
1    Tukey     Exploratory Data Analysis             <NA>   Tukey
2 Venables Modern Applied Statistics ...           Ripley Venable
3  Tierney                     LISP-STAT             <NA>  Terney
4   Ripley            Spatial Statistics             <NA>  Ripley
5  Rippley         Stochastic Simulation             <NA>  Ripley
6   McNeil     Interactive Data Analysis             <NA>  McNeil
7   R Core          An Introduction to R Venables & Smith    <NA>

If you then match 'books' to 'authors' with a merge on authors$surname and 
books$altname, you should get closer to your goals
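
A sketch of that final merge, written a little more defensively than the
snippet above: it takes the first approximate match per row and yields NA when
there is none, so it does not depend on the unmatched rows being last. The
helper first_match and the renaming of the key column to altname are
illustrative choices, not part of the original example.

```r
authors <- data.frame(
  surname = c("Tukey", "Venable", "Terney", "Ripley", "McNeil"),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4)),
  stringsAsFactors = FALSE)

books <- data.frame(
  surname = c("Tukey", "Venables", "Tierney",
              "Ripley", "Rippley", "McNeil", "R Core"),
  title = c("Exploratory Data Analysis",
            "Modern Applied Statistics ...",
            "LISP-STAT",
            "Spatial Statistics", "Stochastic Simulation",
            "Interactive Data Analysis",
            "An Introduction to R"),
  other.author = c(NA, "Ripley", NA, NA, NA, NA, "Venables & Smith"),
  stringsAsFactors = FALSE)

# Index of the first approximate match in authors$surname, or NA if none
first_match <- function(s) agrep(s, authors$surname)[1]
altidx <- vapply(books$surname, first_match, integer(1), USE.NAMES = FALSE)

# Spelling as it appears in 'authors', used as the merge key
books$altname <- authors$surname[altidx]

# Give the key the same name on both sides, then merge
names(authors)[names(authors) == "surname"] <- "altname"
merged <- merge(authors, books, by = "altname", all = TRUE)
```

With real data it is worth checking rows where agrep returns more than one
candidate, since taking the first one is an arbitrary tie-break.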

-- 
David. 
> 
> With "surname" column instead of "name" (differs from original example for
> more easy going merge). And the second "Ripley" with double "p".
> 
> So, if I ask for:
> 
> merge(authors, books, all=TRUE)
> 
> I got:
> 
> 
> But we know that "Rippley" corresponds to "Ripley", "Terney" to "Tierney"
> and "Venable" to "Venables". I was wondering if there was any way to work
> around this problem. My orginal data have around 27,000 name entries, and if
> I take "all=FALSE", this database drops out to around 17,000, most because
> mispelling (or truncated expressions). If I take "all=TRUE", I got many of
> this  cases like the example above.
> 
> Has anyone experienced this? Any idea how I can get out? I'm thinking to
> take the longest match possible to each entry. For example, in
> "Venable"/"Venables" there is a 87.5% match. As I have name and surname, and
> also auxiliary keys to this match, I think this could work.
> 
> Thank you in advance.
> 
> 
> 
> -
> Victor Delgado
> cedeplar.ufmg.br P.H.D. student
> www.fjp.mg.gov.br reseacher
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255.html
> Sent from the R help mailing list archive at Nabble.com.
> 

David Winsemius, MD
Alameda, CA, USA


Re: [R] Merge data frame with mispelling characters

2012-11-02 Thread jim holtman

You might try the 'soundex' function in the RecordLinkage package:

> soundex('ripley')
[1] "R140"
> soundex('rippley')
[1] "R140"
> soundex('venable')
[1] "V514"
> soundex('venables')
[1] "V514"
> soundex('terney')
[1] "T650"
> soundex('tierney')
[1] "T650"


On Fri, Nov 2, 2012 at 2:20 PM, VictorDelgado  wrote:
> Hello dear R-helpers,
>
> I'm working with R-2.15.2 on Windows 7 OS. I'm stucked with a merge of two
> data frames by characters.
> In each data frame I got two different list of names, that is my main-key to
> be merged.
>
> To figure out what I'm saying, I build up a modified "?merge" example, with
> errors by purpose:
>
> # Data for authors:
>
> authors <- data.frame(
> surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
> nationality = c("US", "Australia", "US", "UK", "Australia"),
> deceased = c("yes", rep("no", 4)))
>
> "Venables" is without  the final 's', and "Tierney, without "i".
>
> # Data for books:
>
> books <- data.frame(
> surname = I(c("Tukey", "Venables", "Tierney",
>  "Ripley", "Rippley", "McNeil", "R Core")),
> title = c("Exploratory Data Analysis",
>   "Modern Applied Statistics ...",
>   "LISP-STAT",
>   "Spatial Statistics", "Stochastic Simulation",
>   "Interactive Data Analysis",
>   "An Introduction to R"),
> other.author = c(NA, "Ripley", NA, NA, NA, NA,
>  "Venables & Smith"))
>
> With "surname" column instead of "name" (differs from original example for
> more easy going merge). And the second "Ripley" with double "p".
>
> So, if I ask for:
>
> merge(authors, books, all=TRUE)
>
> I got:
>
>
> But we know that "Rippley" corresponds to "Ripley", "Terney" to "Tierney"
> and "Venable" to "Venables". I was wondering if there was any way to work
> around this problem. My orginal data have around 27,000 name entries, and if
> I take "all=FALSE", this database drops out to around 17,000, most because
> mispelling (or truncated expressions). If I take "all=TRUE", I got many of
> this  cases like the example above.
>
> Has anyone experienced this? Any idea how I can get out? I'm thinking to
> take the longest match possible to each entry. For example, in
> "Venable"/"Venables" there is a 87.5% match. As I have name and surname, and
> also auxiliary keys to this match, I think this could work.
>
> Thank you in advance.
>
>
>
> -
> Victor Delgado
> cedeplar.ufmg.br P.H.D. student
> www.fjp.mg.gov.br reseacher
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255.html
> Sent from the R help mailing list archive at Nabble.com.
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


[R] Merge data frame with mispelling characters

2012-11-02 Thread VictorDelgado

Hello dear R-helpers,

I'm working with R-2.15.2 on Windows 7. I'm stuck on a merge of two
data frames by character keys.
Each data frame has a different list of names, and the name is my main key
for the merge.

To illustrate, I built a modified "?merge" example with
deliberate errors:

# Data for authors:

authors <- data.frame(
surname = I(c("Tukey", "Venable", "Terney", "Ripley", "McNeil")),
nationality = c("US", "Australia", "US", "UK", "Australia"),
deceased = c("yes", rep("no", 4)))

"Venables" is missing the final "s", and "Tierney" is missing the "i".

# Data for books:

books <- data.frame(
surname = I(c("Tukey", "Venables", "Tierney",
 "Ripley", "Rippley", "McNeil", "R Core")),
title = c("Exploratory Data Analysis",
  "Modern Applied Statistics ...",
  "LISP-STAT",
  "Spatial Statistics", "Stochastic Simulation",
  "Interactive Data Analysis",
  "An Introduction to R"),
other.author = c(NA, "Ripley", NA, NA, NA, NA,
 "Venables & Smith"))
 
I use a "surname" column instead of "name" (unlike the original example) to
make the merge easier. The second "Ripley" has a double "p".

So, if I ask for:

merge(authors, books, all=TRUE)

I got:


But we know that "Rippley" corresponds to "Ripley", "Terney" to "Tierney"
and "Venable" to "Venables". I was wondering if there is any way to work
around this problem. My original data have around 27,000 name entries, and if
I take "all=FALSE", the result drops to around 17,000 rows, mostly because of
misspellings (or truncated expressions). If I take "all=TRUE", I get many
cases like the example above.

Has anyone experienced this? Any idea how I can get around it? I'm thinking of
taking the longest possible match for each entry. For example, in
"Venable"/"Venables" there is an 87.5% match. As I have name and surname, and
also auxiliary keys for this match, I think this could work.

Thank you in advance.



-
Victor Delgado
cedeplar.ufmg.br P.H.D. student
www.fjp.mg.gov.br researcher
--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-data-frame-with-mispelling-characters-tp4648255.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merge Data by time stamps

2011-10-10 Thread David Winsemius



On Oct 10, 2011, at 1:28 PM, Alaios wrote:


> Dear all,
> I have some device measurements and the time stamps I get from it
> have the below format:
>
> MyStruct$TimeStamps[1,]
> [1] 2011.000   10.000    6.000   16.000   23.000   30.539
>
> I can convert them easily with ISOdate() to a number and do the
> calculations I need.
>
> One of my problems is that I want to gather my measurements to piles
> of duration (let's say) 5 minutes.
> Afterwards I will apply a function to these piles.
> As the device is not super-precise please find below the time needed
> for one operation to complete (in seconds)
>
> 1.10
> 1.90
> 1.34
> 1.23
> 1.56
> 1.22
> 1.34

Assuming I understand your presentation and lacking  R-coded examples  
and desired output on which to test:


?cumsum
?cut
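
A minimal sketch of how those two functions could be combined, using the seven
durations from the question. The names dur, elapsed, width, bin and piles are
illustrative. The window is set to 4 seconds only so that this short sample
spans several windows; for 5-minute piles, width would be 300.

```r
# Durations (in seconds) of consecutive operations, from the question
dur <- c(1.10, 1.90, 1.34, 1.23, 1.56, 1.22, 1.34)

# Running total of elapsed time after each operation
elapsed <- cumsum(dur)

# Window width in seconds (assumption for the demo; use 300 for 5 minutes)
width <- 4

# Break points covering the whole elapsed range
breaks <- seq(0, ceiling(max(elapsed) / width) * width, by = width)

# Assign each measurement to the window its elapsed time falls in
bin <- cut(elapsed, breaks = breaks)

# One pile per window, then apply any function to each pile
piles <- split(dur, bin, drop = TRUE)
result <- sapply(piles, mean)
```

Because the windows are defined on elapsed time, each pile can hold a
different number of measurements.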



> as you understand I can not say that 5 minutes measurements are
> specific to X consecutive measurements but differ. How I can ask
> from R to do the summation and whenever there is a 5 minute data set
> to split it so to apply it into a function?
>
> I would like to thank you in advance for your help


--

David Winsemius, MD
West Hartford, CT


[R] Merge Data by time stamps

2011-10-10 Thread Alaios

Dear all,
I have some device measurements and the time stamps I get from it have the 
below format:

MyStruct$TimeStamps[1,]
> [1] 2011.000   10.000    6.000   16.000   23.000   30.539

I can convert them easily with ISOdate() to a number and do the calculations I 
need.

One of my problems is that I want to group my measurements into bins of a
fixed duration (say, 5 minutes).
Afterwards I will apply a function to each bin.
As the device is not very precise, the time needed for one operation to
complete varies. Some example durations (in seconds):

1.10
1.90
1.34
1.23
1.56
1.22
1.34


As you can see, a 5-minute window does not correspond to a fixed number X of
consecutive measurements. How can I ask R to accumulate the durations and
split the data whenever 5 minutes have elapsed, so that I can apply a function
to each chunk?

I would like to thank you in advance for your help

B.R
Alex


Re: [R] Merge data under conditions

2011-03-13 Thread flymer

Thanks to both of you for your help! 

Jim, my problem is to match some observations of a time series (vector 'a' in
my example) with theoretical predictions of this process (vector 'b' in my
example), with a small time lag between them.



--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-data-under-conditions-tp3350864p3351472.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merge data under conditions

2011-03-12 Thread jim holtman

use the sqldf package:

> require(sqldf)
> a
  time x
1  1.0 4
2  2.2 5
3  5.2 6
> b
  time y
1    0 1
2    1 3
3    2 5
4    4 7
5    5 9
> sqldf("
+ select a.time, a.x, b.y
+ from a, b
+ where abs(a.time - b.time) < 0.5
+ ")
  time x y
1  1.0 4 3
2  2.2 5 5
3  5.2 6 9
>


On Sat, Mar 12, 2011 at 4:14 PM, flymer  wrote:
> Dear All,
>
> Debuting in R, I'm facing a problem.
> I have 2 vectors, say 'a' et 'b', and I'd like to merge them according to
> the proximity of their variable 'time'.
> How to do to keep elements which satisfy (for example) 'a$time-b$time<0.5'?
>
> For example :
>
>> a
>  time x
> 1  1.0 4
> 2  2.2 5
> 3  5.2 6
>
>> b
>  time y
> 1    0 1
> 2    1 3
> 3    2 5
> 4    4 7
> 5    5 9
>
> I'd like to get :
>
>>
>  time x y
> 1  1.0 4 3
> 2  2.2 5 5
> 3  5.2 6 9
>
> I thought using the fonction 'merge'...
> I hope you can help me! Thanks in advance!
>
> Jerome.
>
>
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Merge-data-under-conditions-tp3350864p3350864.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?


Re: [R] Merge data under conditions

2011-03-12 Thread David Winsemius



On Mar 12, 2011, at 4:14 PM, flymer wrote:


> Dear All,
>
> Debuting in R, I'm facing a problem.
> I have 2 vectors, say 'a' et 'b', and I'd like to merge them according to
> the proximity of their variable 'time'.
> How to do to keep elements which satisfy (for example) 'a$time-b$time<0.5'?
>
> For example :
>
>> a
>  time x
> 1  1.0 4
> 2  2.2 5
> 3  5.2 6
>
>> b
>  time y
> 1    0 1
> 2    1 3
> 3    2 5
> 4    4 7
> 5    5 9
>
> I'd like to get :
>
>  time x y
> 1  1.0 4 3
> 2  2.2 5 5
> 3  5.2 6 9
>
> I thought using the fonction 'merge'...


There is often an SQL incantation to achieve this, and there is
an `sqldf` package that might help, but I am not competent with it.
Here is a base R solution using three functions (six, if you count
"$", "<", and "-"):


?expand.grid
?rep
?"["

dfrm<- expand.grid(a$time, b$time)
dfrm$x <- a$x  # by virtue of recycling
dfrm$y <- rep(b$y,  each=3)

> dfrm[abs(dfrm$Var1-dfrm$Var2) < 0.5, ]
   Var1 Var2 x y
4   1.0    1 4 3
8   2.2    2 5 5
15  5.2    5 6 9
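
The same idea can be sketched with merge() itself, assuming 'a' and 'b' are
data frames as printed in the question. With by = NULL, merge() returns the
Cartesian product of the two, which can then be filtered on the time
difference. The names pairs and close are illustrative.

```r
a <- data.frame(time = c(1.0, 2.2, 5.2), x = c(4, 5, 6))
b <- data.frame(time = c(0, 1, 2, 4, 5), y = c(1, 3, 5, 7, 9))

# Cross join: every row of a paired with every row of b.
# The shared column name 'time' becomes time.x and time.y.
pairs <- merge(a, b, by = NULL)

# Keep only the pairs whose times are within 0.5 of each other
close <- pairs[abs(pairs$time.x - pairs$time.y) < 0.5,
               c("time.x", "x", "y")]
names(close)[1] <- "time"
```

The cross join has nrow(a) * nrow(b) rows, so this is only practical when both
tables are small.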

--

David Winsemius, MD
West Hartford, CT


[R] Merge data under conditions

2011-03-12 Thread flymer

Dear All,

I am new to R and am facing a problem.
I have two data frames, say 'a' and 'b', and I'd like to merge them according to
the proximity of their variable 'time'.
How can I keep the elements which satisfy (for example) 'a$time-b$time<0.5'?

For example :

> a
  time x
1  1.0 4
2  2.2 5
3  5.2 6

> b
  time y
1    0 1
2    1 3
3    2 5
4    4 7
5    5 9

I'd like to get :

> 
  time x y
1  1.0 4 3
2  2.2 5 5
3  5.2 6 9

I thought of using the function 'merge'...
I hope you can help me! Thanks in advance!

Jerome.


--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-data-under-conditions-tp3350864p3350864.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Merge Data

2010-11-10 Thread Tal Galili

Hello Nasrin,

Please show us what each of your files looks like (their first few rows), so we
can understand why the rbind doesn't work.

In general, be sure to keep the r-help list copied on your replies so others
can help if I don't know the answer (or if they answer before me).

Best,
Tal


Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Wed, Nov 10, 2010 at 2:37 AM, Nasrin Pak  wrote:

>   Hello;
>
> I didn't manage to do it with the rbind command! I'm wondering whether this kind of
> combining helps for my work (actually I'm a new user of R). My problem is
> that I have a data set for every day of measurement in a separate file, and I
> want to plot one parameter of the data for all the days in one graph. I
> tried to use a for loop, but only the last data set remains in memory.
> I don't know how to plot each day's data continuously after the others (or how
> to extend the x axis). Would you please help me with it?
>
> Thanks for your help!
>
>
> On Tue, Nov 9, 2010 at 12:23 PM, Tal Galili  wrote:
>
>> Hello Nasrin,
>>
>> I think you might be wanting to use
>> rbind
>> instead of
>> merge
>>
>> 
>>
>> Contact
>> Details:---
>> Contact me: tal.gal...@gmail.com |  972-52-7275845
>> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
>> www.r-statistics.com (English)
>>
>> --
>>
>>
>>
>>
>> On Tue, Nov 9, 2010 at 8:22 PM, Nasrin Pak  wrote:
>>
>>> Hello;
>>> I have a problem merging data sets. I use this command:
>>>
>>>  FileNames <- list.files(path="C:/updated_CFL_Rad_files/2007/11",
>>> full.names=TRUE)
>>> > dataMerge <- data.frame()
>>> > for(f in FileNames){
>>> +   ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
>>> +   dataMerge <- merge(dataMerge, ReadInMerge,all=T)
>>> +
>>> + }
>>>
>>> and an error occurs.The size of the data is about 7.5 Mb, I don't know
>>> what
>>> does 221 Mb mean!
>>>
>>> Error: cannot allocate vector of size 221.6 Mb
>>> In addition: Warning messages:
>>> 1: Reached total allocation of 502Mb: see help(memory.size)
>>> 2: Reached total allocation of 502Mb: see help(memory.size)
>>> 3: Reached total allocation of 502Mb: see help(memory.size)
>>> 4: Reached total allocation of 502Mb: see help(memory.size)
>>> --
>>> Sincerely
>>>
>>> Nasrin  Pak
>>>
>>>
>>
>>
>
>
> --
> Sincerely
>
> Nasrin  Pak
>
>


Re: [R] Merge Data

2010-11-09 Thread Tal Galili

Hello Nasrin,

I think you might be wanting to use
rbind
instead of
merge
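
A rough sketch of the rbind approach, assuming every daily file has the same
columns. Since the real files are not available, it first writes three small
stand-in CSV files to a temporary directory; the directory name, file names
and columns (day, hour, value) are invented for the demo.

```r
# Stand-in for the daily files (hypothetical data, for illustration only)
dir <- file.path(tempdir(), "daily_files_demo")
dir.create(dir, showWarnings = FALSE)
for (d in 1:3) {
  day <- data.frame(day = d, hour = 0:2, value = d * 10 + 0:2)
  write.csv(day, file.path(dir, sprintf("day%02d.csv", d)), row.names = FALSE)
}

FileNames <- list.files(path = dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list, then stack the pieces row-wise in one step
pieces <- lapply(FileNames, read.csv, header = TRUE, na.strings = "NULL")
dataAll <- do.call(rbind, pieces)

# All days now sit one after another, so a single plot covers them, e.g.:
# plot(dataAll$value, type = "l")
```

Stacking with rbind keeps the row count at the sum of the files, whereas
merging on all common columns with all = TRUE does a lot of extra matching
work on data that is only meant to be appended.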



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Tue, Nov 9, 2010 at 8:22 PM, Nasrin Pak  wrote:

> Hello;
> I have a problem merging data sets. I use this command:
>
>  FileNames <- list.files(path="C:/updated_CFL_Rad_files/2007/11",
> full.names=TRUE)
> > dataMerge <- data.frame()
> > for(f in FileNames){
> +   ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
> +   dataMerge <- merge(dataMerge, ReadInMerge,all=T)
> +
> + }
>
> and an error occurs.The size of the data is about 7.5 Mb, I don't know what
> does 221 Mb mean!
>
> Error: cannot allocate vector of size 221.6 Mb
> In addition: Warning messages:
> 1: Reached total allocation of 502Mb: see help(memory.size)
> 2: Reached total allocation of 502Mb: see help(memory.size)
> 3: Reached total allocation of 502Mb: see help(memory.size)
> 4: Reached total allocation of 502Mb: see help(memory.size)
> --
> Sincerely
>
> Nasrin  Pak
>
>


[R] Merge Data

2010-11-09 Thread Nasrin Pak

Hello;
I have a problem merging data sets. I use this command:

 FileNames <- list.files(path="C:/updated_CFL_Rad_files/2007/11",
full.names=TRUE)
> dataMerge <- data.frame()
> for(f in FileNames){
+   ReadInMerge <- read.csv(file=f, header=T, na.strings="NULL")
+   dataMerge <- merge(dataMerge, ReadInMerge,all=T)
+
+ }

and an error occurs. The data are only about 7.5 Mb in size, so I don't
understand what the 221 Mb refers to!

Error: cannot allocate vector of size 221.6 Mb
In addition: Warning messages:
1: Reached total allocation of 502Mb: see help(memory.size)
2: Reached total allocation of 502Mb: see help(memory.size)
3: Reached total allocation of 502Mb: see help(memory.size)
4: Reached total allocation of 502Mb: see help(memory.size)
-- 
Sincerely

Nasrin  Pak


Re: [R] merge data

2009-11-10 Thread Chuck White

David -- thank you for your response. 

merge does work but it creates another dataframe. df1 is very large and I did 
not want another copy created. What I ended up doing is:
df1 <- merge(df1, df2, by="week")

In terms of memory allocation, will memory for two dataframes be allocated or 
will the additional column be added to df1?

Thanks.

 David Winsemius  wrote: 
> 
> On Nov 10, 2009, at 12:36 PM, Chuck White wrote:
> 
> > df1 -- dataframe with column date and several other columns. #rows  
> > >40k  Several of the dates are repeated.
> > df2 -- dataframe with two columns date and index. #rows ~130  This  
> > is really a map from date to index.
> >
> > I would like to create a column called index in df1 which has the  
> > corresponding index from df2.
> >
> > The following works:
> > index <- NULL
> > for(wk in df1$week){
> >index <- c(index,df2$index[df2$week==wk])
> > }
> > and then add index to df1.
> >
> > Can you please suggest a better way of doing this? I didn't think  
> > merge was suitable for this...is it? THANKS.
> 
> I think merge should work, but if you really have looked at the  
> various arguments, tested reasonable examples and are still convinced  
> it wouldn't, then see what you get with:
> 
>  > df1 <- data.frame(dt = Sys.Date() - sample(100:120, 30,  
> replace=TRUE), 1:30)
>  > df2 <- data.frame(dt2 = Sys.Date() -100:120, index=LETTERS[1:21])
> 
>  > df1$index <- df2[ match(df1$dt,df2$dt2), "index"]
>  > df1
>            dt X1.30 index
> 1  2009-07-30     1     D
> 2  2009-07-16     2     R
> 3  2009-07-23     3     K
> 4  2009-07-29     4     E
> 5  2009-07-15     5     S
> 6  2009-08-02     6     A
> 7  2009-07-18     7     P
> 8  2009-07-21     8     M
> 9  2009-07-27     9     G
> 10 2009-07-26    10     H
> 11 2009-07-31    11     C
> 12 2009-07-26    12     H
> 13 2009-07-18    13     P
> 14 2009-07-23    14     K
> 15 2009-07-21    15     M
> 16 2009-07-19    16     O
> 17 2009-07-14    17     T
> 18 2009-07-16    18     R
> 19 2009-07-15    19     S
> 20 2009-07-13    20     U
> 21 2009-07-28    21     F
> 22 2009-07-20    22     N
> 23 2009-07-24    23     J
> 24 2009-07-20    24     N
> 25 2009-07-16    25     R
> 26 2009-07-30    26     D
> 27 2009-07-14    27     T
> 28 2009-08-02    28     A
> 29 2009-07-19    29     O
> 30 2009-07-26    30     H
> 
> I tried merge(df1, df2, by.x=1, by.y=1) and got the same result modulo  
> the order of the output.
> 
> 
> --
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>


Re: [R] merge data

2009-11-10 Thread David Winsemius

On Nov 10, 2009, at 12:36 PM, Chuck White wrote:

df1 -- dataframe with column date and several other columns. #rows  
>40k  Several of the dates are repeated.
df2 -- dataframe with two columns date and index. #rows ~130  This  
is really a map from date to index.

I would like to create a column called index in df1 which has the  
corresponding index from df2.

The following works:
index <- NULL
for(wk in df1$week){
   index <- c(index,df2$index[df2$week==wk])
}
and then add index to df1.

Can you please suggest a better way of doing this? I didn't think  
merge was suitable for this...is it? THANKS.

I think merge should work, but if you really have looked at the  
various arguments, tested reasonable examples and are still convinced  
it wouldn't, then see what you get with:

> df1 <- data.frame(dt = Sys.Date() - sample(100:120, 30,  
replace=TRUE), 1:30)

> df2 <- data.frame(dt2 = Sys.Date() -100:120, index=LETTERS[1:21])

> df1$index <- df2[ match(df1$dt,df2$dt2), "index"]
> df1
           dt X1.30 index
1  2009-07-30     1     D
2  2009-07-16     2     R
3  2009-07-23     3     K
4  2009-07-29     4     E
5  2009-07-15     5     S
6  2009-08-02     6     A
7  2009-07-18     7     P
8  2009-07-21     8     M
9  2009-07-27     9     G
10 2009-07-26    10     H
11 2009-07-31    11     C
12 2009-07-26    12     H
13 2009-07-18    13     P
14 2009-07-23    14     K
15 2009-07-21    15     M
16 2009-07-19    16     O
17 2009-07-14    17     T
18 2009-07-16    18     R
19 2009-07-15    19     S
20 2009-07-13    20     U
21 2009-07-28    21     F
22 2009-07-20    22     N
23 2009-07-24    23     J
24 2009-07-20    24     N
25 2009-07-16    25     R
26 2009-07-30    26     D
27 2009-07-14    27     T
28 2009-08-02    28     A
29 2009-07-19    29     O
30 2009-07-26    30     H

I tried merge(df1, df2, by.x=1, by.y=1) and got the same result modulo  
the order of the output.
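
To make the comparison concrete, here is a small sketch that runs both approaches on made-up data and checks that they agree once the rows are put back in the same order. The column names week and value and the seed are assumptions for illustration, not taken from the original data.

```r
set.seed(1)
# Hypothetical data: a long table with repeated keys and a small lookup table.
df1 <- data.frame(week = sample(1:5, 12, replace = TRUE), value = 1:12)
df2 <- data.frame(week = 1:5, index = LETTERS[1:5])

# Lookup with match(): adds one column and keeps the original row order.
df1$index <- df2$index[match(df1$week, df2$week)]

# The same lookup with merge(): rows come back sorted by the key.
merged <- merge(df1[, c("week", "value")], df2, by = "week")

# Restore the original order (value runs 1..12) before comparing.
merged <- merged[order(merged$value), ]
identical(as.character(merged$index), as.character(df1$index))
```

The match() form only adds a column to df1, while merge() builds a new data frame, which may matter when df1 is large.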

--

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


[R] merge data

2009-11-10 Thread Chuck White

df1 -- dataframe with column date and several other columns. #rows >40k  
Several of the dates are repeated.
df2 -- dataframe with two columns date and index. #rows ~130  This is really a 
map from date to index.

I would like to create a column called index in df1 which has the corresponding 
index from df2.

The following works:
index <- NULL
for(wk in df1$week){
index <- c(index,df2$index[df2$week==wk])
}
and then add index to df1.

Can you please suggest a better way of doing this? I didn't think merge was 
suitable for this...is it? THANKS.


Re: [R] Merge data frames but prefer values in on

2009-09-14 Thread Nandi

No you cannot. You may want to write a merge function with the special
capability but there is no better way than the one suggested by
Henrique.

On Sep 14, 12:18 pm, JiHO  wrote:
> On 2009-September-11  , at 13:55 ,  wrote:
>
> > Maybe:
>
> > do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b),  
> > drop = TRUE)), tail, 1))
>
> > On Fri, Sep 11, 2009 at 3:45 AM, jo  wrote:
> > Thanks for the post-processing ideas. But is there any way to do that
> > in one step?
>
> Thanks but by "in one step" I meant within the merge, not in one post-
> processing step ;)
>
> JiHO
> ---http://maururu.net
>

Re: [R] Merge data frames but prefer values in on

2009-09-14 Thread JiHO


On 2009-September-11  , at 13:55 ,  wrote:


Maybe:

do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b),  
drop = TRUE)), tail, 1))


On Fri, Sep 11, 2009 at 3:45 AM, jo  wrote:
Thanks for the post-processing ideas. But is there any way to do that
in one step?


Thanks but by "in one step" I meant within the merge, not in one post- 
processing step ;)


JiHO
---
http://maururu.net


Re: [R] Merge data frames but prefer values in one

2009-09-11 Thread Henrique Dallazuanna

Maybe:

do.call(rbind, lapply(with(xy <- rbind(x, y), split(xy, list(a, b), drop =
TRUE)), tail, 1))
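
The split/tail idea keeps the last row of each a:b group, and because y is bound after x, the y row wins. A possibly simpler variant of the same idea, sketched here with rounded made-up values in place of the rnorm() draws, puts y first and drops later duplicates of each key with duplicated():

```r
# Illustrative values standing in for the random numbers in the original post.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

# Put y first so that its rows win when a key appears in both tables.
yx <- rbind(y, x)
res <- yx[!duplicated(yx[, c("a", "b")]), ]

# Sort by the keys and reset the row names for display.
res <- res[order(res$a, res$b), ]
rownames(res) <- NULL
res
```

This is still post-processing rather than a merge() option, but it avoids the split and the do.call.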

On Fri, Sep 11, 2009 at 3:45 AM, jo  wrote:

> Thanks for the post-processing ideas. But is there any way to do that
> in one step?
>
> On Thu, Sep 10, 2009 at 7:20 PM, Henrique Dallazuanna 
> wrote:
> >
> > Try this:
> >
> > xy <- merge(x, y, by = c("a","b"),all = TRUE)
> > xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1],
> rowSums(.x, na.rm = TRUE))
> > xy
> >
> > On Thu, Sep 10, 2009 at 12:21 PM, JiHO  wrote:
>
> JiHO
> ---
> http://maururu.net
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: [R] Merge data frames but prefer values in one

2009-09-10 Thread jo

Thanks for the post-processing ideas. But is there any way to do that
in one step?

On Thu, Sep 10, 2009 at 7:20 PM, Henrique Dallazuanna  wrote:
>
> Try this:
>
> xy <- merge(x, y, by = c("a","b"),all = TRUE)
> xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1], 
> rowSums(.x, na.rm = TRUE))
> xy
>
> On Thu, Sep 10, 2009 at 12:21 PM, JiHO  wrote:

JiHO
---
http://maururu.net


Re: [R] Merge data frames but prefer values in one

2009-09-10 Thread Henrique Dallazuanna

Try this:

xy <- merge(x, y, by = c("a","b"),all = TRUE)
xy$c <- ifelse(rowSums(!is.na(.x <- xy[, c('c.x', 'c.y')])) > 1, .x[,1],
rowSums(.x, na.rm = TRUE))
xy
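
One thing to check: as far as I can tell, .x[,1] is the c.x column, so the snippet above keeps the value from x when both are present, whereas the original request was to prefer y. A minimal sketch that prefers y, using rounded made-up values in place of the rnorm() draws, could look like this:

```r
# Illustrative values standing in for the random numbers in the original post.
x <- data.frame(a = 1:4, b = 1:4, c = c(-0.88, -0.71, -0.59, -1.86))
y <- data.frame(a = c(1, 3), b = 3, c = c(-0.27, 0.01))

xy <- merge(x, y, by = c("a", "b"), all = TRUE)

# Take c.y where it exists, otherwise fall back to c.x.
xy$c <- ifelse(is.na(xy$c.y), xy$c.x, xy$c.y)

# Keep only the key columns and the combined value.
xy <- xy[, c("a", "b", "c")]
xy
```

For the a = 3, b = 3 row, which exists in both tables, this keeps the value from y.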

On Thu, Sep 10, 2009 at 12:21 PM, JiHO  wrote:

> Hello everyone,
>
> My problem is better explained with an example:
>
> > x=data.frame(a=1:4,b=1:4,c=rnorm(4))
> > x
>   a b          c
> 1 1 1 -0.8821089
> 2 2 2 -0.7082583
> 3 3 3 -0.5948835
> 4 4 4 -1.8571443
> > y=data.frame(a=c(1,3),b=3,c=rnorm(2))
> > y
>   a b            c
> 1 1 3 -0.273155973
> 2 3 3  0.009517862
>
> Now I want to merge x and y by columns a and b, hence creating a data.frame
> with all a:b combinations observed in x and y. That's easily done with
> merge:
>
> > merge(x,y,by=c("a","b"),all=T)
>   a b        c.x          c.y
> 1 1 1 -0.8821089           NA
> 2 1 3         NA -0.273155973
> 3 2 2 -0.7082583           NA
> 4 3 3 -0.5948835  0.009517862
> 5 4 4 -1.8571443           NA
>
> But rather than two c columns I would want the merge to:
> - keep the value in x if there is no corresponding value in y
> - keep the value in y if there is no corresponding value in x
> - prefer the value in y when the a:b combination exists in both x and y
>
> So basically I want my result to look like:
>  a b  c
> 1 1 1 -0.8821089
> 2 1 3 -0.2731559
> 3 2 2 -0.7082583
> 4 3 3  0.0095178
> 5 4 4 -1.8571443
>
> I can't find a combination of options for merge that does this. Is there
> another function that would do that, or do I have to resort to some
> post-processing after merge? It seems that it might be something like a
> "right merge" for databases, but I don't know this world at all. I would be
> happy to look into sqldf if that allows doing things like that.
>
> Thanks in advance. Sincerely,
>
> JiHO
> ---
> http://maururu.net
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


[R] Merge data frames but prefer values in one

2009-09-10 Thread JiHO


Hello everyone,

My problem is better explained with an example:

> x=data.frame(a=1:4,b=1:4,c=rnorm(4))
> x
  a b          c
1 1 1 -0.8821089
2 2 2 -0.7082583
3 3 3 -0.5948835
4 4 4 -1.8571443
> y=data.frame(a=c(1,3),b=3,c=rnorm(2))
> y
  a b            c
1 1 3 -0.273155973
2 3 3  0.009517862

Now I want to merge x and y by columns a and b, hence creating a  
data.frame with all a:b combinations observed in x and y. That's  
easily done with merge:


> merge(x,y,by=c("a","b"),all=T)
  a b        c.x          c.y
1 1 1 -0.8821089           NA
2 1 3         NA -0.273155973
3 2 2 -0.7082583           NA
4 3 3 -0.5948835  0.009517862
5 4 4 -1.8571443           NA

But rather than two c columns I would want the merge to:
- keep the value in x if there is no corresponding value in y
- keep the value in y if there is no corresponding value in x
- prefer the value in y when the a:b combination exists in both x and y

So basically I want my result to look like:
 a b  c
1 1 1 -0.8821089
2 1 3 -0.2731559
3 2 2 -0.7082583
4 3 3  0.0095178
5 4 4 -1.8571443

I can't find a combination of options for merge that does this. Is
there another function that would do that, or do I have to resort to
some post-processing after merge? It seems that it might be something
like a "right merge" for databases, but I don't know this world at
all. I would be happy to look into sqldf if that allows doing things
like that.


Thanks in advance. Sincerely,

JiHO
---
http://maururu.net


Re: [R] Merge data frames but with a twist.

2009-08-27 Thread Gabor Grothendieck

The inconsistency arose in order to satisfy backward compatibility
while giving chron a direct way to use % codes.

chron used its own format specification so it would have been difficult
to add % codes there; however, as.chron, at the time, did not support a
format specification at all so it was still possible to add a format specifier
using % codes without disrupting existing code.

On Thu, Aug 27, 2009 at 2:21 PM, Stephen Tucker wrote:
> Ah, thanks always -
> I originally thought as.chron() was required to have all fields (m/d/y 
> hh:mm:ss) as for chron() but I see that the former passes its 'format' 
> argument to  as.POSIXct()
> Good deal!
> Stephen
>
>
>
> - Original Message 
> From: Gabor Grothendieck 
> To: Stephen Tucker 
> Cc: Tony Breyal ; r-help@r-project.org
> Sent: Thursday, August 27, 2009 7:27:26 AM
> Subject: Re: [R] Merge data frames but with a twist.
>
> On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote:
>> You may want to use the reshape package for this task:
>>
>>> library(reshape)
>>> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure")
>>       Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM
>> 1   Firefly                   3                  1
>> 2 Red Dwarf                   4                  2
>>
>> If you want to plot time series, you can do something like the following
>>
>>> mydf <- .Last.value ## save the output from above to mydf
>>> library(zoo)
>>> zobj <- zoo(`mode<-`(t(mydf),"numeric"),
>>>             as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p")))
>>> plot(zobj)
>>
>> (zobj is a time series object of the zoo class)
>
> Note that as.chron can take % codes directly so the as.chron portion
> can be shortened to:
>
> as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p")
>
>
>
>
>


Re: [R] Merge data frames but with a twist.

2009-08-27 Thread Stephen Tucker

Ah, thanks always - 
I originally thought as.chron() was required to have all fields (m/d/y 
hh:mm:ss) as for chron() but I see that the former passes its 'format' argument 
to  as.POSIXct()
Good deal!
Stephen

- Original Message 
From: Gabor Grothendieck 
To: Stephen Tucker 
Cc: Tony Breyal ; r-help@r-project.org
Sent: Thursday, August 27, 2009 7:27:26 AM
Subject: Re: [R] Merge data frames but with a twist.

On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote:
> You may want to use the reshape package for this task:
>
>> library(reshape)
>> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure")
>       Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM
> 1   Firefly                   3                  1
> 2 Red Dwarf                   4                  2
>
> If you want to plot time series, you can do something like the following
>
>> mydf <- .Last.value ## save the output from above to mydf
>> library(zoo)
>> zobj <- zoo(`mode<-`(t(mydf),"numeric"),
>> as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p")))
>> plot(zobj)
>
> (zobj is a time series object of the zoo class)

Note that as.chron can take % codes directly so the as.chron portion
can be shortened to:

as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p")


Re: [R] Merge data frames but with a twist.

2009-08-27 Thread Gabor Grothendieck

On Thu, Aug 27, 2009 at 9:55 AM, Stephen Tucker wrote:
> You may want to use the reshape package for this task:
>
>> library(reshape)
>> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure")
>       Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM
> 1   Firefly                   3                  1
> 2 Red Dwarf                   4                  2
>
> If you want to plot time series, you can do something like the following
>
>> mydf <- .Last.value ## save the output from above to mydf
>> library(zoo)
>> zobj <- zoo(`mode<-`(t(mydf),"numeric"),
>>             as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p")))
>> plot(zobj)
>
> (zobj is a time series object of the zoo class)

Note that as.chron can take % codes directly so the as.chron portion
can be shortened to:

as.chron(names(mydf)[-1],"%m/%d/%Y %I:%M %p")


Re: [R] Merge data frames but with a twist.

2009-08-27 Thread Stephen Tucker

You may want to use the reshape package for this task:

> library(reshape)
> recast(DF3,Show ~ Datetime, id.var=names(DF3),value="Measure")
       Show 08/26/2009 11:30 AM 08/26/2009 9:30 AM
1   Firefly                   3                  1
2 Red Dwarf                   4                  2

If you want to plot time series, you can do something like the following

> mydf <- .Last.value ## save the output from above to mydf
> library(zoo)
> zobj <- zoo(`mode<-`(t(mydf),"numeric"),
> as.chron(strptime(names(mydf)[-1],"%m/%d/%Y %I:%M %p")))
> plot(zobj)

(zobj is a time series object of the zoo class)
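
If installing an extra package is not an option, base R's reshape() can produce a similar wide layout. The following is only a sketch: DF3 is rebuilt here directly, with Measure kept numeric, rather than taken from the merge in the original post.

```r
# Rebuild the long table by hand; the values follow the example in the thread.
DF3 <- data.frame(
  Show     = rep(c("Firefly", "Red Dwarf"), times = 2),
  Measure  = 1:4,
  Datetime = rep(c("08/26/2009 9:30 AM", "08/26/2009 11:30 AM"), each = 2),
  stringsAsFactors = FALSE
)

# One row per Show, one Measure column per distinct Datetime.
wide <- reshape(DF3, idvar = "Show", timevar = "Datetime",
                v.names = "Measure", direction = "wide")
wide
```

The new columns are named after the Datetime values with a "Measure." prefix, so the prefix would need stripping before the names are parsed as times.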



- Original Message 
From: Tony Breyal 
To: r-help@r-project.org
Sent: Thursday, August 27, 2009 4:04:30 AM
Subject: [R] Merge data frames but with a twist.

Dear all,

Question: How can I merge two data frames such that new columns are added
in a particular way?

I'm not actually sure how best to articulate my question, to be honest,
so I hope showing you what I want to achieve will communicate my
question better.

Let's say I have two data frames:

> DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2, 
> Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM')))

> DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4, 
> Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM')))

And then let us merge these:

> DF3 <- merge(DF1, DF2, all=TRUE)

       Show Measure            Datetime
1   Firefly       1  08/26/2009 9:30 AM
2   Firefly       3 08/26/2009 11:30 AM
3 Red Dwarf       2  08/26/2009 9:30 AM
4 Red Dwarf       4 08/26/2009 11:30 AM


What I would like to do is merge the data frames such that I end up
with the following:

Show       08/26/2009 9:30 AM  08/26/2009 11:30 AM
Firefly                     1                    3
Red Dwarf                   2                    4

My reason for doing this is so that I can plot a time series somehow.

I hope the formatting stays when I post this message and that what I'm
trying to do is easy to understand. Thank you kindly in advance for any
help.

Tony


Re: [R] Merge data frames but with a twist.

2009-08-27 Thread Henrique Dallazuanna

Try this:

xtabs(as.numeric(Measure) ~ Show + Datetime, data = DF3)
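
A caveat worth hedging: building the data frames through cbind() turns Measure into text (a factor in older versions of R), and as.numeric() on a factor returns the level codes rather than the printed values. That happens to work for 1:4 but may not in general. A sketch that keeps Measure numeric from the start, so no conversion is needed:

```r
# Build the data frames without cbind(), so Measure stays numeric.
DF1 <- data.frame(Show = c("Firefly", "Red Dwarf"), Measure = 1:2,
                  Datetime = "08/26/2009 9:30 AM", stringsAsFactors = FALSE)
DF2 <- data.frame(Show = c("Firefly", "Red Dwarf"), Measure = 3:4,
                  Datetime = "08/26/2009 11:30 AM", stringsAsFactors = FALSE)
DF3 <- rbind(DF1, DF2)

# Cross-tabulate: one row per Show, one column per Datetime.
wide <- xtabs(Measure ~ Show + Datetime, data = DF3)
wide
```

The columns come out in alphabetical order of the Datetime strings, so 11:30 AM is listed before 9:30 AM.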


On Thu, Aug 27, 2009 at 8:04 AM, Tony Breyal wrote:

> Dear all,
>
> Question: How can I merge two data frames such that new columns are added
> in a particular way?
>
> I'm not actually sure how best to articulate my question, to be honest,
> so I hope showing you what I want to achieve will communicate my
> question better.
>
> Let's say I have two data frames:
>
> > DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2,
> Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM')))
>
> > DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4,
> Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM')))
>
> And then let us merge these:
>
> > DF3 <- merge(DF1, DF2, all=TRUE)
>
>        Show Measure            Datetime
> 1   Firefly       1  08/26/2009 9:30 AM
> 2   Firefly       3 08/26/2009 11:30 AM
> 3 Red Dwarf       2  08/26/2009 9:30 AM
> 4 Red Dwarf       4 08/26/2009 11:30 AM
>
>
> What I would like to do is merge the data frames such that I end up
> with the following:
>
> Show       08/26/2009 9:30 AM  08/26/2009 11:30 AM
> Firefly                     1                    3
> Red Dwarf                   2                    4
>
> My reason for doing this is so that I can plot a time series somehow.
>
> I hope the formatting stays when I post this message and that what I'm
> trying to do is easy to understand. Thank you kindly in advance for any
> help.
>
> Tony
>



-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


[R] Merge data frames but with a twist.

2009-08-27 Thread Tony Breyal

Dear all,

Question: How can I merge two data frames such that new columns are added
in a particular way?

I'm not actually sure how best to articulate my question, to be honest,
so I hope showing you what I want to achieve will communicate my
question better.

Let's say I have two data frames:

> DF1 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=1:2, 
> Datetime=c('08/26/2009 9:30 AM', '08/26/2009 9:30 AM')))

> DF2 <- data.frame(cbind(Show=c('Firefly', 'Red Dwarf'), Measure=3:4, 
> Datetime=c('08/26/2009 11:30 AM', '08/26/2009 11:30 AM')))

And then let us merge these:

> DF3 <- merge(DF1, DF2, all=TRUE)

       Show Measure            Datetime
1   Firefly       1  08/26/2009 9:30 AM
2   Firefly       3 08/26/2009 11:30 AM
3 Red Dwarf       2  08/26/2009 9:30 AM
4 Red Dwarf       4 08/26/2009 11:30 AM


What I would like to do is merge the data frames such that I end up
with the following:

Show       08/26/2009 9:30 AM  08/26/2009 11:30 AM
Firefly                     1                    3
Red Dwarf                   2                    4

My reason for doing this is so that I can plot a time series somehow.

I hope the formatting stays when I post this message and that what I'm
trying to do is easy to understand. Thank you kindly in advance for any
help.

Tony


Re: [R] Merge data frame and keep unmatched

2009-07-13 Thread Matthew Dowle

Or if you need it to be fast,  try data.table.   X[Y] is a join when X and Y 
are both data.tables. X[Y] is a left join, Y[X] is a right join. 'nomatch' 
controls the inner/outer join i.e. what happens for unmatched rows.   This 
is much faster than merge().

"Gabor Grothendieck"  wrote in message 
news:971536df0906100704q433f5f99ld3f9c23e69d95...@mail.gmail.com...
Try:

merge(completedf, partdf, all.x = TRUE)

or

library(sqldf) # see http://sqldf.googlecode.com
sqldf("select * from completedf left join partdf using(beta, alpha)")

On Wed, Jun 10, 2009 at 9:56 AM, Etienne B. Racine 
wrote:
>
> Hi,
>
> With two data sets, one complete and another one partial, I would like to
> merge them and keep the unmatched lines. The problem is that merge() doesn't
> keep the unmatched lines. Is there another function that I could use to
> merge the data frames?
>
> Example:
>
> completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
> partdf <- data.frame(
> alpha= c('a','a','c'),
> beta = c(1,3,2),
> val = c(2,6,4))
>
> mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
> # it only kept the common rows
> nrow(mergedf)
>
> Thanks,
> Etienne
> --
> View this message in context: 
> http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] Merge data frame and keep unmatched

2009-06-10 Thread Gabor Grothendieck

Try:

merge(completedf, partdf, all.x = TRUE)

or

library(sqldf) # see http://sqldf.googlecode.com
sqldf("select * from completedf left join partdf using(beta, alpha)")


On Wed, Jun 10, 2009 at 9:56 AM, Etienne B. Racine wrote:
>
> Hi,
>
> With two data sets, one complete and another one partial, I would like to
> merge them and keep the unmatched lines. The problem is that merge() doesn't
> keep the unmatched lines. Is there another function that I could use to
> merge the data frames?
>
> Example:
>
> completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
> partdf <- data.frame(
>        alpha= c('a','a','c'),
>        beta = c(1,3,2),
>        val = c(2,6,4))
>
> mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
> # it only kept the common rows
> nrow(mergedf)
>
> Thanks,
> Etienne
> --
> View this message in context: 
> http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] Merge data frame and keep unmatched

2009-06-10 Thread Marc Schwartz



On Jun 10, 2009, at 8:56 AM, Etienne B. Racine wrote:



Hi,

With two data sets, one complete and another one partial, I would like to
merge them and keep the unmatched lines. The problem is that merge() doesn't
keep the unmatched lines. Is there another function that I could use to
merge the data frames?

Example:

completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
partdf <- data.frame(
alpha= c('a','a','c'),
beta = c(1,3,2),
val = c(2,6,4))

mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
# it only kept the common rows
nrow(mergedf)

Thanks,
Etienne




Is this what you want?

> merge(x=completedf, y=partdf, by=c('alpha','beta'), all = TRUE)
  alpha beta val
1     a    1   2
2     a    2  NA
3     a    3   6
4     b    1  NA
5     b    2  NA
6     b    3  NA
7     c    1  NA
8     c    2   4
9     c    3  NA

Note the 'all', 'all.x' and 'all.y' arguments...
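
As a quick illustration of those three arguments, the sketch below runs the four variants on the example data and compares the row counts. The object names inner, left, right and full are just labels chosen here.

```r
completedf <- expand.grid(alpha = letters[1:3], beta = 1:3)
partdf <- data.frame(alpha = c("a", "a", "c"),
                     beta  = c(1, 3, 2),
                     val   = c(2, 6, 4))
keys <- c("alpha", "beta")

inner <- merge(completedf, partdf, by = keys)                # matched rows only
left  <- merge(completedf, partdf, by = keys, all.x = TRUE)  # keep all of x
right <- merge(completedf, partdf, by = keys, all.y = TRUE)  # keep all of y
full  <- merge(completedf, partdf, by = keys, all = TRUE)    # keep everything

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
```

Here every row of partdf has a partner in completedf, so right matches inner and full matches left; with unmatched rows on both sides the four results would all differ.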

HTH,

Marc Schwartz


[R] Merge data frame and keep unmatched

2009-06-10 Thread Etienne B. Racine


Hi,

With two data sets, one complete and another one partial, I would like to
merge them and keep the unmatched lines. The problem is that merge() doesn't
keep the unmatched lines. Is there another function that I could use to
merge the data frames?

Example:

completedf <- expand.grid(alpha=letters[1:3],beta=1:3)
partdf <- data.frame(
alpha= c('a','a','c'),
beta = c(1,3,2),
val = c(2,6,4))

mergedf <- merge(x=completedf, y=partdf, by=c('alpha','beta'))
# it only kept the common rows
nrow(mergedf)

Thanks, 
Etienne
-- 
View this message in context: 
http://www.nabble.com/Merge-data-frame-and-keep-unmatched-tp23962874p23962874.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] merge data frames with same column names of different lengths and missing values

2009-03-08 Thread Dieter Menne

Steven Lubitz  yahoo.com> writes:

> Thank you - this is very helpful. However I realized that with my real data
> sets (not the example I have here), I also have different numbers of columns
> in each data frame. rbind doesn't seem to like this. Here's a modified example:
>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5),
>                 item3=c(NA,2,NA,4,NA), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
> 
> rbind(x,y)

You should add dummy  variables to each partial data frame
such that they look the same, and do the rbind later.
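
One way to do that padding, sketched under the assumption that a missing column should simply be filled with NA (the helper name pad is made up here):

```r
x <- data.frame(item1 = c(NA, NA, 3, 4, 5), item2 = c(1, NA, NA, 4, 5),
                item3 = c(NA, 2, NA, 4, NA), id = 1:5)
y <- data.frame(item1 = c(NA, 2, NA, 4, 5, 6), item2 = c(NA, NA, 3, 4, 5, NA),
                id = 1:6)

# Hypothetical helper: add any missing columns as NA and fix the column order.
pad <- function(df, cols) {
  for (nm in setdiff(cols, names(df))) {
    df[[nm]] <- NA
  }
  df[, cols]
}

all_cols <- union(names(x), names(y))
xy <- rbind(pad(x, all_cols), pad(y, all_cols))
dim(xy)
```

After the padding both pieces have the same columns in the same order, which is what rbind needs.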

Dieter


Re: [R] merge data frames with same column names of different lengths and missing values

2009-03-07 Thread Steven Lubitz

Subject: Re: [R] merge data frames with same column names of different lengths 
and missing values
To: "Phil Spector" 
Date: Saturday, March 7, 2009, 5:01 PM

Phil,
Thank you - this is very helpful. However I realized that with my real data 
sets (not the example I have here), I also have different numbers of columns in 
each data frame. rbind doesn't seem to like this. Here's a modified example:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), 
item3=c(NA,2,NA,4,NA), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

rbind(x,y)
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

Any ideas?
Thanks,
Steve


--- On Sat, 3/7/09, Phil Spector  wrote:
From: Phil Spector 
Subject: Re: [R] merge data frames with same column names of different lengths and missing values
To: "Steven Lubitz" 
Date: Saturday, March 7, 2009, 1:56 AM

Steven -
  I believe this gives the output that you desire:

> xy = rbind(x,y)
> aggregate(subset(xy,select=-id),xy['id'],function(x)rev(x[!is.na(x)])[1])
  id item1 item2
1  1    NA     1
2  2     2    NA
3  3     3     3
4  4     4     4
5  5     5     5
6  6     6    NA

  But I think what the SAS statements "merge x y; by id;" would give you is

> aggregate(subset(xy,select=-id),xy['id'],function(x)x[length(x)])
  id item1 item2
1  1    NA    NA
2  2     2    NA
3  3    NA     3
4  4     4     4
5  5     5     5
6  6     6    NA
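
The first call above can be read as "stack the rows, then keep the last
non-missing value per id". A self-contained sketch of that reading follows;
the helper name last_non_na is an assumption for illustration, and the body is
the same expression as in the call above.

```r
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

# Hypothetical helper: last non-missing value of a vector,
# or NA when every value is missing.
last_non_na <- function(v) rev(v[!is.na(v)])[1]

xy <- rbind(x, y)   # rows of y are stacked below the rows of x
# Apply the helper to every non-id column, one group per id.
merged <- aggregate(subset(xy, select = -id), xy["id"], last_non_na)
```

Because y is stacked after x, a non-missing value in y wins when both data
frames have one for the same id; swapping the order in rbind would let x win.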

- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley

 spec...@stat.berkeley.edu






Re: [R] merge data frames with same column names of different lengths and missing values

2009-03-07 Thread Jun Shen

Steve,

I don't know whether R has a built-in function for the task you describe, so I
wrote one myself. Try the following to see if it works for you. The new
function "merge.new" takes one additional argument, col.ID, which is the column
number of the ID column in the merged result. To use your x, y as examples, type:

merge.new(x,y,all=TRUE,col.ID=3)

#

merge.new<-function(...,col.ID){
    inter<-merge(...)
    inter<-inter[order(inter[[col.ID]]),] #merged data sorted by ID

    #total columns and rows for the target dataframe
    row.ID<-unique(inter[[col.ID]])
    total.row<-length(row.ID)
    total.col<-dim(inter)[2]
    target<-as.data.frame(matrix(NA,total.row,total.col))
    names(target)<-names(inter)

    for (i in 1:total.row){
        #select all rows with the same ID (only the first two are used)
        inter.part<-inter[inter[[col.ID]]==row.ID[i],]
        for (j in 1:total.col){
            #use the first row's value unless it is NA; otherwise fall back
            #to the second row (NA if that row is absent or also NA)
            if (is.na(inter.part[1,j])){
                target[i,j]<-inter.part[2,j]
            }
            else {target[i,j]<-inter.part[1,j]}
        }
    }
    print(paste("total rows=",total.row))
    print(paste("total columns=",total.col))
    return(target)
}
#
-- 
Jun Shen PhD
PK/PD Scientist
BioPharma Services
Millipore Corporation
15 Research Park Dr.
St Charles, MO 63304
Direct: 636-720-1589


Re: [R] merge data frames with same column names of different lengths and missing values

2009-03-07 Thread Domenico Vistocco


Steven Lubitz wrote:

Hello, I'm switching over from SAS to R and am having trouble merging data 
frames. The data frames have several columns with the same name, and each has a 
different number of rows. Some of the values are missing from cells with the 
same column names in each data frame. I had hoped that when I merged the 
dataframes, every column with the same name would be merged, with the value in 
a complete cell overwriting the value in an empty cell from the other data 
frame. I cannot seem to achieve this result, though I've tried several merge 
adaptations:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)


merge(x,y,by="id") #I lose observations here (n=1 in this example), and my
                   #items are duplicated - I do not want this result
  id item1.x item2.x item1.y item2.y
1  1      NA       1      NA      NA
2  2      NA      NA       2      NA
3  3       3      NA      NA       3
4  4       4       4       4       4
5  5       5       5       5       5


merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here)
                                      #and do not want this result
  id item1 item2
1  4     4     4
2  5     5     5


merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated
#and the NA values are retained - I instead want one row per ID
  id item1 item2
1  1    NA     1
2  1    NA    NA
3  2     2    NA
4  2    NA    NA
5  3     3    NA
6  3    NA     3
7  4     4     4
8  5     5     5
9  6     6    NA
  

You should obtain the desired solution using:
merge(y, x, by=c("id","item1","item2"), all=TRUE)

In database terminology all=TRUE corresponds to the full outer join, 
all.x to the left outer join and all.y to the right outer join.
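
The join types named above can be compared on the example data with a short
sketch. Note that all=TRUE is shorthand for all.x=TRUE together with
all.y=TRUE, so on its own it still keeps NA cells. The final step, which joins
on id alone and then fills each NA in x from y with ifelse, is an alternative
not proposed in this thread; the object names are chosen only for illustration.

```r
x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)

inner <- merge(x, y, by = "id")                 # ids present in both
left  <- merge(x, y, by = "id", all.x = TRUE)   # every id of x
right <- merge(x, y, by = "id", all.y = TRUE)   # every id of y
full  <- merge(x, y, by = "id", all = TRUE)     # every id of x or y

# Collapse the .x/.y pairs: keep x's value, fall back to y's when x is NA.
coalesced <- data.frame(
  id    = full$id,
  item1 = ifelse(is.na(full$item1.x), full$item1.y, full$item1.x),
  item2 = ifelse(is.na(full$item2.x), full$item2.y, full$item2.x)
)
```

With only two item columns, spelling out each ifelse is manageable; with many
shared columns a loop over the column names would be the natural extension.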


Ciao,
domenico


In reality I have multiple data frames with numerous columns, all with this 
problem. I can do the merge seamlessly in SAS, but am trying to learn and stick 
with R for my analyses. Any help would be greatly appreciated.

Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts 
General Hospital


Re: [R] merge data frames with same column names of different lengths and missing values

2009-03-07 Thread Dieter Menne

Steven Lubitz writes:

>
> x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
> y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)
>
> merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated
> #and the NA values are retained - I instead want one row per ID
>   id item1 item2
> 1  1    NA     1
> 2  1    NA    NA
> 3  2     2    NA
> 4  2    NA    NA
> 5  3     3    NA
> 6  3    NA     3
> 7  4     4     4
> 8  5     5     5
> 9  6     6    NA
>
I think you just picked the wrong (too complex) function. Try rbind(x,y).

Dieter


[R] merge data frames with same column names of different lengths and missing values

2009-03-06 Thread Steven Lubitz


Hello, I'm switching over from SAS to R and am having trouble merging data 
frames. The data frames have several columns with the same name, and each has a 
different number of rows. Some of the values are missing from cells with the 
same column names in each data frame. I had hoped that when I merged the 
dataframes, every column with the same name would be merged, with the value in 
a complete cell overwriting the value in an empty cell from the other data 
frame. I cannot seem to achieve this result, though I've tried several merge 
adaptations:

x <- data.frame(item1=c(NA,NA,3,4,5), item2=c(1,NA,NA,4,5), id=1:5)
y <- data.frame(item1=c(NA,2,NA,4,5,6), item2=c(NA,NA,3,4,5,NA), id=1:6)


merge(x,y,by="id") #I lose observations here (n=1 in this example), and my
                   #items are duplicated - I do not want this result
  id item1.x item2.x item1.y item2.y
1  1      NA       1      NA      NA
2  2      NA      NA       2      NA
3  3       3      NA      NA       3
4  4       4       4       4       4
5  5       5       5       5       5


merge(x,y,by=c("id","item1","item2")) #again I lose observations (n=4 here)
                                      #and do not want this result
  id item1 item2
1  4     4     4
2  5     5     5


merge(x,y,by=c("id","item1","item2"),all.x=T,all.y=T) #my rows are duplicated
#and the NA values are retained - I instead want one row per ID
  id item1 item2
1  1    NA     1
2  1    NA    NA
3  2     2    NA
4  2    NA    NA
5  3     3    NA
6  3    NA     3
7  4     4     4
8  5     5     5
9  6     6    NA

In reality I have multiple data frames with numerous columns, all with this 
problem. I can do the merge seamlessly in SAS, but am trying to learn and stick 
with R for my analyses. Any help would be greatly appreciated.

Steve Lubitz
Cardiovascular Research Fellow, Brigham and Women's Hospital and Massachusetts 
General Hospital
