Re: [R] how to deduplicate records, e.g. using melt() and cast()
Fantastic, Jan! Thanks a lot for the example of how to achieve this with melt()/cast(). Very good for my understanding of these functions.

Karl

On 07/05/12 13:49, Jan van der Laan wrote:
> using reshape: [...]

-- 
Karl Brand
Dept of Cardiology and Dept of Bioinformatics
Erasmus MC
Dr Molewaterplein 50
3015 GE Rotterdam
T +31 (0)10 703 2460 |M +31 (0)642 777 268 |F +31 (0)10 704 4161

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] how to deduplicate records, e.g. using melt() and cast()
Esteemed UseRs,

This must be embarrassingly trivial to achieve with, e.g., melt() and cast(): deduplicating records (pw.X in the example) for a given set of responses (cond.Y in the example), so that each pathway ends up in a single row holding all of its non-missing values. Hopefully the runnable example shows clearly what I have and what I'm trying to convert it to. But I'm just not getting it (?cast, that is)! So I'd really appreciate someone's patience in clarifying this, using the reshape package or any other approach.

With sincere thanks in advance,
Karl

## Runnable example
## The data.frame I have:
library(reshape)
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))
my.df

## The data frame I want:
wanted.df <- data.frame(pathway = c("pw.A", "pw.B", "pw.C"),
                        cond.one = c(0.5, 0.4, NA),
                        cond.two = c(0.6, 0.9, 0.2),
                        cond.three = c(NA, 0.1, NA))
wanted.df
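Collapsing the rows is only unambiguous if no pathway has more than one non-missing value for the same condition; otherwise some rule, such as summing or taking the first value, has to decide. A minimal base-R sketch for checking that assumption on the example data, using rowsum() to count non-missing values per pathway. The names `present` and `counts` are illustrative, not from the thread:

```r
# Example data from the question
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Logical matrix: TRUE where a condition value is present
present <- !is.na(my.df[-1])
storage.mode(present) <- "integer"  # rowsum() needs numeric input

# One row per pathway, one column per condition: number of non-missing values
counts <- rowsum(present, my.df$pathway)
counts

# TRUE if every pathway has at most one value per condition
all(counts <= 1)
```

If the last line returns FALSE, summing would silently add the duplicated values together, so it is worth deciding explicitly how such conflicts should be resolved.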
Re: [R] how to deduplicate records, e.g. using melt() and cast()
You could try aggregate(), e.g.,

my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

aggregate(my.df[-1], my.df['pathway'], sum, na.rm = TRUE)

or, if a pathway with no values for a condition should give NA (as in wanted.df) rather than 0:

sum. <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
aggregate(my.df[-1], my.df['pathway'], sum.)

I hope it helps.

Best,
Dimitris

On 5/7/2012 11:50 AM, Karl Brand wrote:
> Esteemed UseRs, [...]

-- 
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
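Both aggregate() calls above collapse duplicates by adding them, which reproduces wanted.df here only because each pathway has at most one value per condition. If the intent is to keep a value rather than add values, a different summary function can be passed to aggregate(). A minimal sketch; the helper `first.non.na` is hypothetical (not from the thread or from base R) and simply keeps the first non-missing value in each group, or NA if there is none:

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Hypothetical helper: first non-missing value of x, or NA_real_ if all are NA
first.non.na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

# One row per pathway; each condition column is reduced with first.non.na()
res <- aggregate(my.df[-1], my.df["pathway"], first.non.na)
res
```

Unlike sum(), this ignores any later value for the same pathway and condition, so it only suits data where such duplicates are known to agree or where "first" is meaningful.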
Re: [R] how to deduplicate records, e.g. using melt() and cast()
Dimitris, Petra,

Thank you! aggregate() is my lesson for today, not melt() | cast(). I really appreciate the super fast help.

Karl

On 07/05/12 12:09, Dimitris Rizopoulos wrote:
> you could try aggregate(), e.g., [...]
Re: [R] how to deduplicate records, e.g. using melt() and cast()
Using reshape:

library(reshape)
m <- melt(my.df, id.var = "pathway", na.rm = TRUE)
cast(m, pathway ~ variable, sum, fill = NA)

Jan

On 05/07/2012 12:30 PM, Karl Brand wrote:
> Dimitris, Petra, Thank you! [...]
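For readers without the reshape package, the same melt-then-cast idea can be sketched in base R. This is an analogue under stated assumptions, not how reshape implements melt() or cast(): the data are stacked into a long format by hand (dropping NAs, like melt(..., na.rm = TRUE)), and tapply() then builds the pathway-by-condition table, leaving NA in cells with no observations, much as fill = NA does in cast(). The names `long`, `wide` and `result` are illustrative:

```r
my.df <- data.frame(pathway = c(rep("pw.A", 2), rep("pw.B", 3), rep("pw.C", 1)),
                    cond.one = c(0.5, NA, 0.4, NA, NA, NA),
                    cond.two = c(NA, 0.6, NA, 0.9, NA, 0.2),
                    cond.three = c(NA, NA, NA, NA, 0.1, NA))

# Long format: one row per (pathway, condition, value)
long <- data.frame(pathway  = rep(my.df$pathway, times = ncol(my.df) - 1),
                   variable = factor(rep(names(my.df)[-1], each = nrow(my.df)),
                                     levels = names(my.df)[-1]),  # keep column order
                   value    = unlist(my.df[-1], use.names = FALSE))
long <- long[!is.na(long$value), ]  # drop missing values, as na.rm = TRUE does in melt()

# Wide format: sum the values in each pathway x condition cell;
# cells with no observations are left as NA
wide <- tapply(long$value, list(long$pathway, long$variable), sum)

# Back to a data frame shaped like wanted.df (row.names = NULL drops the pathway row names)
result <- data.frame(pathway = rownames(wide), wide, row.names = NULL)
result
```

Summing inside tapply() mirrors the sum passed to cast(); any other summary function could be swapped in the same way.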