Re: [R] Data transformation problem
Thank you so much for this elegant solution, Jeff.

Philip

On 2020-11-12 02:20, Jeff Newmiller wrote:
> I am not a data.table aficionado, but here is how I would do it with
> dplyr/tidyr:
> [snip]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data transformation problem
I am not a data.table aficionado, but here is how I would do it with dplyr/tidyr:

library(dplyr)
library(tidyr)

do_per_REL <- function( DF ) {
  rng <- range( DF$REF1 ) # watch out for missing months?
  DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" ) )
        %>% left_join( DF, by = "REF1" )
        %>% arrange( REF1 )
        )
  with( DF
      , data.frame( REF2 = REF1[ -1 ]
                  , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
                  )
      )
}

df2a <- (   df1
        %>% mutate( REF1 = as.Date( REF1 )
                  , REL1 = as.Date( REL1 )
                  )
        %>% nest( data = -REL1 )
        %>% rename( REL2 = REL1 )
        %>% rowwise()
        %>% mutate( data = list( do_per_REL( data ) ) )
        %>% ungroup()
        %>% unnest( cols = "data" )
        %>% select( REF2, REL2, VAL2 )
        %>% arrange( REF2, desc( REL2 ), VAL2 )
        )
df2a

On Wed, 11 Nov 2020, p...@philipsmith.ca wrote:

> I am stuck on a data transformation problem. I have a data frame, df1 in my
> example, with some original "levels" data. The data pertain to some
> variable, such as GDP, in various reference periods, REF, as estimated and
> released in various release periods, REL. The release periods follow after
> the reference periods by two months or more, sometimes by several years. I
> want to build a second data frame, called df2 in my example, with the
> month-to-month growth rates that existed in each reference period, revealing
> the revisions to those growth rates in subsequent periods.
>
> REF1 <- c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
>           "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>           "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL1 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>           "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>           "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL1 <- c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
>           19900,18765,13467)
> df1 <- data.frame(REF1,REL1,VAL1)
> REF2 <- c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
>           "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
> REL2 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
>           "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
> VAL2 <- c(0.0,-3.9,4.3,NA,2.3,-4.3,-17.9,42.1,44.4,1.7)
> df2 <- data.frame(REF2,REL2,VAL2)
>
> In my example I have provided some sample data pertaining to three reference
> months, 2017-01-01 through 2017-03-01, and five release periods,
> "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and "2019-05-01". In my
> actual problem I have millions of REF-REL combinations, so my data frame is
> quite large. I am using data.table for faster processing, though I am more
> familiar with the tidyverse. I am providing df2 as the target data frame for
> my example, so you can see what I am trying to achieve.
>
> I have not been able to find an efficient way to do these calculations. I
> have tried "for" loops with "if" statements, without success so far, and
> anyway this approach would be too slow, I fear. Suggestions as to how I
> might proceed would be much appreciated.
>
> Philip

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
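Since the original question mentions millions of REF-REL combinations, a grouped lag may be cheaper than nesting one data frame per release. What follows is only a base R sketch of that idea, not the method posted above. It assumes every release covers a complete, gap-free run of reference months (so the month-padding step is skipped), and the names `d`, `prev` and `df2b` are made up for illustration.

```r
# Sample data from the original post, written compactly
REF1 <- rep(c("2017-01-01", "2017-02-01", "2017-03-01"), each = 5)
REL1 <- rep(c("2020-09-01", "2020-08-01", "2020-07-01",
              "2020-06-01", "2019-05-01"), times = 3)
VAL1 <- c(17974, 14567, 13425, NA, 12900,
          17974, 14000, 14000, 12999, 13245,
          17197, 11500, 19900, 18765, 13467)
df1 <- data.frame(REF1, REL1, VAL1)

# ISO-formatted dates sort correctly as strings:
# order by release, then by reference month within release
d <- df1[order(df1$REL1, df1$REF1), ]

# Previous month's level within each release (NA for the first month)
prev <- ave(d$VAL1, d$REL1, FUN = function(v) c(NA, head(v, -1)))

# Month-to-month growth rate in percent
d$VAL2 <- 100 * (d$VAL1 - prev) / prev

# Drop the first reference month of each release (it has no predecessor)
df2b <- d[duplicated(d$REL1), c("REF1", "REL1", "VAL2")]
names(df2b) <- c("REF2", "REL2", "VAL2")

# Same ordering as the target: reference month, then newest release first
df2b <- df2b[order(df2b$REF2, -xtfrm(df2b$REL2)), ]
df2b
```

If some release is missing a month, the lag would silently span the gap, so the padding step from the dplyr version would still be needed. In data.table the same idea would presumably be a grouped `shift()` with `by = REL1`, though that variant is not shown or tested here.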
Re: [R] data transformation
There is no "perhaps" about it. Nonsense phrases like "similar to logit, where I dont [sic] lose normality of the data" that lead into off-topic discussions of why one introduces transformations in the first place are perfect examples of why questions like this belong on a statistical theory discussion forum like StackExchange rather than here where the topic is the R language.

On January 20, 2019 6:02:15 AM PST, Adrian Johnson wrote:
>Dear group,
>My question, perhaps is more of a statistical question using R
>I have a data matrix ( 400 x 400 normally distributed) with data
>points ranging from -1 to +1..
>For certain clustering algorithms, I suspect the tight data range is
>not helping resolving the clusters.
>
>Is there a way to transform the data something similar to logit, where
>I dont lose normality of the data and yet I can better expand the data
>ranges.
>
>Thanks
>Adrian

--
Sent from my phone. Please excuse my brevity.
Re: [R] data transformation
This might work for you:

newy <- sign(oldy)*f(abs(oldy))

where f() is a monotonic transformation, perhaps a power function.

On Sun, Jan 20, 2019 at 11:08 AM Adrian Johnson wrote:
>
> I apologize, I forgot to mention another key operation.
> in my matrix -1 to <0 has a different meaning while values between >0
> to 1 has a different set of meaning. So If I do logit transformation
> some of the positives becomes negative (values < 0.5 etc.). In such
> case, the resulting transformed matrix is incorrect.
>
> I want to transform numbers ranging from -1 to <0 and numbers
> between >0 and 1 independently.
>
> Thanks
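To make the suggestion concrete, here is a small sketch in which f() is a power function. The helper name `signed_power` and the exponent 0.5 are assumptions chosen for illustration, not part of the original reply.

```r
# Sign-preserving power transform: f(a) = a^p applied to |x|,
# with the original sign restored afterwards.
signed_power <- function(x, p = 0.5) {
  sign(x) * abs(x)^p
}

oldy <- c(-1, -0.25, 0, 0.25, 1)
newy <- signed_power(oldy)
newy
# -1.0 -0.5  0.0  0.5  1.0
```

Negative values stay negative, positive values stay positive, and zero stays zero, so the two halves of the range are transformed independently. With 0 < p < 1, values near zero are pushed away from it, while the overall range remains -1 to 1.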
Re: [R] data transformation
I don't think you have given us enough information. For example, is the 400x400 matrix a distance matrix or does it represent 400 columns of information about 400 rows of observations? If a distance matrix, how is distance being measured? Your clarification suggests it may be a distance matrix of correlation coefficients? If distance has different meanings between -1 and 0 and 0 and +1, getting interpretable results from cluster analysis will be difficult, but it is not clear what you mean by that.

-
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Adrian Johnson
Sent: Sunday, January 20, 2019 8:02 AM
To: r-help
Subject: [R] data transformation

Dear group,
My question, perhaps is more of a statistical question using R
I have a data matrix ( 400 x 400 normally distributed) with data
points ranging from -1 to +1..
For certain clustering algorithms, I suspect the tight data range is
not helping resolving the clusters.

Is there a way to transform the data something similar to logit, where
I dont lose normality of the data and yet I can better expand the data
ranges.

Thanks
Adrian

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Adrian Johnson
Sent: Sunday, January 20, 2019 10:08 AM
To: r-help
Subject: Re: [R] data transformation

I apologize, I forgot to mention another key operation.
in my matrix -1 to <0 has a different meaning while values between >0
to 1 has a different set of meaning. So If I do logit transformation
some of the positives becomes negative (values < 0.5 etc.). In such
case, the resulting transformed matrix is incorrect.

I want to transform numbers ranging from -1 to <0 and numbers
between >0 and 1 independently.

Thanks
Re: [R] data transformation
I apologize, I forgot to mention another key operation.
In my matrix, values from -1 to <0 have one meaning, while values from >0
to 1 have a different meaning. So if I do a logit transformation,
some of the positives become negative (values < 0.5 etc.). In that
case, the resulting transformed matrix is incorrect.

I want to transform the numbers ranging from -1 to <0 and the numbers
between >0 and 1 independently.

Thanks
Re: [R] Data transformation to list for event occurence
Or,

f3 <- function (dat1) {
    i <- dat1$Event_Occurence == 1
    split(dat1$Week[i], dat1$ID[i])
}

in addition to the previously mentioned

f1 <- function(dat1) {
    with(dat1,tapply(as.logical(Event_Occurence),ID,FUN=which ))
}
f2 <- function(dat1){
    lapply(split(dat1,dat1$ID),function(x) which(!!x[,3]))
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> Of arun
> Sent: Tuesday, November 12, 2013 2:13 PM
> To: R help
> Subject: Re: [R] Data transformation to list for event occurence
>
> [snip]
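One difference between the variants may be worth checking on your own data: the which()-based versions return positions within each ID's rows, whereas the split() version returns the Week values themselves. In the posted sample the two coincide because weeks run 1 to 4 for every ID. The sketch below uses hypothetical week numbers 11 to 14 (an assumption made purely to show the difference) and the invented names `by_week` and `by_pos`.

```r
# Same layout as the posted data, but with weeks numbered 11..14
dat1 <- data.frame(ID = rep(c("A", "B"), each = 4),
                   Week = rep(11:14, times = 2),
                   Event_Occurence = c(0, 0, 1, 0, 1, 0, 0, 1))

# split() approach: keeps the actual week numbers of the events
i <- dat1$Event_Occurence == 1
by_week <- split(dat1$Week[i], dat1$ID[i])

# which() approach: gives row positions within each ID group
by_pos <- with(dat1, tapply(as.logical(Event_Occurence), ID, FUN = which))

by_week  # A: 13       B: 11 14
by_pos   # A: 3        B: 1 4
```

If weeks are not numbered 1, 2, 3, ... within every ID, or rows are not sorted by week, the split() form is the one that reports week numbers.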
Re: [R] Data transformation to list for event occurence
Hi Anindya,

You may try:

dat1 <- read.table(text="ID Week Event_Occurence
A 1 0
A 2 0
A 3 1
A 4 0
B 1 1
B 2 0
B 3 0
B 4 1",sep="",header=TRUE,stringsAsFactors=FALSE)

with(dat1,tapply(as.logical(Event_Occurence),ID,FUN=which ))
#or
lapply(split(dat1,dat1$ID),function(x) which(!!x[,3]))

A.K.


On Tuesday, November 12, 2013 4:58 PM, Anindya Sankar Dey wrote:
Hi,

Say I have a following data

ID Week Event_Occurence
A 1 0
A 2 0
A 3 1
A 4 0
B 1 1
B 2 0
B 3 0
B 4 1

that whether an individual experienced an event in a particular week.

I wish to create list such as the first element of the list will be a
vector listing the week number when the event has occurred for A, followed
by that of B.

Can you help creating this?

--
Anindya Sankar Dey
Re: [R] Data transformation & cleaning
On 09/28/2011 01:13 PM, pip56789 wrote:
> Hi,
>
> I have a few methodological and implementation questions for ya'll. Thank
> you in advance for your help. I have a dataset that reflects people's
> preference choices. I want to see if there's any kind of clustering effect
> among certain preference choices (e.g. do people who pick choice A also pick
> choice D).
>
> I have a data set that has one record per user ID, per preference choice.
> It's a "long" form of a data set that looks like this:
>
> ID | Page
> 123 | Choice A
> 123 | Choice B
> 456 | Choice A
> 456 | Choice B
> ...
>
> I thought that I should do the following
>
> 1. Make the data set "wide", counting the observations so the data looks
> like this:
> ID | Count of Preference A | Count of Preference B
> 123 | 1 | 1
> ...
>
> Using
> table1<- dcast(data,ID ~ Page,fun.aggregate=length,value_var='Page' )
>
> 2. Create a correlation matrix of preferences
> cor(table2[,-1])
>
> How would I restrict my correlation to show preferences that met a minimum
> sample threshold? Can you confirm if the two following commands do the same
> thing? What would I do from here (or am I taking the wrong approach)
> table1<- dcast(data,Page ~ Page,fun.aggregate=length,value_var='Page' )
> table2<- with(data, table(Page,Page))

Hi Peter,
An easy way to visualize set intersections is the intersectDiagram function in the plotrix package. This will display the counts or percentages of each type of intersection. Your data could be passed like this:

choices<-data.frame(IDs=sample(1:20,50,TRUE),
 sample(LETTERS[1:4],50,TRUE))
library(plotrix)
intersectDiagram(choices)

This example is a bit messy, as it will generate quite a few repeated choices that will be ignored by intersectDiagram, but it should give you the idea.

Jim
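For the pairwise part of the question ("do people who pick choice A also pick choice D"), the intersection counts can also be obtained numerically. The following base R sketch is offered only as an illustration: the tiny data frame `d` and the names `tab`, `inc` and `co` are invented, and it treats repeated picks by the same ID as a single pick.

```r
# Hypothetical long-format data: one row per ID per choice
d <- data.frame(ID   = c(123, 123, 456, 456, 789),
                Page = c("A", "B", "A", "B", "A"))

# ID-by-choice count matrix, reduced to 0/1 incidence
tab <- unclass(table(d$ID, d$Page))
inc <- (tab > 0) * 1

# t(inc) %*% inc: the diagonal holds how many IDs picked each choice,
# off-diagonal cells hold how many IDs picked both choices
co <- crossprod(inc)
co
```

Here `co["A", "B"]` is 2 because two IDs picked both A and B, and `co["A", "A"]` is 3. A minimum-sample threshold could then be applied to the diagonal before looking at the off-diagonal cells.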
Re: [R] Data transformation & cleaning
It seems your questions belong to rule mining for frequent item sets. Check the arules package.

Weidong Gu

On Tue, Sep 27, 2011 at 11:13 PM, pip56789 wrote:
> Hi,
>
> I have a few methodological and implementation questions for ya'll.
> [snip]
Re: [R] Data transformation & cleaning
On a methodological level, if the choices do not correspond on a cardinal or at least ordinal scale, you don't want to use correlations. Instead you should probably use Cramer's V, in particular if the choices are multinomial. Whether the wide format is necessary will depend on the format the function you are using expects.

HTH,
Daniel

pde3p wrote:
>
> Hi,
>
> I have a few methodological and implementation questions for ya'll.
> [snip]
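For readers who have not met it, Cramer's V for an r x c contingency table is sqrt(chi-squared / (n * (min(r, c) - 1))). The function below is a minimal base R sketch of that formula. The name `cramers_v` is made up here, and the continuity correction is switched off so that the statistic matches the textbook definition.

```r
# Cramer's V from two categorical vectors of equal length
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # Pearson chi-squared without Yates' correction; small expected
  # counts trigger a warning that is irrelevant for this purpose
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

# Perfect association gives 1, no association gives 0
cramers_v(c("a", "a", "b", "b"), c("a", "a", "b", "b"))  # 1
cramers_v(c("a", "a", "b", "b"), c("c", "d", "c", "d"))  # 0
```

Both inputs need at least two distinct levels, otherwise the denominator is zero.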
Re: [R] data transformation ----Box-Cox Transformations
Dear Stuart,

See ?bcPower and ?powerTransform in the car package, the latter for univariate and multivariate conditional and unconditional ML Box-Cox.

I hope this helps,
 John

John Fox
Senator William McMaster Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Stuart
> Sent: May-03-11 11:37 AM
> To: r-help@r-project.org
> Subject: [R] data transformation Box-Cox Transformations
>
> Hi
>
> Could any one please help how I can transform data based on Box-Cox
> Transformations. I have massive data set with many variables. If
> possible someone can write few lines so I can read in all data set once
> and transform it.
>
> g1g2 g2
> 97.03703704 89.25925926 4.4
> 24.90740741 69.25925926 35.5556
> 62. 85.18518519 36.85185185
> 18.51851852 84.25925926 21.6667
> 93.703703795.92592593 54.07407407
> 26.6667 23. 99.25925926
> 63. 97.03703704 27.40740741
> 95.74074074 3.6 59.25925926
> 46.6667 49. 39.1667
> 21.85185185 2.592592593 63.14814815
> 94.7222 17.7778 81.
>
> any help will be much appreciated
>
> Cheers
> Sbroad
Re: [R] data transformation ----Box-Cox Transformations
There is the bct function in the TeachingDemos package that does Box-Cox transforms (though you could also write your own fairly simply).

The lapply/sapply functions will apply a function to each column of a data frame.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of Stuart
> Sent: Tuesday, May 03, 2011 9:37 AM
> To: r-help@r-project.org
> Subject: [R] data transformation Box-Cox Transformations
>
> Hi
>
> Could any one please help how I can trnasform data based on Box-Cox
> Transformations.
> [snip]
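As a rough illustration of "write your own" plus lapply, here is a hand-rolled Box-Cox sketch. It is not the TeachingDemos or car implementation: the function name `box_cox`, the toy data frame `dat` and the fixed lambda of 0.5 are all assumptions, and in practice lambda would be estimated (for example with the car functions mentioned elsewhere in this thread).

```r
# Box-Cox transform for strictly positive data:
#   (x^lambda - 1) / lambda   if lambda != 0
#   log(x)                    if lambda == 0
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0, na.rm = TRUE))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Toy data standing in for the real columns
dat <- data.frame(g1 = c(1, 4, 9), g2 = c(2, 8, 18))

# Apply the same transform to every column of the data frame
transformed <- as.data.frame(lapply(dat, box_cox, lambda = 0.5))
transformed$g1
# 0 2 4
```

Using one lambda for all columns is a simplification. A per-column lambda could be passed with Map() instead of lapply().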
Re: [R] Data transformation
Try this:

> t(apply(x, 1, function(r) table(factor(r, levels = seq_len(max(x))))))
     1 2 3 4 5 6 7 8 9 10
[1,] 1 0 1 0 0 0 0 0 0  0
[2,] 0 2 0 0 0 0 0 0 0  0
[3,] 0 0 0 1 0 0 1 0 0  0
[4,] 0 0 0 0 0 1 0 1 0  0
[5,] 0 0 0 0 1 0 0 0 0  1

If you use aaply in the plyr package instead of apply then you can omit the transpose.

On Mon, Jan 25, 2010 at 5:39 PM, Lisa wrote:
>
> Dear all,
>
> I have a dataset that looks like this:
>
> x <- read.table(textConnection("col1 col2
> 3 1
> 2 2
> 4 7
> 8 6
> 5 10"), header=TRUE)
>
> I want to rewrite it as below:
>
> var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
> 1 0 1 0 0 0 0 0 0 0
> 0 2 0 0 0 0 0 0 0 0
> 0 0 0 1 0 0 1 0 0 0
> 0 0 0 0 0 1 0 1 0 0
> 0 0 0 0 1 0 0 0 0 1
>
> Can anybody please help how to get this done? Your help would be greatly
> appreciated.
>
> Lisa
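A closely related variant, shown here only as a sketch, swaps table(factor(...)) for tabulate(), which counts positive integers directly. It assumes all values are positive whole numbers, and the `var` column names are added to match the requested layout.

```r
x <- read.table(text = "col1 col2
3 1
2 2
4 7
8 6
5 10", header = TRUE)

# For each row, count how often each integer 1..max(x) occurs
m <- t(apply(as.matrix(x), 1, tabulate, nbins = max(x)))
colnames(m) <- paste0("var", seq_len(ncol(m)))
m
```

Row 2 gets a 2 in `var2` because both columns hold the value 2 there.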
Re: [R] Data transformation
r-help-boun...@r-project.org wrote on 01/25/2010 02:39:32 PM:
> x <- read.table(textConnection("col1 col2
> 3 1
> 2 2
> 4 7
> 8 6
> 5 10"), header=TRUE)
>
> I want to rewrite it as below:
>
> var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
> 1 0 1 0 0 0 0 0 0 0
> 0 2 0 0 0 0 0 0 0 0
> 0 0 0 1 0 0 1 0 0 0
> 0 0 0 0 0 1 0 1 0 0
> 0 0 0 0 1 0 0 0 0 1
>
> Can anybody please help how to get this done? Your help would be greatly
> appreciated.

Thanks, I've not seen textConnection() before.

The table() function will get you close:

  table(c(rownames(x),rownames(x)), c(x$col1,x$col2))

cur
--
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638
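The table() call gets "close" rather than all the way because it only creates columns for values that actually occur, so a value such as 9 is missing. One possible way to close the gap, sketched here with invented names `rows`, `vals` and `tab`, is to turn the values into a factor with every level from 1 to max(x).

```r
x <- data.frame(col1 = c(3, 2, 4, 8, 5),
                col2 = c(1, 2, 7, 6, 10))

# Row index repeated once per column, matching the order of unlist(x)
rows <- rep(seq_len(nrow(x)), times = ncol(x))

# Declaring all levels 1..max(x) keeps empty columns such as 9
vals <- factor(unlist(x), levels = seq_len(max(x)))

tab <- table(row = rows, var = vals)
tab
```

The result is a 5 x 10 table, with an all-zero column for 9.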
Re: [R] Data transformation
Thank you so much.

Lisa
Re: [R] Data transformation
Hi,

On Mon, Jan 25, 2010 at 5:39 PM, Lisa wrote:
>
> Dear all,
>
> I have a dataset that looks like this:
> [snip]
>
> Can anybody please help how to get this done? Your help would be greatly
> appreciated.

I was trying to do it w/o for loops, but I can't figure out a way to do so:

R> bounds <- range(x)
R> m <- matrix(0, nrow=nrow(x), ncol=bounds[2])
R> colnames(m) <- paste('var', seq(bounds[2]), sep="")

## Ugly nested for-loop one-liner below
R> for (i in 1:nrow(x)) for (j in 1:ncol(x)) m[i,x[i,j]] <- m[i,x[i,j]] + 1

R> m
     var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
[1,]    1    0    1    0    0    0    0    0    0     0
[2,]    0    2    0    0    0    0    0    0    0     0
[3,]    0    0    0    1    0    0    1    0    0     0
[4,]    0    0    0    0    0    1    0    1    0     0
[5,]    0    0    0    0    1    0    0    0    0     1

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Re: [R] Data transformation
Well, I have no idea how to get from one to the other. There's col1 and col2 but no var1 var2 var3, etc. I thought perhaps col1 was the row index and col2 was the column index, but that doesn't match up either, and not all the cell values are 1.

So you will need to explain more clearly what you intend.

Meanwhile, you might try reshape, or perhaps crosstab from the ecodist package.

Sarah

On Mon, Jan 25, 2010 at 5:39 PM, Lisa wrote:
>
> Dear all,
>
> I have a dataset that looks like this:
> [snip]

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] Data transformation
>> (x.n <- cast(x.m, id ~ var, function(.dat){
> +     if (length(.dat) == 0) return(0)  # test for no data; return zero if that is the case
> +     mean(.dat)
> + }))

Or fill = 0.

Hadley

--
http://had.co.nz/
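In reshape, fill is an argument of cast(), so the suggestion presumably amounts to cast(x.m, id ~ var, mean, fill = 0). For readers without the reshape package, roughly the same zero-filled table of means can be sketched in base R. The data are retyped from the thread; the names value and wide are illustrative only, and this is not the code anyone in the thread posted.

```r
x <- data.frame(id    = c(1, 1, 2, 2, 2, 3, 3),
                code1 = c(4, 5, 1, 6, 4, 5, 7),
                code2 = c(8, 7, 8, 2, 3, 6, 5),
                p     = c(0.1, 0.9, 0.4, 0.2, 0.6, 0.7, 0.9))

# Each row contributes its p value to two cells: (id, code1) and (id, code2)
id    <- rep(x$id, 2)
var   <- factor(paste0("var", c(x$code1, x$code2)), levels = paste0("var", 1:8))
value <- rep(x$p, 2)

# Mean per cell; id/var combinations that never occur come back as NA
wide <- tapply(value, list(id = id, var = var), mean)
wide[is.na(wide)] <- 0   # plays the role of a fill value of 0
wide
```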
Re: [R] Data transformation
That's what I want. Many thanks for your help.

Legen
Re: [R] Data transformation
Your script works very well. Thank you very much.

Legen

Henrique Dallazuanna wrote:
>
> Try this also:
>
> xtabs(rep(p, 2) ~ rep(id, 2) + sprintf("var%d", c(code1, code2)), data = x)
Re: [R] Data transformation
Try this:

> x <- read.table(textConnection("id code1 code2 p
+ 1 4 8 0.1
+ 1 5 7 0.9
+ 2 1 8 0.4
+ 2 6 2 0.2
+ 2 4 3 0.6
+ 3 5 6 0.7
+ 3 7 5 0.9"), header=TRUE)
> closeAllConnections()
> # create object like output from 'melt'
> x.m <- data.frame(id=c(x$id, x$id),
+     var=paste('var', c(x$code1, x$code2), sep=''),
+     variable=rep('p', 2*nrow(x)),
+     value=c(x$p, x$p))
> require(reshape)  # use the reshape package
> (x.n <- cast(x.m, id ~ var, function(.dat){
+     if (length(.dat) == 0) return(0)  # test for no data; return zero if that is the case
+     mean(.dat)
+ }))
  id var1 var2 var3 var4 var5 var6 var7 var8
1  1  0.0  0.0  0.0  0.1  0.9  0.0  0.9  0.1
2  2  0.4  0.2  0.6  0.6  0.0  0.2  0.0  0.4
3  3  0.0  0.0  0.0  0.0  0.8  0.7  0.9  0.0
>

On Tue, Nov 10, 2009 at 11:10 PM, legen wrote:
>
> Thank you for your kind help. Your script works very well. Would you please
> show me how to change NaN to zero and column variables 1, 2, ..., 8 to var1,
> var2, ..., var8? Thanks again.
>
> Legen

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] Data transformation
Try this also:

xtabs(rep(p, 2) ~ rep(id, 2) + sprintf("var%d", c(code1, code2)), data = x)

On Wed, Nov 11, 2009 at 2:10 AM, legen wrote:
>
> Thank you for your kind help. Your script works very well. Would you please
> show me how to change NaN to zero and column variables 1, 2, ..., 8 to var1,
> var2, ..., var8? Thanks again.
>
> Legen

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
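One caveat worth checking before relying on the xtabs() call: xtabs() adds up the left-hand side within each cell, so where an id carries the same code twice (id 3, code 5) the cell holds 0.7 + 0.9 = 1.6 rather than the average 0.8 asked for in the original question. The sketch below shows one possible way to turn the sums into means by dividing by a count table. The data are retyped from the thread, and the names sums, counts and means are illustrative.

```r
x <- data.frame(id    = c(1, 1, 2, 2, 2, 3, 3),
                code1 = c(4, 5, 1, 6, 4, 5, 7),
                code2 = c(8, 7, 8, 2, 3, 6, 5),
                p     = c(0.1, 0.9, 0.4, 0.2, 0.6, 0.7, 0.9))

id  <- rep(x$id, 2)
var <- factor(sprintf("var%d", as.integer(c(x$code1, x$code2))),
              levels = sprintf("var%d", 1:8))
p   <- rep(x$p, 2)

sums   <- unclass(xtabs(p ~ id + var))  # total of p per (id, var) cell
counts <- unclass(table(id, var))       # number of contributions per cell
means  <- sums / pmax(counts, 1)        # empty cells stay 0 instead of 0/0

sums["3", "var5"]   # the two p values added together
means["3", "var5"]  # their average
```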
Re: [R] Data transformation
Thank you for your kind help. Your script works very well. Would you please show me how to change NaN to zero and column variables 1, 2, ..., 8 to var1, var2, ..., var8? Thanks again.

Legen
Re: [R] Data transformation
Is this what you want:

> x <- read.table(textConnection("id code1 code2 p
+ 1 4 8 0.1
+ 1 5 7 0.9
+ 2 1 8 0.4
+ 2 6 2 0.2
+ 2 4 3 0.6
+ 3 5 6 0.7
+ 3 7 5 0.9"), header=TRUE)
> closeAllConnections()
> # create object like output from 'melt'
> x.m <- data.frame(id=c(x$id, x$id), var=c(x$code1, x$code2),
+     variable=rep('p', 2*nrow(x)), value=c(x$p, x$p))
> require(reshape)  # use the reshape package
> cast(x.m, id ~ var, mean)
  id   1   2   3   4   5   6   7   8
1  1 NaN NaN NaN 0.1 0.9 NaN 0.9 0.1
2  2 0.4 0.2 0.6 0.6 NaN 0.2 NaN 0.4
3  3 NaN NaN NaN NaN 0.8 0.7 0.9 NaN
>

On Tue, Nov 10, 2009 at 4:30 PM, legen wrote:
>
> Dear all,
>
> I have a dataset as below:
>
> id code1 code2 p
> 1 4 8 0.1
> 1 5 7 0.9
> 2 1 8 0.4
> 2 6 2 0.2
> 2 4 3 0.6
> 3 5 6 0.7
> 3 7 5 0.9
>
> I just want to rewrite it as this (vertical to horizontal):
>
> id var1 var2 var3 var4 var5 var6 var7 var8
> 1 0 0 0 0.1 0.9 0 0.9 0.1
> 2 0.4 0.2 0.6 0.6 0 0.2 0 0.4
> 3 0 0 0 0 0.8 0.7 0.9 0
>
> For the third subject, the value 5 occurs in both code1 and code2, but with
> different values of p (0.7 and 0.9), so I assigned their average, 0.8, to var5.
>
> Can anybody help me handle this? Many thanks for your consideration
> and time.
>
> Legen

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Re: [R] data transformation using gamma
Roslina,
this code performs what you need:

dt = matrix((1:(58*12))/58/12, 58)  # some numbers
# if dt is a data.frame use dt = as.matrix(dt)
a = (1:12)/12  # some a coef (shape, one per column)
b = (12:1)/12  # some b coef (scale, one per column)
# pgamma() recycles its parameters down the columns of dt, so repeat each
# coefficient nrow(dt) times; name the scale argument, because the third
# positional argument of pgamma() is rate, not scale
dtgam = matrix(pgamma(dt, shape = rep(a, each = nrow(dt)),
                      scale = rep(b, each = nrow(dt))), nrow(dt))
# dtgam is the transformation you're looking for

no loop needed
no transform function involved

cheers,

Patrizio

2009/5/7 Roslina Zakaria :
> Hi R-users,
>
> I have this code to uniformise the data using gamma:
>
>> length(dp1)
> [1] 696
>> dim(dp1)
> [1] 58 12
>> dim(ahall)
> [1] 1 12
>> dim(bhall)
> [1] 1 12
>
>> trans_dt <- function(dt,a,b)
> + { n1 <- ncol(dt)
> + n2 <- length(dt)
> + trans <- vector(mode='numeric', length=n2)
> + dim(trans) <- dim(dt)
> + for (i in 1:n1)
> + { dt[,i] <- as.vector(dt[,i])
> + trans[,i] <- transform(dti,newdt=pgamma(dti,shape= a[1,i],scale=b[1,i])) }
> + trans
> + }
>
>> trans_dt(dp1,ahall,bhall)
> Error in transform(dti, newdt = pgamma(dti, shape = a[1, i], scale = b[1, :
> object "dti" not found
>
> and also try
> trans_dt <- function(dt,a,b)
> { n1 <- ncol(dt)
> n2 <- length(dt)
> trans <- vector(mode='numeric', length=n2)
> dim(trans) <- dim(dt)
> for (i in 1:n1)
> { dti <- dt[,i]
> ai <- a[1,i]
> bi <- b[1,i]
> trans[,i] <- transform(dti,newdt=pgamma(dti,shape= ai,scale=bi)) }
> trans
> }
>
> trans_dt(dp1,ahall,bhall)
> Error in pgamma(dti, shape = ai, scale = bi) : object "dti" not found
>
> Thank you for any help given.

--
Patrizio Frederic, PhD
Assistant Professor,
Department of Economics,
University of Modena and Reggio Emilia,
Via Berengario 51,
41100 Modena, Italy

tel: +39 059 205 6727
fax: +39 059 205 6947
mail: patrizio.frede...@unimore.it
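Two details matter when vectorising pgamma() over a matrix with one parameter pair per column. R recycles the parameter vectors in column-major order, so each coefficient has to be repeated nrow(dt) times to stay aligned with its column; and the third positional argument of pgamma() is rate, so scale should be passed by name. The sketch below checks a vectorised call against an explicit per-column computation. The data and coefficients are made up for illustration, and the name check is not from the thread.

```r
set.seed(1)
dt <- matrix(rgamma(58 * 12, shape = 2, scale = 3), nrow = 58)  # stand-in for dp1
a  <- seq(0.5, 6, length.out = 12)  # hypothetical shape, one per column
b  <- seq(1, 4, length.out = 12)    # hypothetical scale, one per column

# Vectorised: repeat each coefficient down its own column
dtgam <- matrix(pgamma(dt,
                       shape = rep(a, each = nrow(dt)),
                       scale = rep(b, each = nrow(dt))),
                nrow = nrow(dt))

# Reference: one column at a time, as in the original loop
check <- sapply(seq_len(ncol(dt)),
                function(i) pgamma(dt[, i], shape = a[i], scale = b[i]))

all.equal(dtgam, check)
```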
Re: [R] data transformation
on 07/22/2008 11:24 AM Christian Hof wrote:
> Dear all,
>
> how can I, with R, transform a presence-only table (with the names of the
> species (1st column), the lat information of the sites (2nd column) and the
> lon information of the sites (3rd column)) into a presence-absence (0/1)
> matrix of species occurrences across sites, as given in the below example?
> Thanks a lot for your help!
> Christian
>
> My initial table:
>
> species lat lon
> sp1     10  10
> sp1     10  30
> sp1     20  10
> sp1     20  20
> sp1     20  30
> sp2     10  30
> sp2     20  30
> sp2     30  30
>
> My desired matrix:
>
> lat lon sp1 sp2
> 10  10  1   0
> 10  20  0   0
> 10  30  1   1
> 20  10  1   0
> 20  20  1   0
> 20  30  1   1
> 30  10  0   0
> 30  20  0   0
> 30  30  0   1

One approach would be to use ftable(). Presuming that your source data is in a data frame called 'DF':

> ftable(species ~ lat + lon, data = DF)
        species sp1 sp2
lat lon
10  10            1   0
    20            0   0
    30            1   1
20  10            1   0
    20            1   0
    30            1   1
30  10            0   0
    20            0   0
    30            0   1

See ?ftable and/or ?ftable.formula

HTH,

Marc Schwartz
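If a plain data frame with explicit lat and lon columns is wanted rather than a flat contingency table, the full site grid can be built first and each species matched against it. This is only a sketch in base R: DF is retyped from the example, and the names sites, site_key and present_at are hypothetical helpers, not part of the thread.

```r
DF <- data.frame(species = c("sp1", "sp1", "sp1", "sp1", "sp1", "sp2", "sp2", "sp2"),
                 lat = c(10, 10, 20, 20, 20, 10, 20, 30),
                 lon = c(10, 30, 10, 20, 30, 30, 30, 30),
                 stringsAsFactors = FALSE)

# All lat/lon combinations; lon varies fastest, so rows come out sorted by lat
sites <- expand.grid(lon = sort(unique(DF$lon)), lat = sort(unique(DF$lat)))
sites <- sites[, c("lat", "lon")]

# A site is identified by its "lat lon" pair
site_key <- function(d) paste(d$lat, d$lon)

# One 0/1 column per species: 1 where the site occurs in that species' records
for (sp in unique(DF$species)) {
  present_at <- site_key(DF[DF$species == sp, ])
  sites[[sp]] <- as.integer(site_key(sites) %in% present_at)
}
sites
```

Unlike a table of counts, this stays 0/1 even if a species is recorded more than once at the same site.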
Re: [R] data transformation
Try this:

newx <- with(x, cbind(stack(x, select = grep("spec", names(x))), lat, lon))
newx[newx$values > 0, -1]

On 5/2/08, Christian Hof <[EMAIL PROTECTED]> wrote:
>
> Dear all,
> how can I, with R, transform a presence-absence (0/1) matrix of species
> occurrences into a presence-only table (3 columns) with the names of the
> species (1st column), the lat information of the sites (2nd column) and the
> lon information of the sites (3rd column), as given in the below example?
> Thanks a lot for your help!
> Christian
>
>
> my dataframe:
>
> site  lat lon spec1 spec2 spec3 spec4
> site1 10  11  1     0     1     0
> site2 20  21  1     1     1     0
> site3 30  31  0     1     1     1
>
>
> my desired new dataframe:
>
> species lat lon
> spec1   10  11
> spec1   20  21
> spec2   20  21
> spec2   30  31
> spec3   10  11
> spec3   20  21
> spec3   30  31
> spec4   30  31
>
>
> --
> Christian Hof, PhD student
>
> Center for Macroecology & Evolution
> University of Copenhagen
> www.macroecology.ku.dk
> &
> Biodiversity & Global Change Lab
> Museo Nacional de Ciencias Naturales, Madrid
> www.biochange-lab.eu
>
> mobile ES .. +34 697 508 519
> mobile DE .. +49 176 205 189 27
> mail .. [EMAIL PROTECTED]
> mail2 .. [EMAIL PROTECTED]
> blog .. www.vogelwart.de

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
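Another base R route to the same presence-only table is to ask which cells of the 0/1 block are positive and look up the species name and coordinates from the resulting row and column indices. This is a sketch under the assumption that all species columns start with "spec"; the names m, idx and res are illustrative.

```r
x <- data.frame(site = c("site1", "site2", "site3"),
                lat = c(10, 20, 30), lon = c(11, 21, 31),
                spec1 = c(1, 1, 0), spec2 = c(0, 1, 1),
                spec3 = c(1, 1, 1), spec4 = c(0, 0, 1),
                stringsAsFactors = FALSE)

# Species block as a matrix: rows are sites, columns are species
m <- as.matrix(x[, grep("^spec", names(x))])

# (row, col) index pairs of the occupied cells, in column-major order
idx <- which(m > 0, arr.ind = TRUE)

res <- data.frame(species = colnames(m)[idx[, "col"]],
                  lat     = x$lat[idx[, "row"]],
                  lon     = x$lon[idx[, "row"]],
                  stringsAsFactors = FALSE)
res
```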
Re: [R] data transformation
Hi Christian,

Here's a way using the reshape package:

> dfr
   site lat lon spec1 spec2 spec3 spec4
1 site1  10  11     1     0     1     0
2 site2  20  21     1     1     1     0
3 site3  30  31     0     1     1     1
> library(reshape)
> dfr <- melt(dfr[, -1], id=1:2, variable_name='species')
> dfr <- dfr[dfr$value>0,]
> dfr
   lat lon species value
1   10  11   spec1     1
2   20  21   spec1     1
5   20  21   spec2     1
6   30  31   spec2     1
7   10  11   spec3     1
8   20  21   spec3     1
9   30  31   spec3     1
12  30  31   spec4     1

The 'value' variable is not interesting here, but it could be if you had counts rather than presence/absence.

best,

Kingsford Jones
Re: [R] data transformation
Christian,

You need to use reshape to convert to the 'long' format. Check the help page ?reshape for details.

> dat <- read.table('clipboard', header=TRUE)
> dat
   site lat lon spec1 spec2 spec3 spec4
1 site1  10  11     1     0     1     0
2 site2  20  21     1     1     1     0
3 site3  30  31     0     1     1     1
> dat.long <- reshape(dat, varying = list(names(dat)[4:7]),
+     timevar="species", times=names(dat)[4:7], direction="long")
> dat.long
         site lat lon species spec1 id
1.spec1 site1  10  11   spec1     1  1
2.spec1 site2  20  21   spec1     1  2
3.spec1 site3  30  31   spec1     0  3
1.spec2 site1  10  11   spec2     0  1
2.spec2 site2  20  21   spec2     1  2
3.spec2 site3  30  31   spec2     1  3
1.spec3 site1  10  11   spec3     1  1
2.spec3 site2  20  21   spec3     1  2
3.spec3 site3  30  31   spec3     1  3
1.spec4 site1  10  11   spec4     0  1
2.spec4 site2  20  21   spec4     0  2
3.spec4 site3  30  31   spec4     1  3
> dat.long[dat.long$spec1 == 1, ]
         site lat lon species spec1 id
1.spec1 site1  10  11   spec1     1  1
2.spec1 site2  20  21   spec1     1  2
2.spec2 site2  20  21   spec2     1  2
3.spec2 site3  30  31   spec2     1  3
1.spec3 site1  10  11   spec3     1  1
2.spec3 site2  20  21   spec3     1  2
3.spec3 site3  30  31   spec3     1  3
3.spec4 site3  30  31   spec4     1  3

-Christos
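In the reshape() call the stacked 0/1 column inherits the name of the first species column (spec1), which can be confusing when filtering. A small variation, sketched here with the data typed in rather than read from the clipboard, names that column explicitly through v.names and keeps only the three requested columns. The column name present and the object names spec_cols and res are illustrative choices, not from the original post.

```r
dat <- data.frame(site = c("site1", "site2", "site3"),
                  lat = c(10, 20, 30), lon = c(11, 21, 31),
                  spec1 = c(1, 1, 0), spec2 = c(0, 1, 1),
                  spec3 = c(1, 1, 1), spec4 = c(0, 0, 1),
                  stringsAsFactors = FALSE)

spec_cols <- grep("^spec", names(dat), value = TRUE)

# Stack the species columns; v.names gives the stacked 0/1 column its own name
dat.long <- reshape(dat, varying = list(spec_cols), v.names = "present",
                    timevar = "species", times = spec_cols, direction = "long")

# Keep presences only, and only the columns asked for
res <- dat.long[dat.long$present == 1, c("species", "lat", "lon")]
rownames(res) <- NULL
res
```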