
Re: [R] Data transformation problem

2020-11-12 Thread phil


Thank you so much for this elegant solution, Jeff.

Philip

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Data transformation problem

2020-11-11 Thread Jeff Newmiller

I am not a data.table aficionado, but here is how I would do it with 
dplyr/tidyr:


library(dplyr)
library(tidyr)

do_per_REL <- function( DF ) {
  rng <- range( DF$REF1 ) # watch out for missing months?
  # build the full monthly sequence so that gaps show up as NA rows
  DF <- (   data.frame( REF1 = seq( rng[ 1 ], rng[ 2 ], by = "month" ) )
        %>% left_join( DF, by = "REF1" )
        %>% arrange( REF1 )
        )
  # month-over-month percentage change, dated by the later month
  with( DF
      , data.frame( REF2 = REF1[ -1 ]
                  , VAL2 = 100 * diff( VAL1 ) / VAL1[ -length( VAL1 ) ]
                  )
      )
}

df2a <- (   df1
        %>% mutate( REF1 = as.Date( REF1 )
                  , REL1 = as.Date( REL1 )
                  )
        %>% nest( data = -REL1 ) # one nested data frame per release
        %>% rename( REL2 = REL1 )
        %>% rowwise()
        %>% mutate( data = list( do_per_REL( data ) ) )
        %>% ungroup()
        %>% unnest( cols = "data" )
        %>% select( REF2, REL2, VAL2 )
        %>% arrange( REF2, desc( REL2 ), VAL2 )
        )
df2a
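
For readers without the tidyverse, the same per-release calculation can be sketched in base R. This is only an illustration and not the method posted above: the function name growth_by_release and the toy input are made up here, and unlike do_per_REL it assumes every release already has one row per consecutive reference month (no gap filling).

```r
# Base-R sketch: month-over-month growth within each release period.
# Assumption: no missing reference months inside a release.
growth_by_release <- function(df) {
  df$REF1 <- as.Date(df$REF1)
  df$REL1 <- as.Date(df$REL1)
  df <- df[order(df$REL1, df$REF1), ]
  pieces <- lapply(split(df, df$REL1), function(d) {
    n <- nrow(d)
    data.frame(
      REF2 = d$REF1[-1],                       # growth is dated by the later month
      REL2 = d$REL1[-1],
      VAL2 = 100 * diff(d$VAL1) / d$VAL1[-n]   # percent change on the previous month
    )
  })
  out <- do.call(rbind, pieces)
  out <- out[order(out$REF2, -as.numeric(out$REL2)), ]
  rownames(out) <- NULL
  out
}

# Toy input (hypothetical subset: two releases, three reference months).
toy <- data.frame(
  REF1 = rep(c("2017-01-01", "2017-02-01", "2017-03-01"), each = 2),
  REL1 = rep(c("2020-09-01", "2020-08-01"), times = 3),
  VAL1 = c(17974, 14567, 17974, 14000, 17197, 11500)
)
toy_growth <- growth_by_release(toy)
```

With the toy input, round(toy_growth$VAL2, 1) gives 0.0, -3.9, -4.3, -17.9, ordered by REF2 and then by descending REL2.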

---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)


[R] Data transformation problem

2020-11-11 Thread phil

I am stuck on a data transformation problem. I have a data frame, df1 in 
my example, with some original "levels" data. The data pertain to some 
variable, such as GDP, in various reference periods, REF, as estimated 
and released in various release periods, REL. The release periods follow 
after the reference periods by two months or more, sometimes by several 
years. I want to build a second data frame, called df2 in my example, 
with the month-to-month growth rates that existed in each reference 
period, revealing the revisions to those growth rates in subsequent 
periods.


REF1 <- c("2017-01-01","2017-01-01","2017-01-01","2017-01-01","2017-01-01",
          "2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
          "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL1 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
          "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
          "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL1 <- c(17974,14567,13425,NA,12900,17974,14000,14000,12999,13245,17197,11500,
          19900,18765,13467)
df1 <- data.frame(REF1,REL1,VAL1)
REF2 <- c("2017-02-01","2017-02-01","2017-02-01","2017-02-01","2017-02-01",
          "2017-03-01","2017-03-01","2017-03-01","2017-03-01","2017-03-01")
REL2 <- c("2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01",
          "2020-09-01","2020-08-01","2020-07-01","2020-06-01","2019-05-01")
VAL2 <- c(0.0,-3.9,4.3,NA,2.7,-4.3,-17.9,42.1,44.4,1.7)
df2 <- data.frame(REF2,REL2,VAL2)

In my example I have provided some sample data pertaining to three 
reference months, 2017-01-01 through 2017-03-01, and five release 
periods, "2020-09-01","2020-08-01","2020-07-01","2020-06-01" and 
"2019-05-01". In my actual problem I have millions of REF-REL 
combinations, so my data frame is quite large. I am using data.table for 
faster processing, though I am more familiar with the tidyverse. I am 
providing df2 as the target data frame for my example, so you can see 
what I am trying to achieve.


I have not been able to find an efficient way to do these calculations. 
I have tried "for" loops with "if" statements, without success so far, 
and anyway this approach would be too slow, I fear. Suggestions as to 
how I might proceed would be much appreciated.


Philip


Re: [R] data transformation

2019-01-20 Thread Jeff Newmiller

There is no "perhaps" about it. Nonsense phrases like "similar to logit, where 
I dont [sic] lose normality of the data" that lead into off-topic discussions 
of why one introduces transformations in the first place are perfect examples 
of why questions like this belong on a statistical theory discussion forum like 
StackExchange rather than here where the topic is the R language.


Re: [R] data transformation

2019-01-20 Thread Richard M. Heiberger

This might work for you:

newy <- sign(oldy) * f(abs(oldy))

where f() is a monotonic transformation, perhaps a power function.
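
As a concrete illustration of that formula, here is a minimal sketch with a power function for f(). The helper name signed_power and the exponent p = 0.5 are arbitrary choices made for this example, and the input values are made up. For data in [-1, 1], an exponent below 1 pushes values away from zero while each value keeps its sign.

```r
# Sign-preserving power transform: newy = sign(oldy) * abs(oldy)^p.
# p = 0.5 (square root) is an illustrative choice, not a recommendation.
signed_power <- function(y, p = 0.5) {
  sign(y) * abs(y)^p
}

oldy <- c(-0.81, -0.04, 0, 0.04, 0.81)   # made-up values in [-1, 1]
newy <- signed_power(oldy)
newy
```

Here the negative and positive halves are transformed by the same f() but never cross zero, so the two ranges keep their separate meanings.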


Re: [R] data transformation

2019-01-20 Thread David L Carlson

I don't think you have given us enough information. For example, is the 400x400 
matrix a distance matrix, or does it represent 400 columns of information about 
400 rows of observations? If it is a distance matrix, how is distance being 
measured? Your clarification suggests it may be a distance matrix of correlation 
coefficients. If distance has different meanings between -1 and 0 and between 
0 and +1, getting interpretable results from cluster analysis will be difficult, 
but it is not clear what you mean by that.

-
David L. Carlson
Department of Anthropology
Texas A&M University


Re: [R] data transformation

2019-01-20 Thread Adrian Johnson

I apologize, I forgot to mention another key point.
In my matrix, values from -1 to <0 have one meaning, while values from >0
to 1 have a different meaning. So if I apply a logit transformation,
some of the positive values become negative (values < 0.5, etc.). In that
case, the resulting transformed matrix is incorrect.

I want to transform the numbers from -1 to <0 and the numbers
from >0 to 1 independently.

Thanks
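
The sign problem described here is easy to reproduce. A small sketch, using base R's qlogis() as the logit and made-up input values:

```r
# The logit maps (0, 1) onto the whole real line and is centred at 0.5,
# so inputs below 0.5 come out negative even though they started positive.
x <- c(0.1, 0.25, 0.5, 0.75, 0.9)   # made-up positive values
logit_x <- qlogis(x)                # same as log(x / (1 - x))
sign(logit_x)
```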


[R] data transformation

2019-01-20 Thread Adrian Johnson

Dear group,
My question is perhaps more of a statistical question using R.
I have a data matrix (400 x 400, normally distributed) with data
points ranging from -1 to +1.
For certain clustering algorithms, I suspect the tight data range is
not helping to resolve the clusters.

Is there a way to transform the data something similar to logit, where
I dont lose normality of the data and yet I can better expand the data
ranges.

Thanks
Adrian


Re: [R] Data transformation to list for event occurence

2013-11-13 Thread William Dunlap

Or,

f3 <- function (dat1)  {
    # keep the rows where the event occurred, then split the weeks by ID
    i <- dat1$Event_Occurence == 1
    split(dat1$Week[i], dat1$ID[i])
}

in addition to the previously mentioned
f1 <- function(dat1) {
    with(dat1,tapply(as.logical(Event_Occurence),ID,FUN=which ))
}
f2 <- function(dat1){
    lapply(split(dat1,dat1$ID),function(x) which(!!x[,3]))
}
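
A quick usage sketch of f3 on the sample data from this thread (the data frame is re-created here so the snippet runs on its own). One thing to keep in mind: f1 and f2 return row positions within each ID, which match the week numbers only because the weeks run 1, 2, 3, ... for every ID, whereas f3 returns the Week values themselves.

```r
# Sample data from the original question.
dat1 <- read.table(text = "ID Week Event_Occurence
A 1 0
A 2 0
A 3 1
A 4 0
B 1 1
B 2 0
B 3 0
B 4 1", header = TRUE, stringsAsFactors = FALSE)

# f3 as posted: select event rows, then split the weeks by ID.
f3 <- function(dat1) {
  i <- dat1$Event_Occurence == 1
  split(dat1$Week[i], dat1$ID[i])
}

weeks_by_id <- f3(dat1)
weeks_by_id
```

The result is a list with element A holding week 3 and element B holding weeks 1 and 4.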

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



[R] Data transformation to list for event occurence

2013-11-12 Thread Anindya Sankar Dey

Hi,

Say I have the following data

ID   Week    Event_Occurence
A 1 0
A 2 0
A 3 1
A 4 0
B 1 1
B 2 0
B 3 0
B 4 1

which records whether an individual experienced an event in a particular week.

I wish to create a list such that the first element of the list is a
vector listing the week numbers in which the event occurred for A, followed
by that for B.

Can you help me create this?

-- 
Anindya Sankar Dey


Re: [R] Data transformation to list for event occurence

2013-11-12 Thread arun



Hi Anindya,

You may try:
dat1 <- read.table(text="ID   Week    Event_Occurence
A 1 0
A 2 0
A 3 1
A 4 0
B 1 1
B 2 0
B 3 0
B 4 1",sep="",header=TRUE,stringsAsFactors=FALSE)

with(dat1,tapply(as.logical(Event_Occurence),ID,FUN=which ))
#or
lapply(split(dat1,dat1$ID),function(x) which(!!x[,3]))
A.K.






Re: [R] Data transformation cleaning

2011-09-28 Thread Weidong Gu

It seems your questions belong to association rule mining for frequent item sets.
Check the arules package.

Weidong Gu


Re: [R] Data transformation cleaning

2011-09-28 Thread Jim Lemon



Hi Peter,
An easy way to visualize set intersections is the intersectDiagram 
function in the plotrix package. This will display the counts or 
percentages of each type of intersection. Your data could be passed like 
this:


choices <- data.frame(IDs=sample(1:20,50,TRUE),
 sample(LETTERS[1:4],50,TRUE))
library(plotrix)
intersectDiagram(choices)

This example is a bit messy, as it will generate quite a few repeated 
choices that will be ignored by intersectDiagram, but it should give you 
the idea.


Jim


[R] Data transformation cleaning

2011-09-27 Thread pip56789

Hi,

I have a few methodological and implementation questions for y'all. Thank
you in advance for your help. I have a dataset that reflects people's
preference choices. I want to see if there's any kind of clustering effect
among certain preference choices (e.g. do people who pick choice A also pick
choice D).

I have a data set that has one record per user ID, per preference choice.
It's a long form of a data set that looks like this: 

ID | Page
123 | Choice A
123 | Choice B
456 | Choice A
456 | Choice B
...

I thought that I should do the following

1. Make the data set wide, counting the observations so the data looks
like this:
ID | Count of Preference A | Count of Preference B
123 | 1 | 1
...

Using
table1 <- dcast(data,ID ~ Page,fun.aggregate=length,value.var='Page' )

2. Create a correlation matrix of preferences
cor(table1[,-1])

How would I restrict my correlation to show preferences that met a minimum
sample threshold? Can you confirm if the two following commands do the same
thing? What would I do from here (or am I taking the wrong approach)?
table1 <- dcast(data,Page ~ Page,fun.aggregate=length,value.var='Page' )
table2 <- with(data, table(Page,Page))


many thanks,
Peter
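
One possible way to apply a minimum sample threshold before correlating, sketched in base R only. Everything in it is an assumption for illustration: the toy data, the threshold min_users, and the choice to count a preference by the number of distinct IDs that picked it. It also builds a choice-by-choice co-occurrence count, which is not what table(Page, Page) gives: that call cross-tabulates Page against itself and so only fills the diagonal.

```r
# Hypothetical long-format data: one row per (ID, Page) choice.
data <- data.frame(
  ID   = c(123, 123, 456, 456, 789, 789, 321),
  Page = c("Choice A", "Choice B", "Choice A", "Choice B",
           "Choice A", "Choice C", "Choice B")
)

# Wide count matrix: one row per ID, one column per choice.
wide <- unclass(table(data$ID, data$Page))

# Keep only the choices picked by at least `min_users` distinct IDs.
min_users <- 2
keep <- colSums(wide > 0) >= min_users
pref_cor <- cor(wide[, keep, drop = FALSE])

# Co-occurrence: number of IDs that picked both choices of a pair.
cooc <- crossprod(1 * (wide > 0))
```

In the toy data, "Choice C" is picked by a single ID and is dropped, and cooc["Choice A", "Choice B"] counts the two IDs (123 and 456) that picked both.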


Re: [R] Data transformation cleaning

2011-09-27 Thread Daniel Malter

On a methodological level, if the choices are not measured on a cardinal or
at least ordinal scale, you don't want to use correlations. Instead, you
should probably use Cramér's V, in particular if the choices are
multinomial. Whether the wide format is necessary will depend on the format
expected by the function you are using.

HTH,
Daniel
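
For reference, Cramér's V can be computed from the chi-squared statistic of a contingency table as V = sqrt(chi2 / (n * (min(rows, cols) - 1))). A minimal base-R sketch follows; the helper name cramers_v and the example vectors are made up for illustration, and the continuity correction is switched off so that the 2 x 2 case matches the formula.

```r
# Cramér's V for two categorical vectors of equal length.
cramers_v <- function(x, y) {
  tab  <- table(x, y)
  # chisq.test() may warn about small expected counts; only the statistic is used
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n    <- sum(tab)
  k    <- min(dim(tab)) - 1
  as.numeric(sqrt(chi2 / (n * k)))
}

# Made-up examples: perfectly associated, then unrelated.
v_perfect <- cramers_v(c("yes", "yes", "no", "no", "yes", "no"),
                       c("yes", "yes", "no", "no", "yes", "no"))
v_none    <- cramers_v(c("x", "x", "y", "y"),
                       c("p", "q", "p", "q"))
```

V ranges from 0 (no association) to 1 (perfect association); here v_perfect is 1 and v_none is 0.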



[R] data transformation ----Box-Cox Transformations

2011-05-03 Thread Stuart

Hi

Could anyone please help with how I can transform data based on Box-Cox
transformations? I have a massive data set with many variables. If
possible, could someone write a few lines so I can read in the whole data
set at once and transform it?


g1  g2  g2
97.03703704 89.25925926 4.4
24.90740741 69.25925926 35.5556
62. 85.18518519 36.85185185
18.51851852 84.25925926 21.6667
93.7037037  95.92592593 54.07407407
26.6667 23. 99.25925926
63. 97.03703704 27.40740741
95.74074074 3.6 59.25925926
46.6667 49. 39.1667
21.85185185 2.592592593 63.14814815
94.7222 17.7778 81.


any help will be much appreciated

Cheers
Sbroad


Re: [R] data transformation ----Box-Cox Transformations

2011-05-03 Thread Greg Snow

There is the bct function in the TeachingDemos package that does Box-Cox 
transforms (though you could also write your own fairly simply).  The 
lapply/sapply functions will apply a function to each column of a data frame.
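
A minimal sketch of the "write your own" route, applied to every column with lapply. The function name box_cox, the fixed lambda = 0.5, and the small data frame are assumptions for illustration only; choosing lambda from the data is a separate step that this sketch does not cover, and the transform requires strictly positive values.

```r
# One-parameter Box-Cox transform: (x^lambda - 1) / lambda, or log(x) at lambda = 0.
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Made-up positive data with the same layout as in the question.
dat <- data.frame(
  g1 = c(97.0, 24.9, 18.5),
  g2 = c(89.3, 69.3, 84.3)
)

# Apply the same transform to each column and rebuild a data frame.
transformed <- as.data.frame(lapply(dat, box_cox, lambda = 0.5))
```

lapply returns a list with one transformed vector per column, so wrapping it in as.data.frame keeps the original column names and shape.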

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111



Re: [R] data transformation ----Box-Cox Transformations

2011-05-03 Thread John Fox

Dear Stuart,

See ?bcPower and ?powerTransform in the car package, the latter for
univariate and multivariate conditional and unconditional ML Box-Cox.
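
A hedged sketch of how those two functions might be combined, assuming the
car package is installed. The simulated data and the object names (dat, pt,
lambda, dat_bc) are made up for illustration.

```r
set.seed(42)
# Made-up, strictly positive data standing in for the real variables
dat <- data.frame(g1 = rlnorm(50),
                  g2 = rgamma(50, shape = 2),
                  g3 = rexp(50) + 0.1)

if (requireNamespace("car", quietly = TRUE)) {
  # Joint ML estimates of the Box-Cox parameters, one per column
  pt <- car::powerTransform(as.matrix(dat))
  # Estimates rounded to convenient values where the data allow it
  lambda <- coef(pt, round = TRUE)
  # Apply the chosen powers column by column
  dat_bc <- car::bcPower(as.matrix(dat), lambda)
  print(lambda)
  print(head(dat_bc))
}
```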

I hope this helps,
 John


John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox





[R] Data transformation

2010-11-03 Thread Santosh Srinivas

Dear Group,

Need to do the following transformation:

I have the dataset
structure(list(Date = structure(1L, .Label = "2010-06-16", class = "factor"),
    ACC.returns1Day = -0.018524832, ACC.returns5Day = 0.000863931,
    ACC.returns7Day = -0.019795222, BCC.returns1Day = -0.009861859,
    BCC.returns5Day = 0.000850706, BCC.returns7Day = -0.014695715),
    .Names = c("Date", "ACC.returns1Day", "ACC.returns5Day",
    "ACC.returns7Day", "BCC.returns1Day", "BCC.returns5Day",
    "BCC.returns7Day"), class = "data.frame", row.names = c(NA, -1L))

I can split the names using:
retNames <- strsplit(names(returns), "\\.returns")

Assuming that the frame has only one row, how do I transform this into 

        1Day     5Day     7Day
ACC  -0.0185   0.0009  -0.0198
BCC  -0.0099   0.0009  -0.0147

If I have more than one unique date  ... is there some nice structure that I
could put this into where I have the date as the parent and the sub data
structure that gives the data as above for any unique date?

I can always do this with for-loops ... but I think there are easier ways to
achieve this.

Thanks,
S
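
One possible base-R sketch for this, assuming every non-Date column is named
<ticker>.returns<horizon> and that each date has a single row. The names
value_cols, tickers, horizon and by_date are illustrative. A named list keyed
by date is used as the "parent" structure.

```r
returns <- data.frame(
  Date = factor("2010-06-16"),
  ACC.returns1Day = -0.018524832, ACC.returns5Day = 0.000863931,
  ACC.returns7Day = -0.019795222, BCC.returns1Day = -0.009861859,
  BCC.returns5Day = 0.000850706, BCC.returns7Day = -0.014695715
)

# Split "ACC.returns1Day" into ticker "ACC" and horizon "1Day"
value_cols <- setdiff(names(returns), "Date")
parts      <- strsplit(value_cols, "\\.returns")
tickers    <- sapply(parts, `[`, 1)
horizon    <- sapply(parts, `[`, 2)

# One ticker-by-horizon matrix per date, stored in a list named by date
by_date <- lapply(split(returns, returns$Date), function(d) {
  tapply(unlist(d[1, value_cols]), list(tickers, horizon), identity)
})

round(by_date[["2010-06-16"]], 4)
```

With more dates in the frame, by_date simply gains one matrix per date, so no
explicit for-loop is needed.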


[R] Data transformation

2010-01-25 Thread Lisa


Dear all,

I have a dataset that looks like this:

x <- read.table(textConnection("col1 col2
3 1
2 2
4 7
8 6
5 10"), header=TRUE)

I want to rewrite it as below:

var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
   1    0    1    0    0    0    0    0    0     0
   0    2    0    0    0    0    0    0    0     0
   0    0    0    1    0    0    1    0    0     0
   0    0    0    0    0    1    0    1    0     0
   0    0    0    0    1    0    0    0    0     1

Can anybody please help how to get this done? Your help would be greatly
appreciated. 

Lisa 

-- 
View this message in context: 
http://n4.nabble.com/Data-transformation-tp1289899p1289899.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Data transformation

2010-01-25 Thread Sarah Goslee

Well, I have no idea how to get from one to the other. There's
col1 and col2 but no var1 var2 var3, etc. I thought perhaps col1
was the row index and col2 was the column index, but that doesn't
match up either, and not all the cell values are 1.

So you will need to explain more clearly what you intend.

Meanwhile, you might try reshape, or perhaps crosstab from the
ecodist package.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] Data transformation

2010-01-25 Thread Steve Lianoglou

Hi,

I was trying to do it w/o for loops, but I can't figure out a way to do so:

R> bounds <- range(x)
R> m <- matrix(0, nrow=nrow(x), ncol=bounds[2])
R> colnames(m) <- paste('var', seq(bounds[2]), sep="")
## Ugly nested for-loop one-liner below
R> for (i in 1:nrow(x)) for (j in 1:ncol(x)) m[i,x[i,j]] <- m[i,x[i,j]] + 1
R> m

     var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
[1,]    1    0    1    0    0    0    0    0    0     0
[2,]    0    2    0    0    0    0    0    0    0     0
[3,]    0    0    0    1    0    0    1    0    0     0
[4,]    0    0    0    0    0    1    0    1    0     0
[5,]    0    0    0    0    1    0    0    0    0     1
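
For what it is worth, the explicit loops can probably be avoided with
tabulate(), which counts occurrences of the integers 1..nbins. A minimal
sketch, assuming all entries of x are positive whole numbers (the data frame
is re-entered by hand so the snippet runs on its own):

```r
x <- data.frame(col1 = c(3, 2, 4, 8, 5), col2 = c(1, 2, 7, 6, 10))

# tabulate() counts how often each integer 1..nbins occurs in one row;
# apply() returns one column per input row, hence the transpose
m <- t(apply(x, 1, tabulate, nbins = max(x)))
colnames(m) <- paste0("var", seq_len(ncol(m)))
m
```

Repeated values within a row (such as 2, 2) are counted twice, which a
vectorised m[cbind(i, j)] assignment would not do.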

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] Data transformation

2010-01-25 Thread Lisa


Thank you so much.

Lisa
-- 
View this message in context: 
http://n4.nabble.com/Data-transformation-tp1289899p1289915.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Data transformation

2010-01-25 Thread Seeliger . Curt

Thanks, I've not seen textConnection() before.  The table() function will 
get you close:

table(c(rownames(x),rownames(x)), c(x$col1,x$col2))
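
A sketch of how the table() idea might be pushed to the exact layout asked
for: table() only creates columns for values that actually occur, so the
missing ones (here 9) have to be declared as factor levels. The names vals,
rows and tab are illustrative.

```r
x <- data.frame(col1 = c(3, 2, 4, 8, 5), col2 = c(1, 2, 7, 6, 10))

vals <- unlist(x, use.names = FALSE)            # col1 values, then col2 values
rows <- rep(seq_len(nrow(x)), times = ncol(x))  # matching row index for each value

# Declaring all levels 1..max keeps columns for values that never occur
tab <- table(row = rows, var = factor(vals, levels = seq_len(max(vals))))
tab
```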

cur
-- 
Curt Seeliger, Data Ranger
Raytheon Information Services - Contractor to ORD
seeliger.c...@epa.gov
541/754-4638




Re: [R] Data transformation

2010-01-25 Thread Gabor Grothendieck

Try this:

> t(apply(x, 1, function(r) table(factor(r, levels = seq_len(max(x))))))
     1 2 3 4 5 6 7 8 9 10
[1,] 1 0 1 0 0 0 0 0 0  0
[2,] 0 2 0 0 0 0 0 0 0  0
[3,] 0 0 0 1 0 0 1 0 0  0
[4,] 0 0 0 0 0 1 0 1 0  0
[5,] 0 0 0 0 1 0 0 0 0  1

If you use aaply in the plyr package instead of apply then you can
omit the transpose.



Re: [R] Data transformation

2009-11-11 Thread jim holtman

Try this:

> x <- read.table(textConnection("id    code1    code2         p
+  1        4        8           0.1
+  1        5        7           0.9
+  2        1        8           0.4
+  2        6        2           0.2
+  2        4        3           0.6
+  3        5        6           0.7
+  3        7        5           0.9"), header=TRUE)
> closeAllConnections()
> # create object like output from 'melt'
> x.m <- data.frame(id=c(x$id, x$id),
+     var=paste('var', c(x$code1, x$code2), sep=''),
+     variable=rep('p', 2*nrow(x)),
+     value=c(x$p, x$p))
> require(reshape)  # use the reshape package
> (x.n <- cast(x.m, id ~ var, function(.dat){
+     if (length(.dat) == 0) return(0)  # test for no data; return zero if that is the case
+     mean(.dat)
+ }))
  id var1 var2 var3 var4 var5 var6 var7 var8
1  1  0.0  0.0  0.0  0.1  0.9  0.0  0.9  0.1
2  2  0.4  0.2  0.6  0.6  0.0  0.2  0.0  0.4
3  3  0.0  0.0  0.0  0.0  0.8  0.7  0.9  0.0
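
If the reshape package is not available, much the same table can probably be
built in base R with tapply(). This is only a sketch; the names long and wide
are made up, and the data are typed in directly rather than read through
textConnection().

```r
x <- data.frame(id    = c(1, 1, 2, 2, 2, 3, 3),
                code1 = c(4, 5, 1, 6, 4, 5, 7),
                code2 = c(8, 7, 8, 2, 3, 6, 5),
                p     = c(0.1, 0.9, 0.4, 0.2, 0.6, 0.7, 0.9))

# Stack code1 and code2 so each row contributes its p value twice
long <- data.frame(id  = rep(x$id, 2),
                   var = factor(c(x$code1, x$code2), levels = 1:8),
                   p   = rep(x$p, 2))

# Mean of p for every id/var cell; cells with no data come back as NA
wide <- tapply(long$p, list(id = long$id, var = long$var), mean)
wide[is.na(wide)] <- 0
colnames(wide) <- paste0("var", colnames(wide))
wide
```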



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Data transformation

2009-11-11 Thread legen


Your script works very well. Thank you very much.

Legen



Henrique Dallazuanna wrote:
 
 Try this also:
 
 xtabs(rep(p, 2) ~ rep(id, 2) + sprintf("var%d", c(code1, code2)), data = x)
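
One thing to keep in mind with xtabs() is that it sums the values falling
into a cell, so for id 3 / var5 it gives 0.7 + 0.9 = 1.6 rather than the
average 0.8 asked for. A hedged sketch of one way to get means instead, by
dividing the sums by the cell counts (the names long, sums, counts and means
are illustrative):

```r
x <- data.frame(id    = c(1, 1, 2, 2, 2, 3, 3),
                code1 = c(4, 5, 1, 6, 4, 5, 7),
                code2 = c(8, 7, 8, 2, 3, 6, 5),
                p     = c(0.1, 0.9, 0.4, 0.2, 0.6, 0.7, 0.9))

long <- data.frame(id  = rep(x$id, 2),
                   var = paste0("var", c(x$code1, x$code2)),
                   p   = rep(x$p, 2))

sums   <- xtabs(p ~ id + var, data = long)   # sum of p per cell
counts <- xtabs(~ id + var, data = long)     # number of observations per cell
means  <- sums / counts                      # empty cells give 0/0 = NaN
means[is.nan(means)] <- 0
means
```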
 
 
 -- 
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O
 

-- 
View this message in context: 
http://old.nabble.com/Data-transformation-tp26291568p26301029.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Data transformation

2009-11-11 Thread legen


That's what I want. Many thanks for your help.
Legen



-- 
View this message in context: 
http://old.nabble.com/Data-transformation-tp26291568p26300980.html
Sent from the R help mailing list archive at Nabble.com.


[R] Data transformation

2009-11-10 Thread legen


Dear all,

I have a dataset as below:

id    code1    code2         p
 1        4        8           0.1
 1        5        7           0.9
 2        1        8           0.4
 2        6        2           0.2
 2        4        3           0.6
 3        5        6           0.7
 3        7        5           0.9

I just want to rewrite it as this (vertical to horizontal):

id   var1  var2  var3  var4  var5  var6  var7  var8
1       0     0     0   0.1   0.9     0   0.9   0.1
2     0.4   0.2   0.6   0.6     0   0.2     0   0.4
3       0     0     0     0   0.8   0.7   0.9     0

For the third subject, there are two values being equal to 5 in code1 and
code2, but different values in p:  0.7 and 0.9, so I assigned their average
0.8 in var5.

Can anybody help me handle this? Many thanks for your consideration
and time.

Legen 

-- 
View this message in context: 
http://old.nabble.com/Data-transformation-tp26291568p26291568.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Data transformation

2009-11-10 Thread jim holtman

Is this what you want:

> x <- read.table(textConnection("id    code1    code2         p
+  1        4        8           0.1
+  1        5        7           0.9
+  2        1        8           0.4
+  2        6        2           0.2
+  2        4        3           0.6
+  3        5        6           0.7
+  3        7        5           0.9"), header=TRUE)
> closeAllConnections()
> # create object like output from 'melt'
> x.m <- data.frame(id=c(x$id, x$id), var=c(x$code1, x$code2),
+     variable=rep('p', 2*nrow(x)), value=c(x$p, x$p))
> require(reshape)  # use the reshape package
> cast(x.m, id ~ var, mean)
  id   1   2   3   4   5   6   7   8
1  1 NaN NaN NaN 0.1 0.9 NaN 0.9 0.1
2  2 0.4 0.2 0.6 0.6 NaN 0.2 NaN 0.4
3  3 NaN NaN NaN NaN 0.8 0.7 0.9 NaN




-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Data transformation

2009-11-10 Thread legen


Thank you for your kind help. Your script works very well. Would you please
show me how to change NaN to zero and column variables 1, 2, ..., 8 to var1,
var2, ..., var8? Thanks again.

Legen

 

-- 
View this message in context: 
http://old.nabble.com/Data-transformation-tp26291568p26295766.html
Sent from the R help mailing list archive at Nabble.com.


[R] data transformation using gamma

2009-05-07 Thread Roslina Zakaria

Hi R-users,

I have this code to uniformise the data using gamma:

> length(dp1)
[1] 696
> dim(dp1)
[1] 58 12
> dim(ahall)
[1]  1 12
> dim(bhall)
[1]  1 12

> trans_dt <- function(dt,a,b)
+ { n1 <- ncol(dt)
+   n2 <- length(dt)
+   trans  <- vector(mode='numeric', length=n2)
+   dim(trans) <- dim(dt)
+   for (i in 1:n1)
+   {  dt[,i] <- as.vector(dt[,i])
+      trans[,i] <- transform(dti,newdt=pgamma(dti,shape= a[1,i],scale=b[1,i]))
+   }
+   trans
+ }

> trans_dt(dp1,ahall,bhall)
Error in transform(dti, newdt = pgamma(dti, shape = a[1, i], scale = b[1,  :
  object "dti" not found

I also tried:

trans_dt <- function(dt,a,b)
{ n1 <- ncol(dt)
  n2 <- length(dt)
  trans  <- vector(mode='numeric', length=n2)
  dim(trans) <- dim(dt)
  for (i in 1:n1)
  {  dti <- dt[,i]
     ai  <- a[1,i]
     bi  <- b[1,i]
     trans[,i] <- transform(dti,newdt=pgamma(dti,shape= ai,scale=bi)) }
  trans
}

trans_dt(dp1,ahall,bhall)
Error in pgamma(dti, shape = ai, scale = bi) : object "dti" not found


Thank you for any help given.


  

Re: [R] data transformation using gamma

2009-05-07 Thread Patrizio Frederic

Roslina,
this code performs what you need:

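dt  = matrix((1:(58*12))/58/12,58) # some numbers
# if dt is a data.frame use dt = as.matrix(dt)
a   = (1:12)/12 # some a coef (shape, one per column)
b   = (12:1)/12 # some b coef (scale, one per column)
# repeat each coefficient down its column so that column i uses a[i] and b[i];
# name the arguments, since the third positional argument of pgamma is rate
dtgam   = matrix(pgamma(dt,shape=rep(a,each=58),scale=rep(b,each=58)),58)
# dtgam is the transformation you're looking for

No loop is needed and no transform function is involved.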
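
Because pgamma() recycles vector parameters element by element in
column-major order, and its third positional argument is rate rather than
scale, it may be worth checking any vectorised call against a plain
column-by-column version. A minimal sketch with made-up data and parameters
(by_col and vectorised are illustrative names):

```r
set.seed(1)
dt <- matrix(rgamma(58 * 12, shape = 2, scale = 3), nrow = 58)  # made-up data
a  <- seq(1, 3, length.out = 12)   # assumed shape parameter for each column
b  <- seq(2, 4, length.out = 12)   # assumed scale parameter for each column

# Reference version: column i is transformed with a[i] and b[i]
by_col <- sapply(seq_len(ncol(dt)), function(i) {
  pgamma(dt[, i], shape = a[i], scale = b[i])
})

# Vectorised version: repeat each parameter down its column first
vectorised <- matrix(
  pgamma(dt,
         shape = rep(a, each = nrow(dt)),
         scale = rep(b, each = nrow(dt))),
  nrow = nrow(dt)
)

stopifnot(isTRUE(all.equal(by_col, vectorised)))
```

If a and b are 1 x 12 matrices, as ahall and bhall appear to be, they would
need as.vector() before being passed to rep().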
cheers,

Patrizio


-- 
+-
| Patrizio Frederic, PhD
| Assistant Professor,
| Department of Economics,
| University of Modena and Reggio Emilia,
| Via Berengario 51,
| 41100 Modena, Italy
|
| tel:  +39 059 205 6727
| fax:  +39 059 205 6947
| mail: patrizio.frede...@unimore.it
+-


[R] data transformation

2008-07-22 Thread Christian Hof


Dear all,

how can I, with R, transform a presence-only table (with the names of 
the species (1st column), the lat information of the sites (2nd column) 
and the lon information of the sites (3rd column)) into a 
presence-absence (0/1) matrix of species occurrences across sites, as 
given in the below example?


Thanks a lot for your help!
Christian



My initial table:

species lat lon
sp1 10  10
sp1 10  30
sp1 20  10
sp1 20  20
sp1 20  30
sp2 10  30
sp2 20  30
sp2 30  30


My desired matrix:

lat lon sp1 sp2
10  10  1   0
10  20  0   0
10  30  1   1
20  10  1   0
20  20  1   0
20  30  1   1
30  10  0   0
30  20  0   0
30  30  0   1


--
Christian Hof, PhD student

Center for Macroecology & Evolution
University of Copenhagen
www.macroecology.ku.dk

Biodiversity & Global Change Lab
Museo Nacional de Ciencias Naturales, Madrid
www.biochange-lab.eu

mobile ES .. +34 697 508 519
mobile DE .. +49 176 205 189 27
  mail .. [EMAIL PROTECTED]
 mail2 .. [EMAIL PROTECTED]
  blog .. www.vogelwart.de


Re: [R] data transformation

2008-07-22 Thread Marc Schwartz


on 07/22/2008 11:24 AM Christian Hof wrote:

> Dear all,
>
> how can I, with R, transform a presence-only table (with the names of
> the species (1st column), the lat information of the sites (2nd column)
> and the lon information of the sites (3rd column)) into a
> presence-absence (0/1) matrix of species occurrences across sites, as
> given in the below example?
>
> Thanks a lot for your help!
> Christian
>
> My initial table:
>
> species lat lon
> sp1     10  10
> sp1     10  30
> sp1     20  10
> sp1     20  20
> sp1     20  30
> sp2     10  30
> sp2     20  30
> sp2     30  30
>
> My desired matrix:
>
> lat lon sp1 sp2
> 10  10  1   0
> 10  20  0   0
> 10  30  1   1
> 20  10  1   0
> 20  20  1   0
> 20  30  1   1
> 30  10  0   0
> 30  20  0   0
> 30  30  0   1



One approach would be to use ftable(). Presuming that your source data 
is in a data frame called 'DF':


> ftable(species ~ lat + lon, data = DF)
        species sp1 sp2
lat lon
10  10            1   0
    20            0   0
    30            1   1
20  10            1   0
    20            1   0
    30            1   1
30  10            0   0
    20            0   0
    30            0   1



See ?ftable and/or ?ftable.formula
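ftable() returns a flat contingency table rather than a data frame. If a plain data frame in the layout of the desired matrix is needed, one possible base-R sketch is given here. It is an alternative route, not a post-processing of the ftable object; DF is typed in from the example table, and sites and key_site are hypothetical names.

```r
# The example table from the question, typed in by hand
DF <- data.frame(
  species = c("sp1", "sp1", "sp1", "sp1", "sp1", "sp2", "sp2", "sp2"),
  lat     = c(10, 10, 20, 20, 20, 10, 20, 30),
  lon     = c(10, 30, 10, 20, 30, 30, 30, 30)
)

# All lat/lon combinations; expand.grid() varies its first argument fastest,
# so lon is listed first and the columns are reordered afterwards
sites <- expand.grid(lon = sort(unique(DF$lon)),
                     lat = sort(unique(DF$lat)))[, c("lat", "lon")]
key_site <- paste(sites$lat, sites$lon)

# One 0/1 column per species: 1 if that species was recorded at the site
for (sp in sort(unique(DF$species))) {
  here <- DF[DF$species == sp, ]
  sites[[sp]] <- as.integer(key_site %in% paste(here$lat, here$lon))
}

sites
```

Sites where no species was recorded (for example lat 30, lon 10) are kept with zeros in every species column.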

HTH,

Marc Schwartz


[R] data transformation

2008-05-02 Thread Christian Hof


Dear all,
how can I, with R, transform a presence-absence (0/1) matrix of species 
occurrences into a presence-only table (3 columns) with the names of the 
species (1st column), the lat information of the sites (2nd column) and 
the lon information of the sites (3rd column), as given in the below 
example?

Thanks a lot for your help!
Christian


my dataframe:

site    lat lon spec1   spec2   spec3   spec4
site1   10  11  1       0       1       0
site2   20  21  1       1       1       0
site3   30  31  0       1       1       1


my desired new dataframe:

species lat lon
spec1   10  11
spec1   20  21
spec2   20  21
spec2   30  31
spec3   10  11
spec3   20  21
spec3   30  31
spec4   30  31



--
Christian Hof, PhD student

Center for Macroecology & Evolution
University of Copenhagen
www.macroecology.ku.dk

Biodiversity & Global Change Lab
Museo Nacional de Ciencias Naturales, Madrid
www.biochange-lab.eu

mobile ES .. +34 697 508 519
mobile DE .. +49 176 205 189 27
 mail .. [EMAIL PROTECTED]
mail2 .. [EMAIL PROTECTED]
 blog .. www.vogelwart.de


Re: [R] data transformation

2008-05-02 Thread Christos Hatzis

Christian,

You need to use reshape to convert to the 'long' format.
Check the help page ?reshape for details.

> dat <- read.table('clipboard', header=TRUE)
> dat
   site lat lon spec1 spec2 spec3 spec4
1 site1  10  11     1     0     1     0
2 site2  20  21     1     1     1     0
3 site3  30  31     0     1     1     1
> dat.long <- reshape(dat, varying = list(names(dat)[4:7]),
+                     timevar = "species",
+                     times = names(dat)[4:7], direction = "long")
> dat.long
         site lat lon species spec1 id
1.spec1 site1  10  11   spec1     1  1
2.spec1 site2  20  21   spec1     1  2
3.spec1 site3  30  31   spec1     0  3
1.spec2 site1  10  11   spec2     0  1
2.spec2 site2  20  21   spec2     1  2
3.spec2 site3  30  31   spec2     1  3
1.spec3 site1  10  11   spec3     1  1
2.spec3 site2  20  21   spec3     1  2
3.spec3 site3  30  31   spec3     1  3
1.spec4 site1  10  11   spec4     0  1
2.spec4 site2  20  21   spec4     0  2
3.spec4 site3  30  31   spec4     1  3
> dat.long[dat.long$spec1 == 1, ]
         site lat lon species spec1 id
1.spec1 site1  10  11   spec1     1  1
2.spec1 site2  20  21   spec1     1  2
2.spec2 site2  20  21   spec2     1  2
3.spec2 site3  30  31   spec2     1  3
1.spec3 site1  10  11   spec3     1  1
2.spec3 site2  20  21   spec3     1  2
3.spec3 site3  30  31   spec3     1  3
3.spec4 site3  30  31   spec4     1  3
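Since reading from the clipboard is not reproducible, here is a self-contained sketch of the same reshape() idea with the example data entered inline. The column name "present" and the object names spec_cols and res are hypothetical choices for illustration; the last step keeps only the three columns asked for in the question.

```r
# Example data from the question, entered inline instead of via the clipboard
dat <- read.table(header = TRUE, text = "
site  lat lon spec1 spec2 spec3 spec4
site1 10  11  1     0     1     0
site2 20  21  1     1     1     0
site3 30  31  0     1     1     1
")

spec_cols <- grep("^spec", names(dat), value = TRUE)

# Wide to long: one row per site/species pair
dat.long <- reshape(dat,
                    varying   = list(spec_cols),
                    v.names   = "present",   # hypothetical name for the 0/1 column
                    timevar   = "species",
                    times     = spec_cols,
                    direction = "long")

# Keep presences only, and only the species, lat and lon columns
res <- dat.long[dat.long$present == 1, c("species", "lat", "lon")]
rownames(res) <- NULL
res
```

Naming the value column via v.names avoids ending up with a 0/1 column that is still called spec1 after the reshape.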

-Christos


Re: [R] data transformation

2008-05-02 Thread Kingsford Jones

Hi Christian,

Here's a way using the reshape package:

> dfr
   site lat lon spec1 spec2 spec3 spec4
1 site1  10  11     1     0     1     0
2 site2  20  21     1     1     1     0
3 site3  30  31     0     1     1     1
> library(reshape)
> dfr <- melt(dfr[, -1], id=1:2, variable_name='species')
> dfr <- dfr[dfr$value > 0,]
> dfr
   lat lon species value
1   10  11   spec1     1
2   20  21   spec1     1
5   20  21   spec2     1
6   30  31   spec2     1
7   10  11   spec3     1
8   20  21   spec3     1
9   30  31   spec3     1
12  30  31   spec4     1


The 'value' variable is not interesting here, but it could be if you had
counts rather than presence/absence.

best,

Kingsford Jones


Re: [R] data transformation

2008-05-02 Thread Henrique Dallazuanna

Try this:

newx <- with(x, cbind(stack(x, select = grep("spec", names(x))), lat, lon))
newx[newx$values > 0, -1]
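For anyone who wants to run the stack() idea without the original object, here is a minimal sketch with the example data typed in as x. It spells out the recycling of lat and lon explicitly rather than relying on cbind(); spec_cols, long and out are hypothetical names.

```r
# Example data from the question, typed in by hand
x <- data.frame(
  site  = c("site1", "site2", "site3"),
  lat   = c(10, 20, 30),
  lon   = c(11, 21, 31),
  spec1 = c(1, 1, 0),
  spec2 = c(0, 1, 1),
  spec3 = c(1, 1, 1),
  spec4 = c(0, 0, 1)
)

spec_cols <- grep("^spec", names(x), value = TRUE)

# stack() puts the 0/1 values in 'values' and the source column name in 'ind'
long <- stack(x[spec_cols])

# lat and lon repeat once per species column, in the same row order as stack()
long$lat <- rep(x$lat, times = length(spec_cols))
long$lon <- rep(x$lon, times = length(spec_cols))

# Keep presences only and rename 'ind' to 'species'
keep <- long$values > 0
out <- data.frame(species = as.character(long$ind[keep]),
                  lat     = long$lat[keep],
                  lon     = long$lon[keep])
out
```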




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
