
Re: [R] data frame manipulation

2014-02-20 Thread Jeff Newmiller

Depending on what you really want to achieve, the following may be useful or
educational:


dat$ID2x <- with( dat, ave( rep( 1, nrow( dat ) ), ID, USE, FUN=cumsum ) )
dat$ID2y <- dat$ID2x
dat$ID2y[ dat$USE != "001" ] <- NA
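
A self-contained sketch of the snippet above, run on the data frame from the original question. Grouping by both ID and USE makes cumsum count the repeats of each USE code within each ID, so the "001" rows receive 1, 2, 3, ... per ID. The stringsAsFactors = FALSE argument is an assumption added here so the columns are plain character vectors on any R version, and `expected` is typed in by hand from the table in the question.

```r
# Data from the original question; stringsAsFactors = FALSE is an added assumption.
dat <- data.frame(
  ID  = c(rep("01", 18), rep("02", 16)),
  USE = c("001","004","005","007","001","004","005","007","012",
          "001","004","005","007","001","004","005","007","012",
          "004","005","007","013","001","004","005","007",
          "001","004","005","007","001","004","005","007"),
  stringsAsFactors = FALSE
)

# Running count of each USE code within each ID.
dat$ID2x <- with(dat, ave(rep(1, nrow(dat)), ID, USE, FUN = cumsum))

# Keep the count only on the "001" rows.
dat$ID2y <- dat$ID2x
dat$ID2y[dat$USE != "001"] <- NA

# Rows holding "001" and the numbers the question asks for (hand-typed).
rows001  <- which(dat$USE == "001")
expected <- c(1, 2, 3, 4, 1, 2, 3)
print(dat[rows001, c("ID", "USE", "ID2y")])
```

ID2x keeps the counts for every USE code, which may be useful if other codes need numbering later; ID2y is the column that matches the request.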

On Thu, 20 Feb 2014, arun wrote:


Hi,
Try:
dat$ID2 <- with(dat,ave(seq_along(USE),ID,FUN=function(x){x1 <- USE[x] =='001'; 
ifelse(!x1,'',cumsum(x1))}))
A.K.




On Thursday, February 20, 2014 3:31 PM, Pedro Mardones  
wrote:
Dear R community;

I'm kind of stuck with the following situation and would appreciate any
hint. Let's assume I have the following data frame:

dat <- data.frame(ID = c(rep("01",18), rep("02",16)), USE = c(c("001","004",
"005","007","001","004","005","007","012","001","004","005","007","001","004",
"005","007","012"),c("004","005","007","013","001","004","005","007","001","004",
"005","007","001","004","005","007")),ID2 = "")

What I would like to achieve is to number all the "001" occurrences in USE
for each ID individually and store them in ID2. In other words, something
like this:

   ID USE ID2
1  01 001   1
2  01 004
3  01 005
4  01 007
5  01 001   2
6  01 004
7  01 005
8  01 007
9  01 012
10 01 001   3
11 01 004
12 01 005
13 01 007
14 01 001   4
15 01 004
16 01 005
17 01 007
18 01 012
19 02 004
20 02 005
21 02 007
22 02 013
23 02 001   1
24 02 004
25 02 005
26 02 007
27 02 001   2
28 02 004
29 02 005
30 02 007
31 02 001   3
32 02 004
33 02 005
34 02 007

Thanks again,
Pedro

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





---
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---

Re: [R] data frame manipulation

2014-02-20 Thread arun

Hi,
Try:
dat$ID2 <- with(dat,ave(seq_along(USE),ID,FUN=function(x){x1 <- USE[x] =='001'; 
ifelse(!x1,'',cumsum(x1))}))
A.K.
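
One way to read the one-liner above, sketched in separate steps: flag the "001" rows, take a cumulative sum of the flag within each ID, then blank out the rows that are not "001". The names `is001` and `n` are illustrative only, and the data frame is a shortened stand-in for the one in the question.

```r
# Shortened toy data (an assumption; same shape as the data in the question).
dat <- data.frame(
  ID  = c("01", "01", "01", "01", "01", "02", "02", "02", "02"),
  USE = c("001", "004", "001", "005", "001", "004", "001", "007", "001"),
  stringsAsFactors = FALSE
)

# TRUE on the rows that should be numbered.
is001 <- dat$USE == "001"

# Running count of "001" rows within each ID.
n <- ave(as.integer(is001), dat$ID, FUN = cumsum)

# Show the count on "001" rows and an empty string everywhere else.
dat$ID2 <- ifelse(is001, as.character(n), "")
print(dat)
```

Because the blanks are empty strings, ID2 ends up as a character column; using NA instead of "" would keep it numeric.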




On Thursday, February 20, 2014 3:31 PM, Pedro Mardones  
wrote:
Dear R community;

I'm kind of stuck with the following situation and would appreciate any
hint. Let's assume I have the following data frame:

dat <- data.frame(ID = c(rep("01",18), rep("02",16)), USE = c(c("001","004",
"005","007","001","004","005","007","012","001","004","005","007","001","004",
"005","007","012"),c("004","005","007","013","001","004","005","007","001","004",
"005","007","001","004","005","007")),ID2 = "")

What I would like to achieve is to number all the "001" occurrences in USE
for each ID individually and store them in ID2. In other words, something
like this:

   ID USE ID2
1  01 001   1
2  01 004
3  01 005
4  01 007
5  01 001   2
6  01 004
7  01 005
8  01 007
9  01 012
10 01 001   3
11 01 004
12 01 005
13 01 007
14 01 001   4
15 01 004
16 01 005
17 01 007
18 01 012
19 02 004
20 02 005
21 02 007
22 02 013
23 02 001   1
24 02 004
25 02 005
26 02 007
27 02 001   2
28 02 004
29 02 005
30 02 007
31 02 001   3
32 02 004
33 02 005
34 02 007

Thanks again,
Pedro

    [[alternative HTML version deleted]]


Re: [R] Data frame manipulation

2012-11-22 Thread arun

Hi,
Maybe this helps:
library(reshape)
dat1 # data that needs to be converted
res <- melt(dat1, id=c("Local","Mês","Dia","Colonia"))
names(res)[5:6] <- c("Hora","N")
res1 <- res[order(res$Dia),]
row.names(res1) <- 1:nrow(res1)
res1$Hora <- gsub("[X]","",res1$Hora)
head(res1)
#      Local   Mês Dia Colonia Hora       N
#1 Conceição Junho   1       3   6h 2.16137
#2 Conceição Junho   1       4   6h 1.65321
#3 Conceição Junho   1       5   6h 2.21748
#4 Conceição julho   1       3   6h 2.20952
#5 Conceição Junho   1       3   7h 2.20412
#6 Conceição Junho   1       4   7h 2.16435
A.K.
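
Ordering by Dia alone, as in the head() output above, leaves the hours of different rows interleaved. A hedged sketch of one way to get the row-by-row order shown in the question: derive a numeric hour from Hora and sort on it last. The small `res` frame below is a hand-made stand-in for the melted data, and `hour_num` is an illustrative name.

```r
# Toy long-format data in melt order (an assumption, not the full data set).
res <- data.frame(
  Dia     = c(1L, 2L, 1L, 2L, 1L, 2L),
  Colonia = c(3L, 3L, 3L, 3L, 3L, 3L),
  Hora    = c("6h", "6h", "7h", "7h", "10h", "10h"),
  N       = c(2.16137, 2.46538, 2.20412, 2.13672, 1.69897, 2.00000),
  stringsAsFactors = FALSE
)

# "6h" -> 6, "10h" -> 10, so the sort is numeric rather than alphabetical.
hour_num <- as.integer(sub("h$", "", res$Hora))

# Group by colony and day first, then walk through the hours in order.
res1 <- res[order(res$Colonia, res$Dia, hour_num), ]
row.names(res1) <- NULL
print(res1)
```

Sorting on the numeric key avoids the problem that "10h" sorts before "6h" when Hora is compared as text.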


To: r-help@r-project.org
Cc: 
Sent: Thursday, November 22, 2012 8:53 PM
Subject: [R] Data frame manipulation

Hello,

I have a table that was constructed the wrong way (dput data at the bottom -
wrong data-frame):

    Local   Mês Dia Colonia     X6h     X7h     X8h     X9h    X10h    X11h
   X12h    X13h    X14h    X15h    X16h    X17h
1  Conceição Junho   1       3 2.16137 2.20412 2.08991 1.72428 1.69897
1.62325 1.44716 1.51851 1.43136 1.47712 1.51851 1.04139
2  Conceição Junho   2       3 2.46538 2.13672 2.06819 1.97772 2.0
1.80618 1.64345 1.20412 1.62325 1.36173 1.69020 1.57978
3  Conceição Junho   3       3 2.53275 2.52504 2.49276 2.3 2.12710
2.26007 2.24551 1.95424 2.09342 1.04139 1.53148 1.17609
4  Conceição Junho   1       4 1.65321 2.16435 1.91381 1.75587 1.74036
1.17609 1.66276 1.51851 1.39794 1.04139 1.11394 1.04139
5  Conceição Junho   2       4 2.30320 1.71600 2.02531 2.05690 1.86332
1.66276 1.17609 1.04139 1.30103 1.27875 1.3 1.3
6  Conceição Junho   3       4 2.71012 2.30320 2.53403 1.80618 2.24551
2.20683 2.02531 1.07918 1.36173 1.39794 1.11394 1.93450
7  Conceição Junho   1       5 2.21748 1.99564 2.26007 2.28103 2.10380
1.41497 0.47712 1.07918 0.90309 1.04139 1.49136 1.23045
8  Conceição Junho   2       5 2.10721 2.16435 2.05308 2.38561 2.14613
1.61278 1.27875 0.47712 1.61278 1.0 1.44716 1.07918
9  Conceição Junho   3       5 1.62325 1.93450 2.33041 2.24797 2.29885
2.48001 2.29003 1.43136 1.49136 1.17609 1.41497 1.14613
10 Conceição julho   1       3 2.20952 2.01284 1.79239 1.59106 1.62325
1.51851 1.41497 1.38021 1.66276 1.46240 1.53148 1.66276


I have to create a new column (hour) and transpose just the last 12
columns, and the first four columns have to be repeated 12 times, like this
(dput data at the bottom - correct data-frame):

     Local   Mês Dia Colonia Hora        N
1  Conceição Junho   1       3   6h 2.161370
2  Conceição Junho   1       3   7h 2.204120
3  Conceição Junho   1       3   8h 2.089910
4  Conceição Junho   1       3   9h 1.724280
5  Conceição Junho   1       3  10h 1.698970
6  Conceição Junho   1       3  11h 1.623250
7  Conceição Junho   1       3  12h 1.447160
8  Conceição Junho   1       3  13h 1.518510
9  Conceição Junho   1       3  14h 1.431360
10 Conceição Junho   1       3  15h 1.477120
11 Conceição Junho   1       3  16h 1.518510
12 Conceição Junho   1       3  17h 1.041390
13 Conceição Junho   2       3   6h 2.465383

Could someone give me some ideas? I don't even know how to start...

Thanks in advance,

-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.ra...@gmail.com

wrong data frame:

structure(list(Local = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Conceição", class = "factor"), Mês = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("julho", "Junho"
), class = "factor"), Dia = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L, 1L), Colonia = c(3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 3L),
    X6h = c(2.16137, 2.46538, 2.53275, 1.65321, 2.3032, 2.71012,
    2.21748, 2.10721, 1.62325, 2.20952), X7h = c(2.20412, 2.13672,
    2.52504, 2.16435, 1.716, 2.3032, 1.99564, 2.16435, 1.9345,
    2.01284), X8h = c(2.08991, 2.06819, 2.49276, 1.91381, 2.02531,
    2.53403, 2.26007, 2.05308, 2.33041, 1.79239), X9h = c(1.72428,
    1.97772, 2.3, 1.75587, 2.0569, 1.80618, 2.28103, 2.38561,
    2.24797, 1.59106), X10h = c(1.69897, 2, 2.1271, 1.74036,
    1.86332, 2.24551, 2.1038, 2.14613, 2.29885, 1.62325), X11h = c(1.62325,
    1.80618, 2.26007, 1.17609, 1.66276, 2.20683, 1.41497, 1.61278,
    2.48001, 1.51851), X12h = c(1.44716, 1.64345, 2.24551, 1.66276,
    1.17609, 2.02531, 0.47712, 1.27875, 2.29003, 1.41497), X13h =
c(1.51851,
    1.20412, 1.95424, 1.51851, 1.04139, 1.07918, 1.07918, 0.47712,
    1.43136, 1.38021), X14h = c(1.43136, 1.62325, 2.09342, 1.39794,
    1.30103, 1.36173, 0.90309, 1.61278, 1.49136, 1.66276), X15h =
c(1.47712,
    1.36173, 1.04139, 1.04139, 1.27875, 1.39794, 1.04139, 1,
    1.17609, 1.4624), X16h = c(1.51851, 1.6902, 1.53148, 1.11394,
    1.3, 1.11394, 1.49136, 1.44716, 1.41497, 1.53148), X17h =
c(1.04139,
    1.57978, 1.17609, 1.04139, 1.3, 1.9345, 1.23045, 1.07918,
    1.14613, 1.66276)), .Names = c("Local", "Mês", "Dia", "Colonia",
"X6h", "X7h", "X8h", "X9h", "X10h", "X11h", "X12h", "X13h", "X14h",
"X15h", "X16h", "X17h"), row.nam

Re: [R] Data frame manipulation

2012-11-22 Thread jim holtman

The 'reshape2' package is your friend:

> require(reshape2)
> x <- melt(wrong, id = c("Local", "Mês", "Dia", "Colonia"), variable.name = "Hora")
> # remove "X" from Hora
> x$Hora <- as.character(substring(x$Hora, 2))
> head(x)  # not in the right order
      Local   Mês Dia Colonia Hora   value
1 Conceição Junho   1       3   6h 2.16137
2 Conceição Junho   2       3   6h 2.46538
3 Conceição Junho   3       3   6h 2.53275
4 Conceição Junho   1       4   6h 1.65321
5 Conceição Junho   2       4   6h 2.30320
6 Conceição Junho   3       4   6h 2.71012
> # sort, but first pad Hora with a leading blank for hours before 10h so they sort first
> x$Hora <- ifelse(nchar(x$Hora) == 2, paste0(" ", x$Hora), x$Hora)
> x <- x[order(x$Local, x$Mês, x$Dia, x$Colonia, x$Hora), ]
>
> head(x,20)
        Local   Mês Dia Colonia Hora   value
10  Conceição julho   1       3   6h 2.20952
20  Conceição julho   1       3   7h 2.01284
30  Conceição julho   1       3   8h 1.79239
40  Conceição julho   1       3   9h 1.59106
50  Conceição julho   1       3  10h 1.62325
60  Conceição julho   1       3  11h 1.51851
70  Conceição julho   1       3  12h 1.41497
80  Conceição julho   1       3  13h 1.38021
90  Conceição julho   1       3  14h 1.66276
100 Conceição julho   1       3  15h 1.46240
110 Conceição julho   1       3  16h 1.53148
120 Conceição julho   1       3  17h 1.66276
1   Conceição Junho   1       3   6h 2.16137
11  Conceição Junho   1       3   7h 2.20412
21  Conceição Junho   1       3   8h 2.08991
31  Conceição Junho   1       3   9h 1.72428
41  Conceição Junho   1       3  10h 1.69897
51  Conceição Junho   1       3  11h 1.62325
61  Conceição Junho   1       3  12h 1.44716
71  Conceição Junho   1       3  13h 1.51851
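
The same long layout can also be built without a reshaping package and without a sort. This is a minimal base-R sketch under stated assumptions: `wrong` here is a cut-down toy frame with two id columns and three hour columns (values copied from the first two rows of the posted data), and `id_cols`, `hour_cols` and `long` are illustrative names.

```r
# Cut-down stand-in for the wide data (an assumption, not the full data set).
wrong <- data.frame(
  Dia     = c(1L, 2L),
  Colonia = c(3L, 3L),
  X6h     = c(2.16137, 2.46538),
  X7h     = c(2.20412, 2.13672),
  X10h    = c(1.69897, 2.00000)
)

id_cols   <- c("Dia", "Colonia")
hour_cols <- setdiff(names(wrong), id_cols)

long <- data.frame(
  # Repeat each id row once per hour column.
  wrong[rep(seq_len(nrow(wrong)), each = length(hour_cols)), id_cols],
  # Hour labels cycle within each original row; drop the leading "X".
  Hora = rep(sub("^X", "", hour_cols), times = nrow(wrong)),
  # Transposing first makes as.vector() read the values row by row.
  N    = as.vector(t(as.matrix(wrong[hour_cols]))),
  stringsAsFactors = FALSE
)
rownames(long) <- NULL
print(long)
```

Because the rows are generated in the wanted order, the hours follow the column order of the wide table and no padding of "6h" versus "10h" is needed.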



On Thu, Nov 22, 2012 at 8:53 PM, Raoni Rodrigues
 wrote:
> structure(list(Local = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L), .Label = "Conceição", class = "factor"), Mês = structure(c(2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("julho", "Junho"
> ), class = "factor"), Dia = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L), Colonia = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
> Hora = structure(1:10, .Label = c("6h", "7h", "8h", "9h",
> "10h", "11h", "12h", "13h", "14h", "15h", "16h", "17h"), class =
> "factor"),
> N = c(2.16137, 2.20412, 2.08991, 1.72428, 1.69897, 1.62325,
> 1.44716, 1.51851, 1.43136, 1.47712)), .Names = c("Local",
> "Mês", "Dia", "Colonia", "Hora", "N"), row.names = c(NA, 10L), class =
> "data.frame")



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] data frame manipulation with conditions

2012-02-24 Thread ilai

Ahh, I see it now.
For some reason your original post popped up on the list again, could
be just my mail server, sorry.
Looks like Uwe gave you the same solution (in two lines for better
clarity) right away. Depending on your level of "noobiness", my advice
would have been to ignore everything after that. Although if the other
approaches worked better for you, cheers.
Again sorry for this double thread.


On Fri, Feb 24, 2012 at 12:31 PM, Arnaud Gaboury
 wrote:
> TY Elai for your answer. One solution has been given earlier in this list by 
> Sarah Goslee and William Dunlap.
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
>
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of ilai
> Sent: Friday, 24 February 2012 20:14
> To: A2CT2 Trading
> Cc: r-help@r-project.org
> Subject: Re: [R] data frame manipulation with conditions
>
> On Fri, Feb 24, 2012 at 8:11 AM, A2CT2 Trading  wrote:
>> Dear list,
>>
>> n00b question, but still can't find any easy answer.
>>
>> Here is a df:
>>
>>> df<-data.frame(cbind(x=c("AA","BB","CC","AA"),y=1:4))
>
> # No, your y is a factor
>  str(df)
> 'data.frame':   4 obs. of  2 variables:
>  $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
>  $ y: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
>
> # You want to remove the cbind
>> df<-data.frame(x=c("AA","BB","CC","AA"),y=1:4)
>> str(df)
> 'data.frame':   4 obs. of  2 variables:
>  $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
>  $ y: int  1 2 3 4
>
>> I want to modify this df this way :
>>  if df$x=="AA" then df$y=df$y*10
>>  if df$x=="BB" then df$y=df$y*25
>>
>> and so on with other conditions.
>>
>
>  df$y<- df$y * c(10,25,.5)[df$x]
> [1] 10.0 50.0  1.5 40.0
>  # 1*10 2*25 3*.5 4*10
>
> HTH
>
> Elai
>
>> TY for any help.
>>
>> Trading
>>
>> A2CT2 Ltd.
>>
>


Re: [R] data frame manipulation with conditions

2012-02-24 Thread Arnaud Gaboury

TY Elai for your answer. One solution has been given earlier in this list by 
Sarah Goslee and William Dunlap.

Arnaud Gaboury
 
A2CT2 Ltd.


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of ilai
Sent: Friday, 24 February 2012 20:14
To: A2CT2 Trading
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation with conditions

On Fri, Feb 24, 2012 at 8:11 AM, A2CT2 Trading  wrote:
> Dear list,
>
> n00b question, but still can't find any easy answer.
>
> Here is a df:
>
>> df<-data.frame(cbind(x=c("AA","BB","CC","AA"),y=1:4))

# No, your y is a factor
 str(df)
'data.frame':   4 obs. of  2 variables:
 $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
 $ y: Factor w/ 4 levels "1","2","3","4": 1 2 3 4

# You want to remove the cbind
> df<-data.frame(x=c("AA","BB","CC","AA"),y=1:4)
> str(df)
'data.frame':   4 obs. of  2 variables:
 $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
 $ y: int  1 2 3 4

> I want to modify this df this way :
>  if df$x=="AA" then df$y=df$y*10
>  if df$x=="BB" then df$y=df$y*25
>
> and so on with other conditions.
>

 df$y<- df$y * c(10,25,.5)[df$x]
[1] 10.0 50.0  1.5 40.0
 # 1*10 2*25 3*.5 4*10

HTH

Elai

> TY for any help.
>
> Trading
>
> A2CT2 Ltd.
>

Re: [R] data frame manipulation with conditions

2012-02-24 Thread ilai

On Fri, Feb 24, 2012 at 8:11 AM, A2CT2 Trading  wrote:
> Dear list,
>
> n00b question, but still can't find any easy answer.
>
> Here is a df:
>
>> df<-data.frame(cbind(x=c("AA","BB","CC","AA"),y=1:4))

# No, your y is a factor
 str(df)
'data.frame':   4 obs. of  2 variables:
 $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
 $ y: Factor w/ 4 levels "1","2","3","4": 1 2 3 4

# You want to remove the cbind
> df<-data.frame(x=c("AA","BB","CC","AA"),y=1:4)
> str(df)
'data.frame':   4 obs. of  2 variables:
 $ x: Factor w/ 3 levels "AA","BB","CC": 1 2 3 1
 $ y: int  1 2 3 4

> I want to modify this df this way :
>  if df$x=="AA" then df$y=df$y*10
>  if df$x=="BB" then df$y=df$y*25
>
> and so on with other conditions.
>

 df$y<- df$y * c(10,25,.5)[df$x]
[1] 10.0 50.0  1.5 40.0
 # 1*10 2*25 3*.5 4*10
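
A short sketch of why the unnamed multiplier vector above works: a factor used as a subscript is treated as its integer codes, so the vector has to list one multiplier per level, in level order. Since R 4.0.0 data.frame() no longer creates factors by default, so stringsAsFactors = TRUE is spelled out here; with a character column the same subscript would look up names and return NA. The name `mult` is illustrative.

```r
# x must be a factor for this trick; made explicit for R >= 4.0.0.
df <- data.frame(x = c("AA", "BB", "CC", "AA"), y = 1:4,
                 stringsAsFactors = TRUE)

# Level order decides which multiplier each row gets.
print(levels(df$x))

# One multiplier per level, in the same order as levels(df$x).
mult <- c(10, 25, 0.5)

# The factor indexes mult by its integer codes: 1, 2, 3, 1.
df$y <- df$y * mult[df$x]
print(df)
```

The approach is compact but fragile: if the levels change or are reordered, the multipliers silently attach to the wrong groups.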

HTH

Elai

> TY for any help.
>
> Trading
>
> A2CT2 Ltd.
>

Re: [R] data frame manipulation with condition

2012-02-24 Thread William Dunlap

When a factor is used as a subscript it is treated
as its integer codes so explicit conversion to character
is needed if you want to subscript by names:
  > f <- factor(c("One","Three","Two"), levels=c("One","Two","Three"))
  > x <- c(Two=2, One=1, Three=3)
  > x[f]
    Two Three   One
      2     3     1
  > x[as.character(f)]
    One Three   Two
      1     3     2
For most other functions (e.g., %in%, paste, sprintf("%s"))
you do not need an explicit conversion to character, but '['
requires you to choose.
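
The same distinction, sketched on the data frame from this thread. Subscripting the named multiplier vector by the factor picks entries by position (the integer codes), while subscripting by as.character() picks them by name. The variable names `by_code` and `by_name` are illustrative, and stringsAsFactors = TRUE is spelled out so that x is a factor on any R version.

```r
df <- data.frame(x = c("AA", "BB", "CC", "AA", "DD", "DD"), y = 1:6,
                 stringsAsFactors = TRUE)
mult <- c(AA = 10, BB = 25, DD = 15)

# Factor subscript: uses the codes 1, 2, 3, 1, 4, 4 as positions in mult.
by_code <- mult[df$x]

# Character subscript: looks each label up by name.
by_name <- mult[as.character(df$x)]

print(df$y * by_code)   # CC picks up DD's multiplier, DD falls off the end
print(df$y * by_name)   # CC has no entry, so it becomes NA
```

The first result matches the unexpected 45 and NA values reported earlier in the thread; the second matches the output with NA only for "CC".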

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: Friday, February 24, 2012 9:39 AM
> To: William Dunlap
> Cc: Arnaud Gaboury; r-help@r-project.org
> Subject: Re: [R] data frame manipulation with condition
> 
> On Fri, Feb 24, 2012 at 12:23 PM, William Dunlap  wrote:
> > Use mult[as.character(df$x)] instead of mult[df$x].
> > They are different when df$x is a factor and the
> > character version is what you want.
> 
> R will coerce a factor to character to perform the comparison; explicitly
> calling as.character() is not necessary:
> 
> > df$x
> [1] AA BB CC AA DD DD
> > df$x == "AA"
> [1]  TRUE FALSE FALSE  TRUE FALSE FALSE
> 
> See ?factor for details.
> 
> Sarah
> 
> >  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
> >  > mult <- c(AA = 10, BB = 25,DD=15)
> >  > df$y <- df$y * mult[as.character(df$x)]
> >  > df
> >     x  y
> >  1 AA 10
> >  2 BB 50
> >  3 CC NA
> >  4 AA 40
> >  5 DD 75
> >  6 DD 90
> >
> > This gets the order right.  The NA for "CC" is because
> > your vector of multipliers didn't include an entry for
> > CC.  You can either add CC=1 to mult or work only on the
> > subset of the data which has entries in the mult vector.
> >
> >  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
> >  > mult <- c(AA = 10, BB = 25,DD=15)
> >  > i <- as.character(df$x) %in% names(mult)
> >  > df$y[i] <- df$y[i] * mult[as.character(df$x[i])]
> >  > df
> >     x  y
> >  1 AA 10
> >  2 BB 50
> >  3 CC  3
> >  4 AA 40
> >  5 DD 75
> >  6 DD 90
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >> -Original Message-
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> >> On Behalf Of Arnaud Gaboury
> >> Sent: Friday, February 24, 2012 8:37 AM
> >> To: Uwe Ligges
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] data frame manipulation with condition
> >>
> >> > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
> >> > mult <- c(AA = 10, BB = 25,DD=15)
> >> > df$y <- df$y * mult[df$x]
> >> > df
> >>    x  y
> >> 1 AA 10
> >> 2 BB 50
> >> 3 CC 45
> >> 4 AA 40
> >> 5 DD NA
> >> 6 DD NA
> >>
> >> My df is in fact much longer than the example shown here. It seems
> >> your tip didn't do the job.
> >> I am expecting this as result :
> >>
> >> > df
> >>    x  y
> >> 1 AA 10  > if df$x==AA, df$y<-1*10
> >> 2 BB 50   > if df$x==BB, df$y<-2*25
> >> 3 CC 3         NOTHING
> >> 4 AA 40    > if df$x==AA, df$y<-4*10
> >> 5 DD 75   > if df$x==DD, df$y<-5*15
> >> 6 DD 90   > if df$x==DD, df$y<-6*15
> >>
> >> Arnaud Gaboury
> >>
> >> A2CT2 Ltd.
> >>
> >> -Original Message-
> >> From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de]
> >> Sent: Friday, 24 February 2012 17:07
> >> To: Arnaud Gaboury
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] data frame manipulation with condition
> >>
> >>
> >>
> >> On 24.02.2012 16:59, Arnaud Gaboury wrote:
> >> > TY Uwe,
> >> >
> >> > So I will have to write a line for each condition? Right?
> >> >
> >> > In fact I was trying to do something with apply in one line, but 
> >> > couldn't achieve any result. In
> >> fact

Re: [R] data frame manipulation with condition

2012-02-24 Thread Sarah Goslee

Whatever makes you happy.

> df1 <-
+ structure(list(x = structure(c(1L, 2L, 2L, 3L), .Label = c("AA",
+ "BB", "CC"), class = "factor"), y = 1:4), .Names = c("x", "y"
+ ), row.names = c(NA, -4L), class = "data.frame")
> mult <- c("AA"=2,"BB"=5,"CC"=1,"DD"=2)
> df1$y * mult[df1$x]
AA BB BB CC
 2 10 15  4
> df1$y * mult[as.character(df1$x)]
AA BB BB CC
 2 10 15  4
>
>
> df2 <- data.frame(x = c("AA","AA","BB","BB","BB","CC","DD","DD"), y = 1:8)
> df2$y * mult[df2$x]
AA AA BB BB BB CC DD DD
 2  4 15 20 25  6 14 16
> df2$y * mult[as.character(df2$x)]
AA AA BB BB BB CC DD DD
 2  4 15 20 25  6 14 16
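
A hedged note on why both forms agree in the transcript above: the names of mult happen to be in the same order as the factor levels, so position and name select the same entry. The sketch below checks this with a small helper; `same_as_names` and `mult_reordered` are hypothetical names introduced for illustration.

```r
mult <- c(AA = 2, BB = 5, CC = 1, DD = 2)
x1   <- factor(c("AA", "BB", "BB", "CC"))   # levels: AA BB CC

# TRUE when subscripting by integer code and by label select the same values.
same_as_names <- function(f, lookup) {
  identical(unname(lookup[f]), unname(lookup[as.character(f)]))
}

# Names of mult line up with the levels of x1, so both subscripts agree.
print(same_as_names(x1, mult))

# Same multipliers listed in a different order: the two subscripts now differ.
mult_reordered <- mult[c("DD", "CC", "BB", "AA")]
print(same_as_names(x1, mult_reordered))
```

The agreement therefore depends on how the lookup vector is ordered, which is easy to lose when levels are added or dropped.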


On Fri, Feb 24, 2012 at 12:52 PM, Arnaud Gaboury
 wrote:
> In fact I need to use William tip: Use mult[as.character(df$x)] instead of 
> mult[df$x].
>
>
> Let's try again with a shorter df as example:
>
> The rule: if AA, then multiply y by 2, if BB multiply y by 5, if CC do 
> nothing, if DD multiply by 2.
>
>
> Let's say day 1 I have df1:
>
> df1 <-
> structure(list(x = structure(c(1L, 2L, 2L, 3L), .Label = c("AA",
> "BB", "CC"), class = "factor"), y = 1:4), .Names = c("x", "y"
> ), row.names = c(NA, -4L), class = "data.frame")
>
>> df1
>   x y
> 1 AA 1
> 2 BB 2
> 3 BB 3
> 4 CC 4
>
>>mult <- c("AA"=2,"BB"=5,"CC"=1,"DD"=2)
>>df1$y <- df1$y * mult[as.character(df1$x)]
>> df1
>   x  y
> 1 AA  2
> 2 BB 10
> 3 BB 15
> 4 CC  4
>
> WORKING
>
> Now day 2 with df2:
>
>>df2 <- data.frame(x = c("AA","AA","BB","BB","BB","CC","DD","DD"), y = 1:8)
>>df2$y <- df2$y * mult[as.character(df2$x)]
>> df2
>   x  y
> 1 AA  2
> 2 AA  4
> 3 BB 15
> 4 BB 20
> 5 BB 25
> 6 CC  6
> 7 DD 14
> 8 DD 16
>
> WORKING
>
>
> Ty both of you and have a good weekend.
>


Re: [R] data frame manipulation with condition

2012-02-24 Thread Arnaud Gaboury

In fact I need to use William's tip: use mult[as.character(df$x)] instead of
mult[df$x].


Let's try again with a shorter df as example:

The rule: if AA, then multiply y by 2, if BB multiply y by 5, if CC do nothing, 
if DD multiply by 2.


Let's say day 1 I have df1:

df1 <-
structure(list(x = structure(c(1L, 2L, 2L, 3L), .Label = c("AA", 
"BB", "CC"), class = "factor"), y = 1:4), .Names = c("x", "y"
), row.names = c(NA, -4L), class = "data.frame")

> df1
   x y
1 AA 1
2 BB 2
3 BB 3
4 CC 4

>mult <- c("AA"=2,"BB"=5,"CC"=1,"DD"=2)
>df1$y <- df1$y * mult[as.character(df1$x)]
> df1
   x  y
1 AA  2
2 BB 10
3 BB 15
4 CC  4

WORKING

Now day 2 with df2:

>df2 <- data.frame(x = c("AA","AA","BB","BB","BB","CC","DD","DD"), y = 1:8)
>df2$y <- df2$y * mult[as.character(df2$x)]
> df2
   x  y
1 AA  2
2 AA  4
3 BB 15
4 BB 20
5 BB 25
6 CC  6
7 DD 14
8 DD 16

WORKING


Ty both of you and have a good weekend.


Arnaud Gaboury
 
A2CT2 Ltd.


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Arnaud Gaboury
Sent: Friday, 24 February 2012 18:17
To: Sarah Goslee
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation with condition

TY very much Sarah: your tip is doing the job:

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 13L, 14L, 14L), .Label = 
c("CL", "Cocoa", "Coffee C", "GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar 
No 11", "ZC", "ZL", "ZW"), class = "factor"), reported.Price = c(105.35, 2380, 
2407, 2408, 202.35, 202.8, 202.95, 205.85, 206.05, 206.1, 206.2, 1748, 378.8, 
379.25, 379.5, 320.61, 2.538, 2.543, 1669, 1678.5, 304.49, 321.39, 321.6, 
321.65, 322.5, 322.55, 322.8, 323.04, 3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 
24.54, 25.5, 25.55, 631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 
3L, -1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 11L, -54L, 
-52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 17L, 17L, 114L, 71L, 16L, 
27L, -3L, 3L, -3L, -89L, -1L, -1L, -51L, -51L)), .Names = c("Product", 
"reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 8L, 9L, 
10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 31L, 
32L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 2L, 3L, 1L, 33L, 34L), class = 
"data.frame")

> mult<-c(CL=100,GC=10,HG=10,NG=1000,PL=10,RB=100,SI=10,ZL=100,HO=100,KC=1,CC=1,SB=1,ZC=1,ZW=1)
> reported$reported.Price <- reported$reported.Price * mult[reported$Product]

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 
10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 12L, 12L, 13L, 14L, 14L), .Label = 
c("CL", "Cocoa", "Coffee C", "GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar 
No 11", "ZC", "ZL", "ZW"), class = "factor"), reported.Price = c(10535, 23800, 
24070, 24080, 2023.5, 2028, 2029.5, 2058.5, 2060.5, 2061, 2062, 1748000, 3788, 
3792.5, 3795, 32061, 25.38, 25.43, 166900, 167850, 30449, 32139, 32160, 32165, 
32250, 32255, 32280, 32304, 3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 
25.5, 25.55, 631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, 
-1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 11L, -54L, 
-52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 17L, 17L, 114L, 71L, 16L, 
27L, -3L, 3L, -3L, -89L, -1L, -1L, -51L, -51L)), .Names = c("Product", 
"reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 8L, 9L, 
10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 31L, 
32L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 2L, 3L, 1L, 33L, 34L), class = 
"data.frame")

Have a good weekend.

Arnaud Gaboury
 
A2CT2 Ltd.
Trade: +41 22 849 88 63
Fax:   +41 22 849 88 66
arnaud.gabo...@a2ct2.com 


Re: [R] data frame manipulation with condition

2012-02-24 Thread Sarah Goslee

On Fri, Feb 24, 2012 at 12:23 PM, William Dunlap  wrote:
> Use mult[as.character(df$x)] instead of mult[df$x].
> They are different when df$x is a factor and the
> character version is what you want.

R will coerce a factor to character to perform the comparison; explicitly
calling as.character() is not necessary:

> df$x
[1] AA BB CC AA DD DD
> df$x == "AA"
[1]  TRUE FALSE FALSE  TRUE FALSE FALSE

See ?factor for details.

Sarah

>  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
>  > mult <- c(AA = 10, BB = 25,DD=15)
>  > df$y <- df$y * mult[as.character(df$x)]
>  > df
>     x  y
>  1 AA 10
>  2 BB 50
>  3 CC NA
>  4 AA 40
>  5 DD 75
>  6 DD 90
>
> This gets the order right.  The NA for "CC" is because
> your vector of multipliers didn't include an entry for
> CC.  You can either add CC=1 to mult or work only on the
> subset of the data which has entries in the mult vector.
>
>  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
>  > mult <- c(AA = 10, BB = 25,DD=15)
>  > i <- as.character(df$x) %in% names(mult)
>  > df$y[i] <- df$y[i] * mult[as.character(df$x[i])]
>  > df
>     x  y
>  1 AA 10
>  2 BB 50
>  3 CC  3
>  4 AA 40
>  5 DD 75
>  6 DD 90
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
>> Behalf Of Arnaud Gaboury
>> Sent: Friday, February 24, 2012 8:37 AM
>> To: Uwe Ligges
>> Cc: r-help@r-project.org
>> Subject: Re: [R] data frame manipulation with condition
>>
>> > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
>> > mult <- c(AA = 10, BB = 25,DD=15)
>> > df$y <- df$y * mult[df$x]
>> > df
>>    x  y
>> 1 AA 10
>> 2 BB 50
>> 3 CC 45
>> 4 AA 40
>> 5 DD NA
>> 6 DD NA
>>
>> My df is in fact much longer than the example shown here. It seems
>> your tip didn't do the job.
>> I am expecting this as result :
>>
>> > df
>>    x  y
>> 1 AA 10  > if df$x==AA, df$y<-1*10
>> 2 BB 50   > if df$x==BB, df$y<-2*25
>> 3 CC 3         NOTHING
>> 4 AA 40    > if df$x==AA, df$y<-4*10
>> 5 DD 75   > if df$x==DD, df$y<-5*15
>> 6 DD 90   > if df$x==DD, df$y<-6*15
>>
>> Arnaud Gaboury
>>
>> A2CT2 Ltd.
>>
>> -Original Message-
>> From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de]
>> Sent: Friday, 24 February 2012 17:07
>> To: Arnaud Gaboury
>> Cc: r-help@r-project.org
>> Subject: Re: [R] data frame manipulation with condition
>>
>>
>>
>> On 24.02.2012 16:59, Arnaud Gaboury wrote:
>> > TY Uwe,
>> >
>> > So I will have to write a line for each condition? Right?
>> >
>> > In fact I was trying to do something with apply in one line, but couldn't 
>> > achieve any result. In
>> fact, all my transformation will be multiplying one object by a specific 
>> number according to the value
>> of df$x.
>>
>> In that case:
>>
>> mult <- c(AA = 10, BB = 25)
>>
>> Then:
>>
>>
>> df$y <- df$y * mult[df$x]
>>
>>
>> Uwe Ligges
>>
>>
>> >
>> > Arnaud Gaboury
>> >
>> > A2CT2 Ltd.
>> >
>> >
>> > -Original Message-
>> > From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de]
>> > Sent: Friday, 24 February 2012 16:33
>> > To: Arnaud Gaboury
>> > Cc: r-help@r-project.org
>> > Subject: Re: [R] data frame manipulation with condition
>> >
>> >
>> >
>> > On 24.02.2012 16:25, Arnaud Gaboury wrote:
>> >> Dear list,
>> >>
>> >> n00b question, but still can't find any easy answer.
>> >>
>> >> Here is a df:
>> >
>> >
>> > Change
>> >
>> >>> df<-data.frame(cbind(x=c("AA","BB","CC","AA"),y=1:4))
>> >
>> > to
>> >
>> >    df<- data.frame(x = c("AA","BB","CC","AA"), y = 1:4)
>> >
>> > to make your object a sensible data.frame.
>> >
>> >
>> >
>> >>> df
>> >>      x y
>> >> 1 AA 1
>> >> 2 BB 2
>> >> 3 CC 3
>> >> 4 AA 4
>> >>
>> >>
>> >> I want to modify this df this way :
>> >>    if df$x=="AA" then df$y=df$y*10
>> >
>> > df$y[df$x=="AA"]<- df$y[df$x=="AA"] * 25
>> >
>> > ...
>> >
>> >
>> > Uwe Ligges
>> >
>> >
>> >>    if df$x=="BB" then df$y=df$y*25
>> >
>> >
>> >
>> >
>> >> and so on with other conditions.
>> >>
>> >> TY for any help.
>> >>
>> >> Trading
>> >>
>> >> A2CT2 Ltd.
>> >>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data frame manipulation with condition

2012-02-24 Thread William Dunlap

Use mult[as.character(df$x)] instead of mult[df$x].
They are different when df$x is a factor and the
character version is what you want.
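
A minimal sketch of why the two differ. It builds the factor explicitly
with factor(), so it behaves the same in any R version (since R 4.0.0,
data.frame() no longer converts strings to factors by default). The names
x, mult, by_code and by_name are only illustrative.

```r
# A factor is stored as integer codes plus a vector of level labels.
x <- factor(c("AA", "BB", "CC", "AA", "DD", "DD"))
mult <- c(AA = 10, BB = 25, DD = 15)

# Indexing with the factor uses the integer codes 1, 2, 3, 1, 4, 4,
# i.e. positions in mult, not names: "CC" picks up DD's 15 and "DD" gets NA.
by_code <- unname(mult[x])

# Indexing with the character labels looks the multipliers up by name;
# a label without an entry ("CC") gives NA.
by_name <- unname(mult[as.character(x)])

print(by_code)
print(by_name)
```

The positional lookup is what produced the 45 and the two NA values
reported earlier in the thread.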

  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
  > mult <- c(AA = 10, BB = 25,DD=15)
  > df$y <- df$y * mult[as.character(df$x)]
  > df
     x  y
  1 AA 10
  2 BB 50
  3 CC NA
  4 AA 40
  5 DD 75
  6 DD 90

This gets the order right.  The NA for "CC" is because
your vector of multipliers didn't include an entry for
CC.  You can either add CC=1 to mult or work only on the
subset of the data which has entries in the mult vector.

  > df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
  > mult <- c(AA = 10, BB = 25,DD=15)
  > i <- as.character(df$x) %in% names(mult)
  > df$y[i] <- df$y[i] * mult[as.character(df$x[i])]
  > df
     x  y
  1 AA 10
  2 BB 50
  3 CC  3
  4 AA 40
  5 DD 75
  6 DD 90

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 


Re: [R] data frame manipulation with condition

2012-02-24 Thread Arnaud Gaboury

TY very much Sarah: your tip is doing the job:

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(105.35, 2380, 
2407, 2408, 202.35, 202.8, 202.95, 205.85, 206.05, 206.1, 206.2, 
1748, 378.8, 379.25, 379.5, 320.61, 2.538, 2.543, 1669, 1678.5, 
304.49, 321.39, 321.6, 321.65, 322.5, 322.55, 322.8, 323.04, 
3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 
631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, 
-1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 
11L, -54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 
17L, 17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, 
-51L, -51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")

> mult<-c(CL=100,GC=10,HG=10,NG=1000,PL=10,RB=100,SI=10,ZL=100,HO=100,KC=1,CC=1,SB=1,ZC=1,ZW=1)
> reported$reported.Price <- reported$reported.Price * mult[reported$Product]

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(10535, 23800, 
24070, 24080, 2023.5, 2028, 2029.5, 2058.5, 2060.5, 2061, 2062, 
1748000, 3788, 3792.5, 3795, 32061, 25.38, 25.43, 166900, 167850, 
30449, 32139, 32160, 32165, 32250, 32255, 32280, 32304, 3390, 
3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 631.75, 
638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, -1L, 
-2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 11L, 
-54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 17L, 
17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, -51L, 
-51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")
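
A word of caution on the result above: Product is a factor in the dput()
output, so mult[reported$Product] selects multipliers by level position,
not by name (see William Dunlap's reply in this thread). The sketch below
is one way to check which multiplier each level actually received. The
levels and the mult vector are copied from this post; treating products
without an entry as "multiply by 1" is an assumption, and the name check
is only illustrative.

```r
# Levels of reported$Product and the multipliers, as posted above.
lev <- c("CL", "Cocoa", "Coffee C", "GC", "HG", "HO", "NG", "PL",
         "RB", "SI", "Sugar No 11", "ZC", "ZL", "ZW")
mult <- c(CL = 100, GC = 10, HG = 10, NG = 1000, PL = 10, RB = 100, SI = 10,
          ZL = 100, HO = 100, KC = 1, CC = 1, SB = 1, ZC = 1, ZW = 1)
product <- factor(lev, levels = lev)      # one row per level is enough here

check <- data.frame(
  Product  = lev,
  applied  = unname(mult[product]),               # positional: what mult[factor] does
  intended = unname(mult[as.character(product)])  # lookup by name
)
# Assumption: products with no entry in mult should stay unchanged.
check$intended[is.na(check$intended)] <- 1

# Levels whose applied multiplier differs from the intended one.
print(check[check$applied != check$intended, ])
```

If the mismatches matter, reported$reported.Price * mult[as.character(reported$Product)]
with a default of 1 for missing names avoids them.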

Have a good weekend.

Arnaud Gaboury
 
A2CT2 Ltd.
Trade: +41 22 849 88 63
Fax:   +41 22 849 88 66
arnaud.gabo...@a2ct2.com 


Re: [R] data frame manipulation with condition

2012-02-24 Thread Arnaud Gaboury

OK Uwe, I understand, and I will be more explicit.

Here is what my df could look like:

reported <-
structure(list(Product = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 4L, 5L, 5L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 
11L, 12L, 12L, 13L, 14L, 14L), .Label = c("CL", "Cocoa", "Coffee C", 
"GC", "HG", "HO", "NG", "PL", "RB", "SI", "Sugar No 11", "ZC", 
"ZL", "ZW"), class = "factor"), reported.Price = c(105.35, 2380, 
2407, 2408, 202.35, 202.8, 202.95, 205.85, 206.05, 206.1, 206.2, 
1748, 378.8, 379.25, 379.5, 320.61, 2.538, 2.543, 1669, 1678.5, 
304.49, 321.39, 321.6, 321.65, 322.5, 322.55, 322.8, 323.04, 
3390, 3397.5, 24.16, 24.2, 24.22, 24.23, 24.54, 25.5, 25.55, 
631.75, 638, 53.77, 630.75, 633), reported.Nbr.Lots = c(6L, 3L, 
-1L, -2L, -40L, -1L, -1L, 10L, 5L, 6L, 19L, 17L, 23L, 12L, 35L, 
11L, -54L, -52L, 26L, 26L, 10L, -10L, 1L, 4L, 4L, 1L, 5L, 5L, 
17L, 17L, 114L, 71L, 16L, 27L, -3L, 3L, -3L, -89L, -1L, -1L, 
-51L, -51L)), .Names = c("Product", "reported.Price", "reported.Nbr.Lots"
), row.names = c(7L, 4L, 5L, 6L, 13L, 14L, 15L, 16L, 17L, 18L, 
19L, 8L, 9L, 10L, 11L, 12L, 20L, 21L, 22L, 23L, 35L, 36L, 37L, 
38L, 39L, 40L, 41L, 42L, 31L, 32L, 24L, 25L, 26L, 27L, 28L, 29L, 
30L, 2L, 3L, 1L, 33L, 34L), class = "data.frame")


The rows will change. I am looking to multiply reported.Price by 100 IF Product=CL, 
multiply by 10 IF product=GC, multiply by 100 IF product=HG, multiply by 1000 
IF Product=NG, multiply by 100 IF product=RB.
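
The rules just listed can be sketched as a small lookup table combined
with match(). This is only an illustration: the table rules, the column
name multiplier and the five-row stand-in demo are made up for the
example, and leaving unlisted products unchanged is an assumption.

```r
# Hypothetical lookup table built from the rules listed above.
rules <- data.frame(Product    = c("CL", "GC", "HG", "NG", "RB"),
                    multiplier = c(100, 10, 100, 1000, 100),
                    stringsAsFactors = FALSE)

# Small stand-in for `reported` (a few rows only, prices taken from the post).
demo <- data.frame(Product = factor(c("CL", "Cocoa", "GC", "NG", "RB")),
                   reported.Price = c(105.35, 2380, 1748, 2.538, 304.49))

# match() gives the row of `rules` for each product, or NA if there is none.
idx <- match(as.character(demo$Product), rules$Product)

# Use 1 (leave the price as it is) where no rule exists.
multiplier <- ifelse(is.na(idx), 1, rules$multiplier[idx])

demo$reported.Price <- demo$reported.Price * multiplier
print(demo)
```

Because the lookup goes through as.character(), it does not depend on
the order of the factor levels or of the rows.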

I hope I am clear enough, and YES I have tried many workarounds myself before 
posting. Feel free to ignore my post if you think I am lazy and disrespectful 
to the list.


Arnaud Gaboury
 
A2CT2 Ltd.




Re: [R] data frame manipulation with condition

2012-02-24 Thread Sarah Goslee

You need, as I already suggested, to use a value of 1 for levels you don't want
to change.

> mult <- c(AA = 10, BB = 25, CC=1, DD=15)
> mult[df$x]
AA BB CC AA DD DD
10 25  1 10 15 15
> df$y * mult[df$x]
AA BB CC AA DD DD
10 50  3 40 75 90
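
When there are many levels, the entries equal to 1 need not be typed by
hand. A possible sketch, assuming every level without a specified
multiplier should be left unchanged (the name known is illustrative):

```r
x <- factor(c("AA", "BB", "CC", "AA", "DD", "DD"))
y <- 1:6
known <- c(AA = 10, BB = 25, DD = 15)   # multipliers that are actually specified

# Start with 1 for every level, then overwrite the specified ones by name.
mult <- setNames(rep(1, nlevels(x)), levels(x))
mult[names(known)] <- known

# mult is now in level order, so indexing with the factor itself is safe here.
result <- y * unname(mult[x])
print(result)
```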


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] data frame manipulation with condition

2012-02-24 Thread Uwe Ligges




On 24.02.2012 17:36, Arnaud Gaboury wrote:
>> df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
>> mult<- c(AA = 10, BB = 25,DD=15)
>> df$y<- df$y * mult[df$x]
>> df
>    x  y
> 1 AA 10
> 2 BB 50
> 3 CC 45
> 4 AA 40
> 5 DD NA
> 6 DD NA
>
> My df is in fact much longer than the example shown here. It seems
> your tip didn't do the job.
> I am expecting this as result :

This is not the "I do the job for you" hotline. You are free to think a
little bit yourself given you have not managed in two attempts to
describe your problem sufficiently well!

Uwe Ligges

>> df
>    x  y
> 1 AA 10   -> if df$x==AA, df$y<-1*10
> 2 BB 50   -> if df$x==BB, df$y<-2*25
> 3 CC  3   -> NOTHING
> 4 AA 40   -> if df$x==AA, df$y<-4*10
> 5 DD 75   -> if df$x==DD, df$y<-5*15
> 6 DD 90   -> if df$x==DD, df$y<-6*15


Re: [R] data frame manipulation with condition

2012-02-24 Thread Arnaud Gaboury

> df<- data.frame(x = c("AA","BB","CC","AA","DD","DD"), y = 1:6)
> mult <- c(AA = 10, BB = 25,DD=15)
> df$y <- df$y * mult[df$x]
> df
   x  y
1 AA 10
2 BB 50
3 CC 45
4 AA 40
5 DD NA
6 DD NA

My df is in fact much longer than the example shown here. It seems
your tip didn't do the job.
I am expecting this as result :

> df
   x  y
1 AA 10   -> if df$x==AA, df$y<-1*10
2 BB 50   -> if df$x==BB, df$y<-2*25
3 CC  3   -> NOTHING
4 AA 40   -> if df$x==AA, df$y<-4*10
5 DD 75   -> if df$x==DD, df$y<-5*15
6 DD 90   -> if df$x==DD, df$y<-6*15

Arnaud Gaboury
 
A2CT2 Ltd.


Re: [R] data frame manipulation with condition

2012-02-24 Thread Sarah Goslee

I sent it to you and the list, as is standard practice.

You must know in advance what level of your factor goes with what multiplicand.
Use that information to set up the association. The order within your data frame
is irrelevant, only the order R uses for factor levels is of importance.
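
To illustrate the point about level order, here is a small sketch (the
vectors x1, x2 and mult_by_level are made up for the example):

```r
x1 <- factor(c("AA", "BB", "CC", "AA"))
x2 <- factor(c("CC", "AA", "BB", "AA"))   # same values, different row order

# By default the levels are the sorted unique values, whatever the row order.
print(levels(x1))
print(levels(x2))

# A multiplier vector given in any order can be put into level order by name...
mult <- c(CC = 1, BB = 25, AA = 10)
mult_by_level <- mult[levels(x2)]

# ...after which indexing with the factor (its integer codes) lines up.
print(unname(mult_by_level[x2]))
```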

Sarah

On Fri, Feb 24, 2012 at 11:16 AM, Arnaud Gaboury
 wrote:
> No sorry! Your email was in fact NOT on the list, and I was not watching my
> regular inbox, only the list one.
>
> TY for your tip. But in fact my df is much more complex, and its objects,
> their number and their order will change.
>
> Have a good weekend
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
> Trade: +41 22 849 88 63
> Fax:   +41 22 849 88 66
> arnaud.gabo...@a2ct2.com
>
> -Original Message-
> From: Sarah Goslee [mailto:sarah.gos...@gmail.com]
> Sent: vendredi 24 février 2012 17:12
> To: Arnaud Gaboury
> Subject: Re: [R] data frame manipulation with condition
>
> Did you not see my solution?
>


-- 
Sarah Goslee
http://www.functionaldiversity.org


Re: [R] data frame manipulation with condition

2012-02-24 Thread Uwe Ligges




On 24.02.2012 16:59, Arnaud Gaboury wrote:
> TY Uwe,
>
> So I will have to write a line for each condition? Right?
>
> In fact I was trying to do something with apply in one line, but couldn't
> achieve any result. In fact, all my transformation will be multiplying one
> object by a specific number according to the value of df$x.


In that case:

mult <- c(AA = 10, BB = 25)

Then:


df$y <- df$y * mult[df$x]


Uwe Ligges





Re: [R] data frame manipulation with condition

2012-02-24 Thread Arnaud Gaboury

TY Uwe,

So I will have to write a line for each condition? Right?

In fact I was trying to do something with apply in one line, but couldn't 
achieve any result. In fact, all my transformation will be multiplying one 
object by a specific number according to the value of df$x.
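
One line per condition is not strictly necessary. A possible sketch,
assuming the multipliers are kept in a named vector (mult and key are
illustrative names, not something suggested earlier in the thread):

```r
df <- data.frame(x = c("AA", "BB", "CC", "AA"), y = 1:4)
mult <- c(AA = 10, BB = 25)

# One pass per multiplier instead of one hand-written line per condition.
for (key in names(mult)) {
  rows <- df$x == key                    # works for character and factor columns
  df$y[rows] <- df$y[rows] * mult[[key]]
}
print(df)
```

Rows whose x has no entry in mult (here "CC") are simply never touched.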

Arnaud Gaboury

A2CT2 Ltd.


Re: [R] data frame manipulation with condition

2012-02-24 Thread Uwe Ligges




On 24.02.2012 16:25, Arnaud Gaboury wrote:
> Dear list,
>
> n00b question, but still can't find any easy answer.
>
> Here is a df:

Change

>> df<-data.frame(cbind(x=c("AA","BB","CC","AA"),y=1:4))


to

 df <- data.frame(x = c("AA","BB","CC","AA"), y = 1:4)

to make your object a sensible data.frame.
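
The reason, sketched briefly: cbind() on a character vector and an
integer vector builds a character matrix, so y no longer arrives as a
number. The names bad and good are only for illustration.

```r
# cbind() coerces both vectors to character, so y ends up as text
# (a factor in R < 4.0.0, a character column in later versions).
bad <- data.frame(cbind(x = c("AA", "BB", "CC", "AA"), y = 1:4))
str(bad)

# Passing the columns directly keeps each column's own type.
good <- data.frame(x = c("AA", "BB", "CC", "AA"), y = 1:4)
str(good)
```

With the first form, df$y * 10 would not multiply numbers at all.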




>> df
>    x y
> 1 AA 1
> 2 BB 2
> 3 CC 3
> 4 AA 4
>
> I want to modify this df this way :
>   if df$x=="AA" then df$y=df$y*10

df$y[df$x=="AA"] <- df$y[df$x=="AA"] * 10

...

Uwe Ligges

>   if df$x=="BB" then df$y=df$y*25
>
> and so on with other conditions.
>
> TY for any help.
>
> Trading
>
> A2CT2 Ltd.
>
> Arnaud Gaboury
>
> A2CT2 Ltd.


Re: [R] Data frame manipulation by eliminating rows containing extreme values

2011-10-23 Thread aajit75

Hi David,

Thanks for the reply,


f=function(x){quantile(x, c(0.25, 0.75),na.rm = TRUE) - matrix(IQR(x,na.rm =
TRUE) * c(1.5), nrow = 1) %*% c(-1, 1)} 

Here the multiplier 1.5 is only an example; it could be larger, perhaps 3.0,
after analyzing the actual data. The aim is to find cut-offs on both sides
(high and low values) for each variable, as in a box plot, and then to
eliminate observations based on those cut-offs.

For the second point, I am extremely sorry. It was a typo: in
xyz <- lapply(data1, f) it should be data2.

n <- 100 
x1 <- runif(n) 
x2 <- runif(n) 
x3 <- x1 + x2 + runif(n)/10 
x4 <- x1 + x2 + x3 + runif(n)/10 
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE)) 
x6 <- 1*(x5=='a' | x5=='c') 
data1 <- cbind(x1,x2,x3,x4,x5,x6) 
data2 <- data.frame(data1) 
xyz <- lapply(data2, f) 
str (xyz)

Now it is a list of six only:
List of 6
 $ x1: num [1, 1:2] 0.7797 0.0613
 $ x2: num [1, 1:2] 0.9533 0.0194
 $ x3: num [1, 1:2] 1.438 0.532
 $ x4: num [1, 1:2] 2.85 1.03
 $ x5: num [1, 1:2] 4 0
 $ x6: num [1, 1:2] 1.5 -0.5

The third point you mentioned is the problem to be resolved: at the moment I
am overwriting data2, applying these cut-offs for each variable in turn. Is
there any efficient way to do this?

 data2 <- subset (data2, x1<=xyz$x1[,1] &  x1>=xyz$x1[,2]) 
 data2 <- subset (data2, x2<=xyz$x2[,1] &  x2>=xyz$x2[,2]) 

On the last point you mentioned, I agree that removing "extreme values" is
a serious distortion of the data. But in my data the values of some
observations are set to a very high number like say . Also this is
not consistent across all variables in the data. So I can set a value higher
than 1.5 in the function, get cut-offs for each variable and remove such
observations. As rm.outlier removes only one value, I am using the above
function.
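
One possible way to apply the cut-offs to all variables at once, instead
of one subset() call per variable, is sketched below. It is not the
function f from this thread: it assumes the usual boxplot-style limits
(Q1 - k*IQR, Q3 + k*IQR), and the names dat, limits, ok and kept as well
as the planted value 9999 are invented for the illustration.

```r
set.seed(1)                               # only to make the sketch repeatable
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$x1[5] <- 9999                         # hypothetical sentinel-style extreme value

# Boxplot-style limits: Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is an assumption).
limits <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  c(lower = q[1] - k * (q[2] - q[1]), upper = q[2] + k * (q[2] - q[1]))
}

num <- names(dat)[sapply(dat, is.numeric)]   # numeric columns only

# Logical matrix: TRUE where a value lies within its column's limits
# (missing values are not treated as extreme).
ok <- sapply(dat[num], function(x) {
  lim <- limits(x)
  is.na(x) | (x >= lim[["lower"]] & x <= lim[["upper"]])
})

# Keep the rows in which no column was flagged.
kept <- dat[rowSums(!ok) == 0, , drop = FALSE]
print(nrow(dat) - nrow(kept))
```

All limits are computed from the full data before any row is dropped, so
the result does not depend on the order in which the variables are
processed, unlike repeated overwriting of data2.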

Thanks for the help in advance.

Regards,
-Ajit





Re: [R] Data frame manipulation by eliminating rows containing extreme values

2011-10-22 Thread David Winsemius



On Oct 22, 2011, at 6:57 AM, aajit75 wrote:


Dear All,

I have got the limits for removing extreme values for each variable using the
following function.

f=function(x){quantile(x, c(0.25, 0.75),na.rm = TRUE) - matrix(IQR(x,na.rm = TRUE) * c(1.5), nrow = 1) %*% c(-1, 1)}


I think you need to clarify what your expectations are for that  
function. First you calculate the interquartile range and then you  
subtract 1.5 times the interquartile range. Exactly how does that  
identify extreme values? It appears you would be removing substantial  
amounts of your data.
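
To make the concern concrete, here is a small comparison of the posted function with the conventional boxplot-style fences. It is a sketch only: `f_posted` is a copy of the function above, while `f_fences` and the test vector are assumptions introduced for illustration.

```r
# The function as posted: returns a 1 x 2 matrix
f_posted <- function(x) {
  quantile(x, c(0.25, 0.75), na.rm = TRUE) -
    matrix(IQR(x, na.rm = TRUE) * 1.5, nrow = 1) %*% c(-1, 1)
}

# Conventional fences: Q1 - k*IQR and Q3 + k*IQR
f_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)   # Q1 = 3, Q3 = 7, IQR = 4
f_posted(x)   # Q1 + 1.5*IQR = 9 and Q3 - 1.5*IQR = 1
f_fences(x)   # lower = -3, upper = 13
```

The posted version adds 1.5*IQR to the first quartile and subtracts it from the third, so its limits are not the boxplot whisker limits; with other data they can fall well inside the observed range.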





#Example:

n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- x1 + x2 + runif(n)/10
x4 <- x1 + x2 + x3 + runif(n)/10
x5 <- factor(sample(c('a','b','c'),n,replace=TRUE))
x6 <- 1*(x5=='a' | x5=='c')
data1 <- cbind(x1,x2,x3,x4,x5,x6)
data2 <- data.frame(data1)
xyz <- lapply(data1, f)


Have you looked at the output of that operation? I get a list of 600  
elements:


> str(xyz)
List of 600
 $ : num [1, 1:2] 0.315 0.315
 $ : num [1, 1:2] 0.0132 0.0132
 $ : num [1, 1:2] 0.519 0.519
 $ : num [1, 1:2] 0.0917 0.0917
snipped
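
The 600 elements come from calling lapply() on a matrix: cbind() returns a numeric matrix (the factor x5 is reduced to its integer codes), and lapply() then visits every cell rather than every column. A minimal sketch with two columns, using `range` as a stand-in for the poster's function:

```r
n <- 100
x1 <- runif(n)
x2 <- runif(n)
m  <- cbind(x1, x2)          # numeric matrix, 100 x 2
df <- data.frame(m)          # data frame with 2 columns

length(lapply(m, range))     # 200: one result per matrix element
length(lapply(df, range))    # 2: one result per column

# To work column-wise on a matrix, apply() over margin 2 is one option
apply(m, 2, range)
```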




#Now, I can eliminate those rows (observations) from the data which contain
extreme values for each of the variables one by one as below.


And now you propose to overwrite data2 not once but twice?



data2 <- subset (data2, x1<=xyz$x1[,1] &  x1>=xyz$x1[,2])
data2 <- subset (data2, x1<=xyz$x2[,1] &  x1>=xyz$x2[,2])

.
.
and so on..

But my data has a larger number of variables (more than 120). Can anybody
suggest an efficient way of eliminating rows containing extreme values?


The first step would be arriving at a sensible definition for "extreme
value". And you should also consider that these are data and removing
"extreme values" is a serious distortion of the data. There needs to
be some justification for cutting out the extremes.


--

David Winsemius, MD
West Hartford, CT


Re: [R] Data Frame Manipulation using function

2010-07-09 Thread David Winsemius

Really? I don't usually think of Vectorize as a performance
enhancement, probably because my own use of it is with a complex function
that then gets applied to 4.5 million records. I need to go out, get a cup of
coffee, and leave it alone for about half an hour. I tried recently
to figure out how I could do the matrix look-up and function application
without the Vectorize route but gave up after a couple of hours, after
realizing that I had a method that worked and that I had spent way more
time on it than just doing it would have taken.
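
For the simpler whitelist check in this thread (not the matrix look-up described above), a route without Vectorize is to collapse the patterns into one regular expression and let grepl() work on the whole column. This is a sketch under that assumption; the vectors `urls` and `url_type` are hypothetical stand-ins for the data frame columns.

```r
whitelist <- c("[?]query=", "[?]search=")
urls <- c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com")
url_type <- c(1, 2, 1, 2, 1)

# One alternation pattern, matched against the whole vector in a single call
pattern <- paste(whitelist, collapse = "|")
hit <- grepl(pattern, urls)

urls[url_type != 1 & hit]
```

grepl() is already vectorised over its text argument, so no explicit loop or wrapper is needed for this kind of test.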

Glad it helped.
David.

On Jul 9, 2010, at 11:01 AM, harsh yadav wrote:

> Hi,
>
> Thanks a lot.
> The Vectorize method worked and it's much faster than looping through
> the data frame.
>
> Regards,
> Harsh Yadav
>
> On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius wrote:
>
> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>
> I have a data frame:
>   id url                     urlType
> 1  1 www.yahoo.com                 1
> 2  2 www.google.com/?search=       2
> 3  3 www.google.com                1
> 4  4 www.yahoo.com/?query=         2
> 5  5 www.gmail.com                 1
>
> This is not output from ?dput, which means more work to read it in.
>
>
> Yeah it was kind of pain, but ...
>
> dta <- read.table(textConnection(' id url urlType
> 1 1  "www.yahoo.com"  1
> 2 2  "www.google.com/?search=" 2
> 3 3  "www.google.com" 1
> 4 4  "www.yahoo.com/?query="   2
> 5 5  "www.gmail.com" 1') )
>
>
>
>
> Here is the definition for WHITELIST:-
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> What is the 'trim' function?  I do not have that defined.
>
> Perhaps David's answer will work for you...
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function  
> name and guessing that trim was the one in gdata:
>
> > require(gdata)
>
> > checkBaseLine <- function(s){
> + for (listItem in WHITELIST){
> + if(regexpr(as.character(listItem), s)[1] > -1){
> + return(TRUE)
> + }
> + }
> + return(FALSE)
> + }
> >
> > #Here is the definition for WHITELIST:-
>
> >
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> >
> > vcheck <- Vectorize(checkBaseLine)
> >
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= www.yahoo.com/?query=
> 5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
>
> -- 
> David.
>

David Winsemius, MD
West Hartford, CT



Re: [R] Data Frame Manipulation using function

2010-07-09 Thread harsh yadav

Hi,

Thanks a lot.
The Vectorize method worked and it's much faster than looping through the
data frame.

Regards,
Harsh Yadav

On Thu, Jul 8, 2010 at 11:06 PM, David Winsemius wrote:

>
> On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>
>>  I have a data frame:
>>>   id url                     urlType
>>> 1  1 www.yahoo.com                 1
>>> 2  2 www.google.com/?search=       2
>>> 3  3 www.google.com                1
>>> 4  4 www.yahoo.com/?query=         2
>>> 5  5 www.gmail.com                 1
>>>
>>
>> This is not output from ?dput, which means more work to read it in.
>>
>>
> Yeah it was kind of pain, but ...
>
> dta <- read.table(textConnection(' id url urlType
> 1 1  "www.yahoo.com"  1
> 2 2  "www.google.com/?search=" 2
> 3 3  "www.google.com" 1
> 4 4  "www.yahoo.com/?query="   2
> 5 5  "www.gmail.com" 1') )
>
>
>
>
>>  Here is the definition for WHITELIST:-
>>> WHITELIST = "[?]query=, [?]search="
>>> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>>>
>>
>> What is the 'trim' function?  I do not have that defined.
>>
>> Perhaps David's answer will work for you...
>>
>
> Seems to ... after I fixed my incorrect cmd-V paste of the function name
> and guessing that trim was the one in gdata:
>
> > require(gdata)
>
> > checkBaseLine <- function(s){
> + for (listItem in WHITELIST){
> + if(regexpr(as.character(listItem), s)[1] > -1){
> + return(TRUE)
> + }
> + }
> + return(FALSE)
> + }
> >
> > #Here is the definition for WHITELIST:-
>
> >
> > WHITELIST = "[?]query=, [?]search="
> > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> > vcheck <- Vectorize(checkBaseLine)
> >
> > vcheck <- Vectorize(checkBaseLine)
> >
> > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
> [1] www.google.com/?search= www.yahoo.com/?query=
> 5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=
>
> --
> David.
>


Re: [R] Data Frame Manipulation using function

2010-07-08 Thread David Winsemius

On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:

I have a data frame:
  id url                     urlType
1  1 www.yahoo.com                 1
2  2 www.google.com/?search=       2
3  3 www.google.com                1
4  4 www.yahoo.com/?query=         2
5  5 www.gmail.com                 1

This is not output from ?dput, which means more work to read it in.

Yeah it was kind of pain, but ...

dta <- read.table(textConnection(' id url urlType
1 1  "www.yahoo.com"  1
2 2  "www.google.com/?search=" 2
3 3  "www.google.com" 1
4 4  "www.yahoo.com/?query="   2
5 5  "www.gmail.com" 1') )

Here is the definition for WHITELIST:-
WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))

What is the 'trim' function?  I do not have that defined.

Perhaps David's answer will work for you...

Seems to ... after I fixed my incorrect cmd-V paste of the function  
name and guessing that trim was the one in gdata:

> require(gdata)
> checkBaseLine <- function(s){
+ for (listItem in WHITELIST){
+ if(regexpr(as.character(listItem), s)[1] > -1){
+ return(TRUE)
+ }
+ }
+ return(FALSE)
+ }
>
> #Here is the definition for WHITELIST:-
>
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> vcheck <- Vectorize(checkBaseLine)
>
> vcheck <- Vectorize(checkBaseLine)
>
> dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
[1] www.google.com/?search= www.yahoo.com/?query=
5 Levels: www.gmail.com www.google.com ... www.yahoo.com/?query=

--
David.
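
Why the wrapper is needed can be seen on a small example: the scalar function looks only at the first element of its argument, so calling it on a whole column gives one answer instead of one per row. The sketch below reuses checkBaseLine from the thread; the `urls` vector and the `USE.NAMES = FALSE` setting are assumptions made for the illustration.

```r
WHITELIST <- c("[?]query=", "[?]search=")

# Scalar version from the thread: inspects only the first element of s
checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(as.character(listItem), s)[1] > -1) {
      return(TRUE)
    }
  }
  FALSE
}

urls <- c("www.yahoo.com", "www.google.com/?search=", "www.yahoo.com/?query=")

checkBaseLine(urls)      # a single value, decided by urls[1] alone

# Vectorize() wraps the function in mapply(), calling it once per element
vcheck <- Vectorize(checkBaseLine, USE.NAMES = FALSE)
vcheck(urls)             # one logical value per URL
```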


Re: [R] Data Frame Manipulation using function

2010-07-08 Thread Erik Iverson




I have a data frame:

  id url                     urlType
1  1 www.yahoo.com                 1
2  2 www.google.com/?search=       2
3  3 www.google.com                1
4  4 www.yahoo.com/?query=         2
5  5 www.gmail.com                 1




This is not output from ?dput, which means more work to read it in.




Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))


What is the 'trim' function?  I do not have that defined.

Perhaps David's answer will work for you...
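
If no `trim` function is defined, base R's trimws() can play the same role for this particular line. This is a sketch that assumes R 3.2.0 or later, where trimws() is available; it is not necessarily the function the original poster was using.

```r
WHITELIST <- "[?]query=, [?]search="

# Split on commas, then strip leading and trailing whitespace from each piece
WHITELIST <- trimws(unlist(strsplit(WHITELIST, ",")))
WHITELIST
```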


Re: [R] Data Frame Manipulation using function

2010-07-08 Thread David Winsemius



On Jul 8, 2010, at 10:09 PM, harsh yadav wrote:


Hi,

Here is a somewhat detailed explanation of what I want to achieve:

I have a data frame:

  id url                     urlType
1  1 www.yahoo.com                 1
2  2 www.google.com/?search=       2
3  3 www.google.com                1
4  4 www.yahoo.com/?query=         2
5  5 www.gmail.com                 1

I want to get all the URLs that are not of type `1` and satisfy the
condition defined by the following function:

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))

Now, for the given data frame I want to apply the above function for
all row values for a given column:-

That is:

It works fine when I define a condition like:
data <- data[data$urlType != 1,]


Arrrgh. Why do people keep using "data" as an object name? Is there  
some water pump from which I can remove the handle?


Anyway ... try:

vcheck <- Vectorize(V)

data[ data$urlType != 1 & vcheck(data$url) , "url" ]

--
David


However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that != 1,
and the column `url` contains row values that satisfy the function
definition.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav


On Thu, Jul 8, 2010 at 9:43 PM, Erik Iverson wrote:


It will be a lot easier to help you if you follow the posting guide and
PLEASE do read the posting guide and provide commented, minimal,
self-contained, reproducible code.

You gave your function definition, which is good.  Use ?dput to give us a
small data.frame that can accurately show what you want.


harsh yadav wrote:


Hi all,

I have a data frame for which I want to limit the output by checking
whether
row values for specific column meets particular conditions.

Here are the more specific details:

I have a function that checks whether an input string exists in a defined
list:-

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Now, I have a data frame for which I want to apply the above function for
all row values for a given column:-

This works fine when I define a condition like:
data <- data[data$urlType != 1,]

However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that != 1,
and the column `url` contains row values that gets evaluated using the
defined function.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav






David Winsemius, MD
West Hartford, CT


Re: [R] Data Frame Manipulation using function

2010-07-08 Thread harsh yadav

Hi,

Here is a somewhat detailed explanation of what I want to achieve:

I have a data frame:

  id url                     urlType
1  1 www.yahoo.com                 1
2  2 www.google.com/?search=       2
3  3 www.google.com                1
4  4 www.yahoo.com/?query=         2
5  5 www.gmail.com                 1

I want to get all the URLs that are not of type `1` and satisfy the
condition defined by the following function:

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Here is the definition for WHITELIST:-

WHITELIST = "[?]query=, [?]search="
WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))

Now, for the given data frame I want to apply the above function for
all row values for a given column:-

That is:

It works fine when I define a condition like:
data <- data[data$urlType != 1,]

However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that !=
1, and the column `url` contains row values that satisfy the function
definition.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav


On Thu, Jul 8, 2010 at 9:43 PM, Erik Iverson  wrote:

> It will be a lot easier to help you if you follow the posting guide and
> PLEASE do read the posting guide and provide commented, minimal,
> self-contained, reproducible code.
>
> You gave your function definition, which is good.  Use ?dput to give us a
> small data.frame that can accurately show what you want.
>
>
> harsh yadav wrote:
>
>> Hi all,
>>
>> I have a data frame for which I want to limit the output by checking
>> whether
>> row values for specific column meets particular conditions.
>>
>> Here are the more specific details:
>>
>> I have a function that checks whether an input string exists in a defined
>> list:-
>>
>> checkBaseLine <- function(s){
>>   for (listItem in WHITELIST){
>>     if(regexpr(as.character(listItem), s)[1] > -1){
>>       return(TRUE)
>>     }
>>   }
>>   return(FALSE)
>> }
>>
>> Now, I have a data frame for which I want to apply the above function for
>> all row values for a given column:-
>>
>> This works fine when I define a condition like:
>> data <- data[data$urlType != 1,]
>>
>> However, I want to combine two logical conditions together like:
>> data <- data[data$urlType != 1 & checkBaseLine(data$url),]
>>
>> This would check whether the column `urlType` contains row values that !=
>> 1,
>> and the column `url` contains row values that gets evaluated using the
>> defined function.
>>
>> Any ideas how this can be done?
>>
>> Thanks in advance.
>>
>> Regards,
>> Harsh Yadav
>>
>>
>
>


Re: [R] Data Frame Manipulation using function

2010-07-08 Thread Erik Iverson

It will be a lot easier to help you if you follow the posting guide and PLEASE 
do read the posting guide and provide commented, minimal, self-contained, 
reproducible code.


You gave your function definition, which is good.  Use ?dput to give us a small 
data.frame that can accurately show what you want.
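
As an illustration of what dput() provides, here is a minimal sketch with a small, made-up data frame (the object `dta` and its values are assumptions, modelled on the table posted later in the thread). The printed expression can be pasted into a message and evaluated by the reader to rebuild the same object.

```r
dta <- data.frame(id = 1:3,
                  url = c("www.yahoo.com", "www.google.com/?search=",
                          "www.gmail.com"),
                  urlType = c(1, 2, 1),
                  stringsAsFactors = FALSE)

# dput() prints an R expression that recreates the object
dput(dta)

# Round trip: capture the printed text, then parse and evaluate it again
txt <- paste(capture.output(dput(dta)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
all.equal(rebuilt, dta)
```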



harsh yadav wrote:

Hi all,

I have a data frame for which I want to limit the output by checking whether
row values for specific column meets particular conditions.

Here are the more specific details:

I have a function that checks whether an input string exists in a defined
list:-

checkBaseLine <- function(s){
  for (listItem in WHITELIST){
    if(regexpr(as.character(listItem), s)[1] > -1){
      return(TRUE)
    }
  }
  return(FALSE)
}

Now, I have a data frame for which I want to apply the above function for
all row values for a given column:-

This works fine when I define a condition like:
data <- data[data$urlType != 1,]

However, I want to combine two logical conditions together like:
data <- data[data$urlType != 1 & checkBaseLine(data$url),]

This would check whether the column `urlType` contains row values that !=
1,
and the column `url` contains row values that gets evaluated using the
defined function.

Any ideas how this can be done?

Thanks in advance.

Regards,
Harsh Yadav


Re: [R] data frame manipulation with zero rows

2010-06-02 Thread arnaud Gaboury

I do really think it is a very good idea.
TY





> -Original Message-
> From: h.wick...@gmail.com [mailto:h.wick...@gmail.com] On Behalf Of
> Hadley Wickham
> Sent: Wednesday, June 02, 2010 3:31 PM
> To: arnaud Gaboury
> Cc: Peter Ehlers; r-help@r-project.org; Prof Brian Ripley
> Subject: Re: [R] data frame manipulation with zero rows
> 
> Hi Arnaud,
> 
> I've added this case to the set of test cases in plyr and it will be
> fixed in the next version.
> 
> Hadley
> 
> On Tue, Jun 1, 2010 at 2:33 PM, arnaud Gaboury
>  wrote:
> > Maybe not the cleanest way, but I create a fake data frame with one
> row so
> > ddply() is happy!!
> >> if (nrow(futures)==0) futures<-data.frame(...)
> >
> >
> >
> >
> >
> >> -Original Message-
> >> From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
> >> Sent: Tuesday, June 01, 2010 12:07 PM
> >> To: arnaud Gaboury
> >> Cc: 'Prof Brian Ripley'; r-help@r-project.org
> >> Subject: Re: [R] data frame manipulation with zero rows
> >>
> >> On 2010-06-01 1:53, arnaud Gaboury wrote:
> >> > Brian,
> >> >
> >> > If I do understand correctly, I must use in my function something
> >> else than
> >> > ddply() if I want to avoid any error each time my df has zero
> rows?
> >> > Am I correct?
> >> >
> >>
> >> You could define a function to handle the zero-rows case:
> >>
> >> f <- function(x){
> >>   if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
> >>   else
> >>     out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
> >>                      POSITION=sum(QUANTITY))[,c(1,3,2)]
> >>   out
> >> }
> >> f(futures)
> >>
> >>   -Peter Ehlers
> >>
> >> >
> >> >
> >> >> -Original Message-
> >> >> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> >> >> Sent: Tuesday, June 01, 2010 9:47 AM
> >> >> To: arnaud Gaboury
> >> >> Subject: Re: [R] data frame manipulation with zero rows
> >> >>
> >> >> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
> >> >>
> >> >>> Dear group,
> >> >>>
> >> >>> Here is the kind of data.frame I obtain every day with my
> function
> >> :
> >> >>>
> >> >>> futures<-
> >> >>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
> >> >>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE
> Aug/10",
> >> >>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11
> Jul/10",
> >> >>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
> >> >>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
> >> >>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class =
> "Date"),
> >> >>>     QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT
> =
> >> >>> c("373.2500",
> >> >>>     "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
> >> >>>     "90.7750", "14.9200", "14.9200", "14.9200", "14.9200",
> >> "14.9200"
> >> >>>     )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
> >> >>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
> >> >>>
> >> >>> I need then to apply to the df this following code line :
> >> >>>
> >> >>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >> >> POSITION=
> >> >>> sum(QUANTITY))[,c(1,3,2)]
> >> >>>
> >> >>> It works perfectly in most of case, BUT I have a new problem: it
> >> can
> >> >>> sometime occurs that my df "futures" is empty, with zero rows.
> >> >>>
> >> >>>
> >> >>> futures<-
> >> >>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
> >> >>> structure(numeric(0), class = "Date"),
> >> >>>

Re: [R] data frame manipulation with zero rows

2010-06-02 Thread Hadley Wickham

Hi Arnaud,

I've added this case to the set of test cases in plyr and it will be
fixed in the next version.

Hadley

On Tue, Jun 1, 2010 at 2:33 PM, arnaud Gaboury  wrote:
> Maybe not the cleanest way, but I create a fake data frame with one row so
> ddply() is happy!!
>> if (nrow(futures)==0) futures<-data.frame(...)
>
>
>
>
>
>> -Original Message-
>> From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
>> Sent: Tuesday, June 01, 2010 12:07 PM
>> To: arnaud Gaboury
>> Cc: 'Prof Brian Ripley'; r-help@r-project.org
>> Subject: Re: [R] data frame manipulation with zero rows
>>
>> On 2010-06-01 1:53, arnaud Gaboury wrote:
>> > Brian,
>> >
>> > If I do understand correctly, I must use in my function something
>> else than
>> > ddply() if I want to avoid any error each time my df has zero rows?
>> > Am I correct?
>> >
>>
>> You could define a function to handle the zero-rows case:
>>
>> f <- function(x){
>>   if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
>>   else
>>     out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
>>                      POSITION=sum(QUANTITY))[,c(1,3,2)]
>>   out
>> }
>> f(futures)
>>
>>   -Peter Ehlers
>>
>> >
>> >
>> >> -Original Message-
>> >> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
>> >> Sent: Tuesday, June 01, 2010 9:47 AM
>> >> To: arnaud Gaboury
>> >> Subject: Re: [R] data frame manipulation with zero rows
>> >>
>> >> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
>> >>
>> >>> Dear group,
>> >>>
>> >>> Here is the kind of data.frame I obtain every day with my function
>> :
>> >>>
>> >>> futures<-
>> >>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
>> >>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
>> >>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
>> >>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
>> >>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
>> >>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
>> >>>     QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
>> >>> c("373.2500",
>> >>>     "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
>> >>>     "90.7750", "14.9200", "14.9200", "14.9200", "14.9200",
>> "14.9200"
>> >>>     )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
>> >>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
>> >>>
>> >>> I need then to apply to the df this following code line :
>> >>>
>> >>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>> >> POSITION=
>> >>> sum(QUANTITY))[,c(1,3,2)]
>> >>>
>> >>> It works perfectly in most of case, BUT I have a new problem: it
>> can
>> >>> sometime occurs that my df "futures" is empty, with zero rows.
>> >>>
>> >>>
>> >>> futures<-
>> >>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
>> >>> structure(numeric(0), class = "Date"),
>> >>>     QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
>> >>> c("DESCRIPTION",
>> >>> "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
>> >> class =
>> >>> "data.frame")
>> >>>
>> >>> It is not the usual case, but it can happen. With this df, when I
>> >> pass the
>> >>> above mentione line, I get an error :
>> >>>
>> >>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
>> >> POSITION=
>> >>> sum(QUANTITY))[,c(1,3,2)]
>> >>> Error in tapply(1:nrow(data), splitv, list) :
>> >>>   arguments must have same length
>> >>>
>> &

Re: [R] data frame manipulation with zero rows

2010-06-01 Thread arnaud Gaboury

Maybe not the cleanest way, but I create a fake data frame with one row so
ddply() is happy!!
> if (nrow(futures)==0) futures<-data.frame(...)
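
An alternative to padding the data frame with a fake row is to return an empty result of the right shape. The sketch below uses base R's aggregate() instead of ddply() so that it runs without plyr; the function name `position_summary` and the shortened example data are assumptions, not code from the thread.

```r
# Sum QUANTITY per DESCRIPTION/SETTLEMENT. With zero rows, return an empty
# data frame that has the same columns as the normal result.
position_summary <- function(x) {
  if (nrow(x) == 0) {
    return(data.frame(DESCRIPTION = character(0),
                      POSITION = numeric(0),
                      SETTLEMENT = character(0),
                      stringsAsFactors = FALSE))
  }
  out <- aggregate(QUANTITY ~ DESCRIPTION + SETTLEMENT, data = x, FUN = sum)
  names(out)[names(out) == "QUANTITY"] <- "POSITION"
  out[, c("DESCRIPTION", "POSITION", "SETTLEMENT")]
}

futures <- data.frame(
  DESCRIPTION = c("CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10"),
  QUANTITY = c(1, 1, 3),
  SETTLEMENT = c("373.2500", "373.2500", "90.7750"),
  stringsAsFactors = FALSE
)

full  <- position_summary(futures)         # one row per contract
empty <- position_summary(futures[0, ])    # zero rows, same column names
full
empty
```

Downstream code can then treat both cases alike, since the column names and order do not change when there is nothing to summarise.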





> -Original Message-
> From: Peter Ehlers [mailto:ehl...@ucalgary.ca]
> Sent: Tuesday, June 01, 2010 12:07 PM
> To: arnaud Gaboury
> Cc: 'Prof Brian Ripley'; r-help@r-project.org
> Subject: Re: [R] data frame manipulation with zero rows
> 
> On 2010-06-01 1:53, arnaud Gaboury wrote:
> > Brian,
> >
> > If I do understand correctly, I must use in my function something
> else than
> > ddply() if I want to avoid any error each time my df has zero rows?
> > Am I correct?
> >
> 
> You could define a function to handle the zero-rows case:
> 
> f <- function(x){
>   if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
>   else
>     out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
>                  POSITION=sum(QUANTITY))[,c(1,3,2)]
>   out
> }
> f(futures)
> 
>   -Peter Ehlers
> 
> >
> >
> >> -Original Message-----
> >> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> >> Sent: Tuesday, June 01, 2010 9:47 AM
> >> To: arnaud Gaboury
> >> Subject: Re: [R] data frame manipulation with zero rows
> >>
> >> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
> >>
> >>> Dear group,
> >>>
> >>> Here is the kind of data.frame I obtain every day with my function
> :
> >>>
> >>> futures<-
> >>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
> >>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
> >>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
> >>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
> >>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
> >>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
> >>> QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
> >>> c("373.2500",
> >>> "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
> >>> "90.7750", "14.9200", "14.9200", "14.9200", "14.9200",
> "14.9200"
> >>> )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
> >>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
> >>>
> >>> I need then to apply to the df this following code line :
> >>>
> >>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >> POSITION=
> >>> sum(QUANTITY))[,c(1,3,2)]
> >>>
> >>> It works perfectly in most cases, BUT I have a new problem: it can
> >>> sometimes occur that my df "futures" is empty, with zero rows.
> >>>
> >>>
> >>> futures<-
> >>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
> >>> structure(numeric(0), class = "Date"),
> >>> QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
> >>> c("DESCRIPTION",
> >>> "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
> >> class =
> >>> "data.frame")
> >>>
> >>> It is not the usual case, but it can happen. With this df, when I pass the
> >>> above-mentioned line, I get an error :
> >>>
> >>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >> POSITION=
> >>> sum(QUANTITY))[,c(1,3,2)]
> >>> Error in tapply(1:nrow(data), splitv, list) :
> >>>   arguments must have same length
> >>>
> >>>
> >>> How can I avoid this when my df is empty?
> >>
> >> Ask the author of the (missing) function ddply() to correct the
> error
> >> of using 1:nrow(data) by replacing it by seq_len(nrow(data)).
> >>
> >> It's helpful to give example code, but much more helpful if you test
> >> it: yours cannot work without the function ddply() -- this is what
> >> 'self-contained' means in the footer here.
> >>
> >>
> >>>
> >>> Any help is appreciated
> >>>
> >>
> >>
> >> --
> >> Brian D. Ripley,                  rip...@stats.ox.ac.uk
> >> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford,             Tel:  +44 1865 272861 (self)
> >> 1 South Parks Road,                     +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >


Re: [R] data frame manipulation with zero rows

2010-06-01 Thread arnaud Gaboury

It is indeed ddply() from package plyr.





> -Original Message-
> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> Sent: Tuesday, June 01, 2010 12:24 PM
> To: Peter Ehlers
> Cc: arnaud Gaboury; r-help@r-project.org
> Subject: Re: [R] data frame manipulation with zero rows
> 
> On Tue, 1 Jun 2010, Peter Ehlers wrote:
> 
> > On 2010-06-01 1:53, arnaud Gaboury wrote:
> >> Brian,
> >>
> >> If I do understand correctly, I must use in my function something
> else than
> >> ddply() if I want to avoid any error each time my df has zero rows?
> >> Am I correct?
> >>
> >
> > You could define a function to handle the zero-rows case:
> >
> > f <- function(x){
> >   if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
> >   else
> >     out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
> >                  POSITION=sum(QUANTITY))[,c(1,3,2)]
> >   out
> > }
> > f(futures)
> 
> Or simply fix ddply.  We don't know what that is or what it should do
> for the case of zero rows: it may or may not be the one in package
> plyr.
> 
> >
> > -Peter Ehlers
> >
> >>
> >>
> >>> -Original Message-
> >>> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> >>> Sent: Tuesday, June 01, 2010 9:47 AM
> >>> To: arnaud Gaboury
> >>> Subject: Re: [R] data frame manipulation with zero rows
> >>>
> >>> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
> >>>
> >>>> Dear group,
> >>>>
> >>>> Here is the kind of data.frame I obtain every day with my function
> :
> >>>>
> >>>> futures<-
> >>>> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
> >>>> "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
> >>>> "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
> >>>> "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
> >>>> ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
> >>>> 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
> >>>> QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
> >>>> c("373.2500",
> >>>> "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
> >>>> "90.7750", "14.9200", "14.9200", "14.9200", "14.9200",
> "14.9200"
> >>>> )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
> >>>> "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
> >>>>
> >>>> I need then to apply to the df this following code line :
> >>>>
> >>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >>> POSITION=
> >>>> sum(QUANTITY))[,c(1,3,2)]
> >>>>
> >>>> It works perfectly in most of case, BUT I have a new problem: it
> can
> >>>> sometime occurs that my df "futures" is empty, with zero rows.
> >>>>
> >>>>
> >>>> futures<-
> >>>> structure(list(DESCRIPTION = character(0), CREATED.DATE =
> >>>> structure(numeric(0), class = "Date"),
> >>>> QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
> >>>> c("DESCRIPTION",
> >>>> "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
> >>> class =
> >>>> "data.frame")
> >>>>
> >>>> It is not the usual case, but it can happen. With this df, when I
> >>> pass the
> >>>> above mentione line, I get an error :
> >>>>
> >>>>> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >>> POSITION=
> >>>> sum(QUANTITY))[,c(1,3,2)]
> >>>> Error in tapply(1:nrow(data), splitv, list) :
> >>>>   arguments must have same length
> >>>>
> >>>>
> >>>> How can I avoid this when my df is empty?
> >>>
> >>> Ask the author of the (missing) function ddply() to correct the
> error
> >>> of using 1:nrow(data) by replacing it by seq_len(nrow(data)).
> >>>
> >>> It's helpful to give example code, but much more helpful if you
> test
> >>> it: yours cannot work without the function ddply() -- this is what
> >>> 'self-contained' means in the footer here.
> 
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] data frame manipulation with zero rows

2010-06-01 Thread Prof Brian Ripley


On Tue, 1 Jun 2010, Peter Ehlers wrote:


On 2010-06-01 1:53, arnaud Gaboury wrote:

Brian,

If I do understand correctly, I must use in my function something else than
ddply() if I want to avoid any error each time my df has zero rows?
Am I correct?



You could define a function to handle the zero-rows case:

f <- function(x){
  if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
  else
    out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
                 POSITION=sum(QUANTITY))[,c(1,3,2)]
  out
}
f(futures)


Or simply fix ddply.  We don't know what that is or what it should do 
for the case of zero rows: it may or may not be the one in package 
plyr.




-Peter Ehlers





-Original Message-
From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
Sent: Tuesday, June 01, 2010 9:47 AM
To: arnaud Gaboury
Subject: Re: [R] data frame manipulation with zero rows

On Tue, 1 Jun 2010, arnaud Gaboury wrote:


Dear group,

Here is the kind of data.frame I obtain every day with my function :

futures<-
structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
"CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
"LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
"SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
c("373.2500",
"373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
"90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
)), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
"SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")

I need then to apply to the df this following code line :


PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
             POSITION=sum(QUANTITY))[,c(1,3,2)]

It works perfectly in most cases, BUT I have a new problem: it can
sometimes happen that my df "futures" is empty, with zero rows.


futures<-
structure(list(DESCRIPTION = character(0), CREATED.DATE =
structure(numeric(0), class = "Date"),
QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
c("DESCRIPTION",
"CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
class = "data.frame")

It is not the usual case, but it can happen. With this df, when I
run the above-mentioned line, I get an error:

PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
             POSITION=sum(QUANTITY))[,c(1,3,2)]
Error in tapply(1:nrow(data), splitv, list) :
  arguments must have same length


How can I avoid this when my df is empty?


Ask the author of the (missing) function ddply() to correct the error
of using 1:nrow(data) by replacing it by seq_len(nrow(data)).

It's helpful to give example code, but much more helpful if you test
it: yours cannot work without the function ddply() -- this is what
'self-contained' means in the footer here.
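
To illustrate the point about 1:nrow(data), here is a minimal base R sketch.
The data frame `empty` and the other variable names are made up for this
illustration and are not taken from ddply(). With zero rows, 1:nrow(x)
counts down from 1 to 0 and so has length 2, whereas seq_len(nrow(x)) is
empty, which is presumably why the tapply() call complains about lengths.

```r
# Hypothetical zero-row data frame, for illustration only
empty <- data.frame(QUANTITY = numeric(0))

bad_index  <- 1:nrow(empty)         # c(1L, 0L): counts down from 1 to 0
good_index <- seq_len(nrow(empty))  # integer(0): nothing to iterate over

length(bad_index)   # 2
length(good_index)  # 0

# A loop over seq_len() is simply skipped for an empty data frame
visited <- 0L
for (i in good_index) visited <- visited + 1L
visited             # 0
```

The same reasoning applies to 1:length(x) versus seq_along(x).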


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] data frame manipulation ddply

2010-06-01 Thread arnaud Gaboury

Patrick,

When applied to the following df, aggregate fails as well:

futures <-
structure(list(DESCRIPTION = character(0), CREATED.DATE =
structure(numeric(0), class = "Date"), 
QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
c("DESCRIPTION", 
"CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0), class =
"data.frame")


> PosFut <- aggregate(futures$QUANTITY, list(DESCRIPTION =
futures$DESCRIPTION,SETTLEMENT=futures$SETTLEMENT),sum)[,c(1,3,2)]
Error in aggregate.data.frame(as.data.frame(x), ...) : 
  no rows to aggregate



> -Original Message-
> From: Patrick Hausmann [mailto:patrick.hausm...@uni-bremen.de]
> Sent: Tuesday, June 01, 2010 11:38 AM
> To: arnaud Gaboury
> Subject: Re: [R] data frame manipulation ddply
> 
> Hi Arnaud,
> 
> maybe "aggregate" can help:
> 
> PosFut <- aggregate(futures$QUANTITY, list(DESCRIPTION =
> futures$DESCRIPTION,
>   SETTLEMENT  = futures$SETTLEMENT),
> sum)[, c(1,3,2)]
> 
> HTH,
> Patrick
> 
> Am 01.06.2010 11:02, schrieb arnaud Gaboury:
> > Dear group,
> >
> > Here is my data frame:
> >
> >
> > futures<-
> > structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
> > "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
> > "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
> > "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
> > ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
> > 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
> >  QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
> > c("373.2500",
> >  "373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
> >  "90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
> >  )), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
> > "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
> >
> > Here is the line I pass :
> >
> >> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> POSITION=
> > sum(QUANTITY))[,c(1,3,2)]
> >
> > And here the result :
> >
> > PosFut<-
> > structure(list(DESCRIPTION = structure(1:3, .Label = c("CORN Jul/10",
> > "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10"), class = "factor"),
> >  POSITION = c(5, 4, 5), SETTLEMENT = structure(c(2L, 3L, 1L
> >  ), .Label = c("14.9200", "373.2500", "90.7750"), class =
> "factor")),
> > .Names = c("DESCRIPTION",
> > "POSITION", "SETTLEMENT"), class = "data.frame", row.names = c(NA,
> > -3L))
> >
> > I can no more use ddply, as this above command line is in a function,
> and
> > this line should be able to work with a data frame with zero rows,
> and in
> > this case ddply doesn't work.
> > Any suggestion how to obtain the same result without ddply?
> >
> > TY for any help.
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame manipulation with zero rows

2010-06-01 Thread Peter Ehlers


On 2010-06-01 1:53, arnaud Gaboury wrote:

Brian,

If I do understand correctly, I must use in my function something else than
ddply() if I want to avoid any error each time my df has zero rows?
Am I correct?



You could define a function to handle the zero-rows case:

f <- function(x){
  if(nrow(x) < 1) out <- x[, c(1,3,2)]  # or whatever
  else
    out <- ddply(x, c("DESCRIPTION","SETTLEMENT"), summarise,
                 POSITION=sum(QUANTITY))[,c(1,3,2)]
  out
}
f(futures)
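
A variant of the same idea that does not depend on ddply() at all might look
like the sketch below. It is only an illustration: the function name
position_by_contract() and the toy data are assumptions, and base
aggregate() is swapped in for ddply(). The zero-row branch returns an empty
data frame with the same column names as the normal result, so downstream
code sees a consistent shape.

```r
# Sketch only: position_by_contract() is a hypothetical name, not from the thread
position_by_contract <- function(x) {
  if (nrow(x) == 0L) {
    # Nothing to aggregate: return an empty frame with the expected columns
    return(data.frame(DESCRIPTION = character(0),
                      POSITION    = numeric(0),
                      SETTLEMENT  = character(0),
                      stringsAsFactors = FALSE))
  }
  # Sum QUANTITY within each DESCRIPTION/SETTLEMENT combination
  out <- aggregate(QUANTITY ~ DESCRIPTION + SETTLEMENT, data = x, FUN = sum)
  names(out)[names(out) == "QUANTITY"] <- "POSITION"
  out[, c("DESCRIPTION", "POSITION", "SETTLEMENT")]
}

# Small made-up example in the shape of `futures`
toy <- data.frame(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10", "SUGAR NO.11 Jul/10"),
                  QUANTITY    = c(1, 2, 3),
                  SETTLEMENT  = c("373.2500", "373.2500", "14.9200"),
                  stringsAsFactors = FALSE)

res_full  <- position_by_contract(toy)        # one row per contract
res_empty <- position_by_contract(toy[0, ])   # zero rows, same columns
```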

 -Peter Ehlers





-Original Message-
From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
Sent: Tuesday, June 01, 2010 9:47 AM
To: arnaud Gaboury
Subject: Re: [R] data frame manipulation with zero rows

On Tue, 1 Jun 2010, arnaud Gaboury wrote:


Dear group,

Here is the kind of data.frame I obtain every day with my function :

futures<-
structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
"CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
"LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
"SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
c("373.2500",
"373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
"90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
)), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
"SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")

I need then to apply to the df this following code line :


PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
             POSITION=sum(QUANTITY))[,c(1,3,2)]

It works perfectly in most cases, BUT I have a new problem: it can
sometimes happen that my df "futures" is empty, with zero rows.


futures<-
structure(list(DESCRIPTION = character(0), CREATED.DATE =
structure(numeric(0), class = "Date"),
QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
c("DESCRIPTION",
"CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
class = "data.frame")

It is not the usual case, but it can happen. With this df, when I
run the above-mentioned line, I get an error:

PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
             POSITION=sum(QUANTITY))[,c(1,3,2)]
Error in tapply(1:nrow(data), splitv, list) :
  arguments must have same length


How can I avoid this when my df is empty?


Ask the author of the (missing) function ddply() to correct the error
of using 1:nrow(data) by replacing it by seq_len(nrow(data)).

It's helpful to give example code, but much more helpful if you test
it: yours cannot work without the function ddply() -- this is what
'self-contained' means in the footer here.




Any help is appreciated




--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595





Re: [R] data frame manipulation with zero rows

2010-06-01 Thread arnaud Gaboury

Brian,

If I do understand correctly, I must use in my function something else than
ddply() if I want to avoid any error each time my df has zero rows?
Am I correct?

TY




> -Original Message-
> From: Prof Brian Ripley [mailto:rip...@stats.ox.ac.uk]
> Sent: Tuesday, June 01, 2010 9:47 AM
> To: arnaud Gaboury
> Subject: Re: [R] data frame manipulation with zero rows
> 
> On Tue, 1 Jun 2010, arnaud Gaboury wrote:
> 
> > Dear group,
> >
> > Here is the kind of data.frame I obtain every day with my function :
> >
> > futures <-
> > structure(list(DESCRIPTION = c("CORN Jul/10", "CORN Jul/10",
> > "CORN Jul/10", "CORN Jul/10", "CORN Jul/10", "LIVE CATTLE Aug/10",
> > "LIVE CATTLE Aug/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10",
> > "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10", "SUGAR NO.11 Jul/10"
> > ), CREATED.DATE = structure(c(18403, 18406, 18406, 18406, 18406,
> > 18407, 18408, 18406, 18407, 18407, 18407, 18407), class = "Date"),
> >QUANTITY = c(1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1), SETTLEMENT =
> > c("373.2500",
> >"373.2500", "373.2500", "373.2500", "373.2500", "90.7750",
> >"90.7750", "14.9200", "14.9200", "14.9200", "14.9200", "14.9200"
> >)), .Names = c("DESCRIPTION", "CREATED.DATE", "QUANTITY",
> > "SETTLEMENT"), row.names = c(NA, 12L), class = "data.frame")
> >
> > I need then to apply to the df this following code line :
> >
> >> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >              POSITION=sum(QUANTITY))[,c(1,3,2)]
> >
> > It works perfectly in most cases, BUT I have a new problem: it can
> > sometimes happen that my df "futures" is empty, with zero rows.
> >
> >
> > futures <-
> > structure(list(DESCRIPTION = character(0), CREATED.DATE =
> > structure(numeric(0), class = "Date"),
> >QUANTITY = numeric(0), SETTLEMENT = character(0)), .Names =
> > c("DESCRIPTION",
> > "CREATED.DATE", "QUANTITY", "SETTLEMENT"), row.names = integer(0),
> > class = "data.frame")
> >
> > It is not the usual case, but it can happen. With this df, when I
> > run the above-mentioned line, I get an error:
> >
> >> PosFut=ddply(futures, c("DESCRIPTION","SETTLEMENT"), summarise,
> >              POSITION=sum(QUANTITY))[,c(1,3,2)]
> > Error in tapply(1:nrow(data), splitv, list) :
> >  arguments must have same length
> >
> >
> > How can I avoid this when my df is empty?
> 
> Ask the author of the (missing) function ddply() to correct the error
> of using 1:nrow(data) by replacing it by seq_len(nrow(data)).
> 
> It's helpful to give example code, but much more helpful if you test
> it: yours cannot work without the function ddply() -- this is what
> 'self-contained' means in the footer here.
> 
> 
> >
> > Any help is appreciated
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> 
> --
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] Data frame manipulation

2010-05-29 Thread Tal Galili

Hi there,
I am glad it helped.
I used mean just as a placeholder, not because I understood that
this is what you need - so if you believe sum is what you were after - go
with it :)

Regarding loving R, and time spent - everyone on this list probably knows
how you feel.  We have all spent time trying to reinvent the wheel, and then
found that someone else had put together a better solution than our patchwork.
So 1 - this is how we learn, I guess.  And 2 - each of us contributes in his
own way, so it is all fine :)

Best,
Tal



Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, May 28, 2010 at 9:37 PM, LCOG1  wrote:

>
> Tal,
>   Wow, I can't believe how many different manipulations I went through
> trying to coerce it into the format I wanted.  The below works nearly
> perfectly; I had to change the "mean" call to "sum".   I'm curious why you
> used mean?  Other than that, thank you very much. I feel a little foolish
> about how long I spent trying to do this.  Got to love R.
>
> 
> From: Tal Galili [via R] [mailto:
> ml-node+2234184-1067705461-103...@n4.nabble.com
> ]
> Sent: Friday, May 28, 2010 12:04 AM
> To: ROLL Josh F
> Subject: Re: Data frame manipulation
>
> Hi there,
>
> The tool to learn for this is the cast function from the reshape package.
> In your example you have more than one value for RTL, so you should
> think about how to account for that.
> But basically, here is a solution to what you asked for (assuming I
> understood you correctly)
>
>
> require(reshape)
> #?cast
> cast(EmpTotCt.Zn..,  Taz ~ ClusterType  , value = "TotEmp", mean, fill = 0)
>
>
>
> Best,
> Tal
>
> Contact
> Details:---
> Contact me: [hidden email]
> |  972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
>
> --
>
>
>
>
> On Fri, May 28, 2010 at 3:14 AM, LCOG1 <[hidden
> email]> wrote:
>
> >
> > Hello All,
> > Please consider the following:
> >
> > TotEmp<-c(19,6,1,1,8,44,2,33,48,1)
> >
> >
> ClusterType<-c("AGF","CNS","OSV","RTL","RTL","TRN","REL","ACC_CLUST","RTL","WHL")
> > Taz<-c(0,0,0,100,100,100,101,101,102,103)
> >
> >
> >
> AllCtTypes_<-c("AGF","CNS","OSV","RTL","TRN","REL","ACC_CLUST","WHL","ADM_CLUST",
> >
> >
> "HLH","HLH_CLUST","ACC","RTL_CLUST","MFG","ADM","MFG_CLUST","CNS_CLUST","PRF","PUB",
> > "FIN","INF_CLUST","INF","EDU_CLUST","REC","EDU",
> > "MNG","UTL","MIN")
> > #Build data frame
> > EmpTotCt.Zn..<-data.frame(TotEmp,ClusterType,Taz)
> > #Reverse rows to columns
> > EmpTotCt.Zn2..<-as.data.frame(t(as.matrix(EmpTotCt.Zn..)))
> >
> >
> > "EmpTotCt.Zn.." is a data frame that i would like to alter by adding new
> > columns and input 0s where no values exist.  I tried the line below as
> its
> > the only way i know of switching columns to rows but its far from what i
> am
> > looking for.  So "EmpTotCt.Zn.." returns
> >
> >    TotEmp ClusterType Taz
> > 1      19         AGF   0
> > 2       6         CNS   0
> > 3       1         OSV   0
> > 4       1         RTL 100
> > 5       8         RTL 100
> > 6      44         TRN 100
> > 7       2         REL 101
> > 8      33   ACC_CLUST 101
> > 9      48         RTL 102
> > 10      1         RTL 103
> >
> > But what i want is to return the below:
> >
> >     AGF CNS OSV RTL RTL TRN REL ACC_CLUST RTL
> > 0    19   6   1   0   0   0   0         0   0
> > 100   0   0   0   1   8  44   0         0   0
> > 101   0   0   0   0   0   0   2        33   0
> > 102   0   0   0   0   0   0   0         0  48
> > 103   0   0   0   0   0   0   0         0   1
> >
> > Where the rows represent "Taz" and the columns represent ALL
> > "ClusterType"'s
> > found in "AllCtTypes_", this would mean that the above output example
> would
> > have many more columns with 0s in all the rows since there are no
> > observations.  Its taken me a while to get the data into the above format
> > and im afraid im stuck with how to get it into the final computational
> > format, so hopefully someone can help.
> >
> > Perhaps i have to build a blank data frame with the appropriate
> dimensions
> > first but i am not sure if this is the most efficient way of
> accomplishing
> > this.
> >
> > Thanks in advance.
> >
> >
> > --
> > View this message in context:
> >
> http://r.789695.n4.nabble.com/Data-frame-manipul

Re: [R] Data frame manipulation

2010-05-28 Thread LCOG1


Tal,
   Wow, I can't believe how many different manipulations I went through trying
to coerce it into the format I wanted.  The below works nearly perfectly; I had
to change the "mean" call to "sum".   I'm curious why you used mean?  Other than
that, thank you very much. I feel a little foolish about how long I spent trying
to do this.  Got to love R.


From: Tal Galili [via R] 
[mailto:ml-node+2234184-1067705461-103...@n4.nabble.com]
Sent: Friday, May 28, 2010 12:04 AM
To: ROLL Josh F
Subject: Re: Data frame manipulation

Hi there,

The tool to learn for this is the cast function from the reshape package.
In your example you have more than one value for RTL, so you should think
about how to account for that.
But basically, here is a solution to what you asked for (assuming I
understood you correctly)


require(reshape)
#?cast
cast(EmpTotCt.Zn..,  Taz ~ ClusterType  , value = "TotEmp", mean, fill = 0)



Best,
Tal

Contact
Details:---
Contact me: [hidden email] |  
972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, May 28, 2010 at 3:14 AM, LCOG1 <[hidden 
email]> wrote:

>
> Hello All,
> Please consider the following:
>
> TotEmp<-c(19,6,1,1,8,44,2,33,48,1)
>
> ClusterType<-c("AGF","CNS","OSV","RTL","RTL","TRN","REL","ACC_CLUST","RTL","WHL")
> Taz<-c(0,0,0,100,100,100,101,101,102,103)
>
>
> AllCtTypes_<-c("AGF","CNS","OSV","RTL","TRN","REL","ACC_CLUST","WHL","ADM_CLUST",
>
> "HLH","HLH_CLUST","ACC","RTL_CLUST","MFG","ADM","MFG_CLUST","CNS_CLUST","PRF","PUB",
> "FIN","INF_CLUST","INF","EDU_CLUST","REC","EDU",
> "MNG","UTL","MIN")
> #Build data frame
> EmpTotCt.Zn..<-data.frame(TotEmp,ClusterType,Taz)
> #Reverse rows to columns
> EmpTotCt.Zn2..<-as.data.frame(t(as.matrix(EmpTotCt.Zn..)))
>
>
> "EmpTotCt.Zn.." is a data frame that i would like to alter by adding new
> columns and input 0s where no values exist.  I tried the line below as its
> the only way i know of switching columns to rows but its far from what i am
> looking for.  So "EmpTotCt.Zn.." returns
>
>    TotEmp ClusterType Taz
> 1      19         AGF   0
> 2       6         CNS   0
> 3       1         OSV   0
> 4       1         RTL 100
> 5       8         RTL 100
> 6      44         TRN 100
> 7       2         REL 101
> 8      33   ACC_CLUST 101
> 9      48         RTL 102
> 10      1         RTL 103
>
> But what i want is to return the below:
>
>     AGF CNS OSV RTL RTL TRN REL ACC_CLUST RTL
> 0    19   6   1   0   0   0   0         0   0
> 100   0   0   0   1   8  44   0         0   0
> 101   0   0   0   0   0   0   2        33   0
> 102   0   0   0   0   0   0   0         0  48
> 103   0   0   0   0   0   0   0         0   1
>
> Where the rows represent "Taz" and the columns represent ALL
> "ClusterType"'s
> found in "AllCtTypes_", this would mean that the above output example would
> have many more columns with 0s in all the rows since there are no
> observations.  Its taken me a while to get the data into the above format
> and im afraid im stuck with how to get it into the final computational
> format, so hopefully someone can help.
>
> Perhaps i have to build a blank data frame with the appropriate dimensions
> first but i am not sure if this is the most efficient way of accomplishing
> this.
>
> Thanks in advance.
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Data-frame-manipulation-tp2233932p2233932.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] Data frame manipulation

2010-05-28 Thread Tal Galili

Hi there,

The tool to learn for this is the cast function from the reshape package.
In your example you have more than one value for RTL, so you should think
about how to account for that.
But basically, here is a solution to what you asked for (assuming I
understood you correctly)


require(reshape)
#?cast
cast(EmpTotCt.Zn..,  Taz ~ ClusterType  , value = "TotEmp", mean, fill = 0)
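
If you would rather stay in base R, and also want a column for every type
listed in AllCtTypes_ (including those with no observations), something
along these lines may work. This is only a sketch, not the reshape approach
above: it swaps in xtabs(), which sums TotEmp when a Taz/ClusterType pair
occurs more than once, and the names emp, tab and wide are made up for the
example.

```r
TotEmp <- c(19, 6, 1, 1, 8, 44, 2, 33, 48, 1)
ClusterType <- c("AGF", "CNS", "OSV", "RTL", "RTL", "TRN", "REL",
                 "ACC_CLUST", "RTL", "WHL")
Taz <- c(0, 0, 0, 100, 100, 100, 101, 101, 102, 103)
AllCtTypes_ <- c("AGF", "CNS", "OSV", "RTL", "TRN", "REL", "ACC_CLUST", "WHL",
                 "ADM_CLUST", "HLH", "HLH_CLUST", "ACC", "RTL_CLUST", "MFG",
                 "ADM", "MFG_CLUST", "CNS_CLUST", "PRF", "PUB", "FIN",
                 "INF_CLUST", "INF", "EDU_CLUST", "REC", "EDU", "MNG", "UTL",
                 "MIN")

# Declaring all possible types as factor levels keeps the unobserved ones
emp <- data.frame(TotEmp, Taz,
                  ClusterType = factor(ClusterType, levels = AllCtTypes_))

# Rows are Taz, columns are every level of ClusterType; empty cells are 0
tab  <- xtabs(TotEmp ~ Taz + ClusterType, data = emp)
wide <- as.data.frame.matrix(tab)
```

Here wide has one row per Taz and one column per entry of AllCtTypes_, and
the two RTL values for Taz 100 are added together.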



Best,
Tal

Contact
Details:---
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
--




On Fri, May 28, 2010 at 3:14 AM, LCOG1  wrote:

>
> Hello All,
> Please consider the following:
>
> TotEmp<-c(19,6,1,1,8,44,2,33,48,1)
>
> ClusterType<-c("AGF","CNS","OSV","RTL","RTL","TRN","REL","ACC_CLUST","RTL","WHL")
> Taz<-c(0,0,0,100,100,100,101,101,102,103)
>
>
> AllCtTypes_<-c("AGF","CNS","OSV","RTL","TRN","REL","ACC_CLUST","WHL","ADM_CLUST",
>
> "HLH","HLH_CLUST","ACC","RTL_CLUST","MFG","ADM","MFG_CLUST","CNS_CLUST","PRF","PUB",
> "FIN","INF_CLUST","INF","EDU_CLUST","REC","EDU",
> "MNG","UTL","MIN")
> #Build data frame
> EmpTotCt.Zn..<-data.frame(TotEmp,ClusterType,Taz)
> #Reverse rows to columns
> EmpTotCt.Zn2..<-as.data.frame(t(as.matrix(EmpTotCt.Zn..)))
>
>
> "EmpTotCt.Zn.." is a data frame that i would like to alter by adding new
> columns and input 0s where no values exist.  I tried the line below as its
> the only way i know of switching columns to rows but its far from what i am
> looking for.  So "EmpTotCt.Zn.." returns
>
>    TotEmp ClusterType Taz
> 1      19         AGF   0
> 2       6         CNS   0
> 3       1         OSV   0
> 4       1         RTL 100
> 5       8         RTL 100
> 6      44         TRN 100
> 7       2         REL 101
> 8      33   ACC_CLUST 101
> 9      48         RTL 102
> 10      1         RTL 103
>
> But what i want is to return the below:
>
>     AGF CNS OSV RTL RTL TRN REL ACC_CLUST RTL
> 0    19   6   1   0   0   0   0         0   0
> 100   0   0   0   1   8  44   0         0   0
> 101   0   0   0   0   0   0   2        33   0
> 102   0   0   0   0   0   0   0         0  48
> 103   0   0   0   0   0   0   0         0   1
>
> Where the rows represent "Taz" and the columns represent ALL
> "ClusterType"'s
> found in "AllCtTypes_", this would mean that the above output example would
> have many more columns with 0s in all the rows since there are no
> observations.  Its taken me a while to get the data into the above format
> and im afraid im stuck with how to get it into the final computational
> format, so hopefully someone can help.
>
> Perhaps i have to build a blank data frame with the appropriate dimensions
> first but i am not sure if this is the most efficient way of accomplishing
> this.
>
> Thanks in advance.
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Data-frame-manipulation-tp2233932p2233932.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread arnaud Gaboury

Sorry Joris, but I am totally lost on this issue!!

> tradenews<-sapply(trades$Buy.Sell..Cleared[which(trades$Trade.Status=="DEL")],switch,Sell="Buy",Buy="Sell")

> tradenews
 Sell 
"Buy"

Not really what I want !!

From: Joris Meys [mailto:jorism...@gmail.com] 
Sent: Thursday, May 27, 2010 10:38 AM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

Of course. You passed a whole data frame to sapply, so switch was applied to
each column rather than to each entry. You want to apply the switch command
to every entry of the vector trades$Buy.Sell..Cleared for which
trades$Trade.Status equals "DEL". Why do you pass in a data frame with all
variables for the observations where status is DEL?
You should have done :

tradesnew<-sapply(trades$Buy.Sell..Cleared[which(trades$Trade.Status=="DEL")],
                  switch,Sell="Buy",Buy="Sell")

Check the help files, and keep track of what goes in and out a function.

Cheers
Joris
On Thu, May 27, 2010 at 9:41 AM, arnaud Gaboury 
wrote:
Joris,

If i pass this line :

> tradesnew<-sapply(trades[which(trades$Trade.Status=="DEL"),],switch,Sell="Buy",Buy="Sell")

Here is what I get :

> tradesnew
$Trade.Status
NULL

$Instrument.Long.Name
NULL

$Delivery.Prompt.Date
NULL

$Buy.Sell..Cleared.
[1] "Buy"

$Volume
[1] "Buy"

$Price
NULL

$Net.Charges..sum.
NULL

That's certainly not what I want.

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thursday, May 27, 2010 8:43 AM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

The loop is due to the switch statement, not the condition. Without
condition it would become:

for (i in seq_along(Y)){
    new.vect[i] <- switch(
        EXPR = X[i],
        Sell = "Buy",
        Buy = "Sell",
        X[i])
}
You can of course use an sapply construct too:

new.vect <- sapply(X[which(Y=="DEL")],switch,Sell="Buy",Buy="Sell")

This will speed up things a little bit, but the effect is marginal.
Cheers
Joris
On Thu, May 27, 2010 at 8:33 AM, arnaud Gaboury 
wrote:
Thank you for the answer.
Is there any way to combine if() and switch() in one line? In my case,
something like :

>if(trade$Trade.Status=="DEL")switch(.)

I would like to avoid the loop .

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Wednesday, May 26, 2010 9:15 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation change elements meeting criteria

see ?switch

X<- rep(c("Buy","Sell","something else"),each=5)
Y<- rep(c("DEL","INS","DEL"),5)

new.vect <- X
for (i in which(Y=="DEL")){
    new.vect[i] <- switch(
        EXPR = X[i],
        Sell = "Buy",
        Buy = "Sell",
        X[i])
}
cbind(new.vect,X,Y)
On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury 
wrote:
Dear group,

Here is my df :

trade <-
structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name =
c("SUGAR NO.11",
"CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
"Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L,
2L, 1L), Price = c("15.2500", "368.", "368.5000"), Net.Charges..sum. =
c(4.01,
-8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
"Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
"Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")

Here is what I want :

If trade$Trade.Status=="DEL": then if trade$buy.Sell..Cleared==Sell , change
it to "Buy", if trade$buy.Sell..Cleared==Buy, change it to "Sell".
If trade$Trade.Status=="INS", do nothing
I tried to work around with ifelse, but don't know how to deal with so many
conditions.

Any help is appreciated.

TY


--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, b

Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread arnaud Gaboury

Maybe I should be more precise. Here is what I have:

trades <-
structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name =
c("SUGAR NO.11",
"CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
"Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L,
2L, 1L), Price = c("15.2500", "368.", "368.5000"), Net.Charges..sum. =
c(4.01,
-8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
"Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
"Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")

Here is what I want :

tradesnew <-
structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name =
c("SUGAR NO.11",
"CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
"Jul/10"), Buy.Sell..Cleared. = c("Buy", "Buy", "Buy"), Volume = c(1L,
2L, 1L), Price = c("15.2500", "368.", "368.5000"), Net.Charges..sum. =
c(4.01,
-8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
"Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
"Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")




Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread Joris Meys

Ah, OK. sapply, evidently, only returns one output for each case that goes in,
and only one goes in here, as there is only one DEL case. You can use that
output to change the corresponding value in the data frame, like this:

tradenews <- trades
del <- which(trades$Trade.Status == "DEL")
tradenews$Buy.Sell..Cleared.[del] <-
  sapply(trades$Buy.Sell..Cleared.[del], switch, Sell = "Buy", Buy = "Sell")

Also take a look at these help files and the examples mentioned in there.
?switch
?sapply
?which

And please, give your variables some decent names. All those dots make
your code very error-prone.

Cheers
Joris
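
A possible alternative to the sapply()/switch() assignment above is a named
lookup vector, sketched here. The name flip and the small data frame (reduced
to two columns, with a second DEL row added) are illustrative assumptions, not
part of the original data. The sketch also assumes every DEL row holds either
"Buy" or "Sell"; any other value would come back as NA.

```r
# Illustrative data: two of the original columns, third row changed to DEL.
trades <- data.frame(
  Trade.Status       = c("DEL", "INS", "DEL"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  stringsAsFactors   = FALSE
)

# Names are the current values, elements are the values to switch to.
flip <- c(Sell = "Buy", Buy = "Sell")

del <- trades$Trade.Status == "DEL"

tradesnew <- trades
# Index the lookup vector by name, then drop the names before assigning.
tradesnew$Buy.Sell..Cleared.[del] <- unname(flip[trades$Buy.Sell..Cleared.[del]])

tradesnew
```

Indexing by name is vectorized, so no sapply() or loop is needed.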

On Thu, May 27, 2010 at 10:47 AM, arnaud Gaboury
wrote:

> Sorry Joris, but I am totally lost on this issue!!
>
>
>
> > tradenews <- sapply(trades$Buy.Sell..Cleared[which(trades$Trade.Status == "DEL")],
>       switch, Sell = "Buy", Buy = "Sell")
>
> > tradenews
>  Sell
> "Buy"
>
> Not really what I want !!
>

Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread Joris Meys

Of course. You passed a data frame to sapply, and sapply then works through
it column by column. You want to apply the switch command to every entry of
the vector trades$Buy.Sell..Cleared. for which trades$Trade.Status equals
"DEL". Why do you pass in a data frame with all variables for the
observations where the status is DEL?

You should have done:

tradesnew <- sapply(trades$Buy.Sell..Cleared.[which(trades$Trade.Status == "DEL")],
                    switch, Sell = "Buy", Buy = "Sell")

Check the help files, and keep track of what goes into and comes out of a
function.

Cheers
Joris
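
To illustrate the point about what goes into sapply(), here is a small sketch.
The data frame is a cut-down, three-column stand-in for trades, and the names
del_rows, by_column and result are made up for the example.

```r
# Cut-down stand-in for the trades data frame (three of its columns).
trades <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  Volume             = c(1L, 2L, 1L),
  stringsAsFactors   = FALSE
)
del_rows <- trades[trades$Trade.Status == "DEL", ]

# A data frame is a list of columns, so sapply() visits one column at a time.
by_column <- sapply(del_rows, class)
by_column

# Passing the whole data frame therefore calls switch() once per column.
result <- sapply(del_rows, switch, Sell = "Buy", Buy = "Sell")

# "DEL" matches no alternative and there is no default, so switch() gives NULL.
result$Trade.Status
# "Sell" matches the Sell alternative.
result$Buy.Sell..Cleared.
# Volume is the number 1, and a numeric EXPR selects by position.
result$Volume
```

This would explain the mix of NULL and "Buy" entries in the output reported
for the data frame call.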


Re: [R] data frame manipulation change elements meeting criteria

2010-05-27 Thread arnaud Gaboury

Joris,

If I pass this line:

> tradesnew <- sapply(trades[which(trades$Trade.Status == "DEL"), ], switch,
+                     Sell = "Buy", Buy = "Sell")

Here is what I get :

> tradesnew
$Trade.Status
NULL

$Instrument.Long.Name
NULL

$Delivery.Prompt.Date
NULL

$Buy.Sell..Cleared.
[1] "Buy"

$Volume
[1] "Buy"

$Price
NULL

$Net.Charges..sum.
NULL

That's certainly not what I want.


Re: [R] data frame manipulation change elements meeting criteria

2010-05-26 Thread arnaud Gaboury

Thank you for the answer.
Is there any way to combine if() and switch() in one line? In my case,
something like :

>if(trade$Trade.Status=="DEL")switch(.)

I would like to avoid the loop.
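
One loop-free way to express the if()/switch() combination asked about above
is a nested ifelse(), sketched below. if() expects a single TRUE or FALSE, so
it cannot test a whole column at once, whereas ifelse() works element by
element. The data frame here is a two-column stand-in for trade, and the names
tradesnew, is_del and side are made up for the example.

```r
# Two-column stand-in for the trade data frame.
trade <- data.frame(
  Trade.Status       = c("DEL", "INS", "INS"),
  Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"),
  stringsAsFactors   = FALSE
)

tradesnew <- trade
is_del <- trade$Trade.Status == "DEL"
side   <- trade$Buy.Sell..Cleared.

# Flip Buy/Sell on DEL rows; every other row keeps its original value.
tradesnew$Buy.Sell..Cleared. <- ifelse(
  is_del & side == "Sell", "Buy",
  ifelse(is_del & side == "Buy", "Sell", side)
)

tradesnew
```

Only the first row is DEL, so only its "Sell" becomes "Buy".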




Re: [R] data frame manipulation change elements meeting criteria

2010-05-26 Thread Joris Meys

The loop is due to the switch statement, which handles only one value at a
time, not to the condition. Without the condition it would become:

for (i in seq_along(Y)) {
  new.vect[i] <- switch(
    EXPR = X[i],
    Sell = "Buy",
    Buy = "Sell",
    X[i])  # default: keep the original value
}
You can make an sapply construct too, of course:

new.vect <- sapply(X[which(Y=="DEL")],switch,Sell="Buy",Buy="Sell")

This will speed up things a little bit, but the effect is marginal.
Cheers
Joris
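
A hedged variation on the sapply() line above: as written, it returns only the
DEL subset, and because switch() has no default there, unmatched values such
as "something else" come back as NULL, which makes sapply() return a list. The
sketch below keeps the default and writes the result back into the full
vector. The helper name flip_side is made up for the example.

```r
X <- rep(c("Buy", "Sell", "something else"), each = 5)
Y <- rep(c("DEL", "INS", "DEL"), 5)

# Hypothetical helper: flip Buy/Sell, return anything else unchanged.
flip_side <- function(side) switch(side, Sell = "Buy", Buy = "Sell", side)

del <- Y == "DEL"
new.vect <- X
# vapply() insists on exactly one character value per element.
new.vect[del] <- vapply(X[del], flip_side, character(1), USE.NAMES = FALSE)

cbind(new.vect, X, Y)
```

vapply() is used instead of sapply() so that a missing default would fail
loudly rather than silently change the return type.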


-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] data frame manipulation change elements meeting criteria

2010-05-26 Thread Joris Meys

see ?switch

X <- rep(c("Buy", "Sell", "something else"), each = 5)
Y <- rep(c("DEL", "INS", "DEL"), 5)

new.vect <- X
for (i in which(Y == "DEL")) {
  new.vect[i] <- switch(
    EXPR = X[i],
    Sell = "Buy",
    Buy = "Sell",
    X[i])  # default: leave any other value unchanged
}
cbind(new.vect, X, Y)
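
For comparison, the same result can be reached without a loop by using logical
indexing. This is a sketch of an alternative technique, not the approach in
the message above; the name del is made up for the example.

```r
X <- rep(c("Buy", "Sell", "something else"), each = 5)
Y <- rep(c("DEL", "INS", "DEL"), 5)

del <- Y == "DEL"

new.vect <- X
# Both conditions test the original X, so a flipped value is never flipped back.
new.vect[del & X == "Sell"] <- "Buy"
new.vect[del & X == "Buy"]  <- "Sell"

cbind(new.vect, X, Y)
```

Values other than "Buy" and "Sell" are never selected, so they stay as they
are, which matches the default branch of the switch() call.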

On Wed, May 26, 2010 at 7:43 PM, arnaud Gaboury wrote:

> Dear group,
>
> Here is my df :
>
> trade <-
> structure(list(Trade.Status = c("DEL", "INS", "INS"), Instrument.Long.Name=
> c("SUGAR NO.11",
> "CORN", "CORN"), Delivery.Prompt.Date = c("Jul/10", "Jul/10",
> "Jul/10"), Buy.Sell..Cleared. = c("Sell", "Buy", "Buy"), Volume = c(1L,
> 2L, 1L), Price = c("15.2500", "368.", "368.5000"), Net.Charges..sum. =
> c(4.01,
> -8.64, -4.32)), .Names = c("Trade.Status", "Instrument.Long.Name",
> "Delivery.Prompt.Date", "Buy.Sell..Cleared.", "Volume", "Price",
> "Net.Charges..sum."), row.names = c(NA, 3L), class = "data.frame")
>
> Here is what I want :
>
> If trade$Trade.Status == "DEL": if trade$Buy.Sell..Cleared. == "Sell",
> change it to "Buy"; if trade$Buy.Sell..Cleared. == "Buy", change it to
> "Sell".
> If trade$Trade.Status == "INS", do nothing.
> I tried to work around this with ifelse, but I don't know how to deal with
> so many conditions.
>
> Any help is appreciated.
>
> TY
>



-- 
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


Re: [R] data frame manipulation and regex

2010-04-28 Thread David Winsemius

On Apr 28, 2010, at 8:30 AM, arnaud Gaboury wrote:

TY so much david. We are getting close. But I need to keep "USD" in my
object name (i.e "STANDARD LEAD USD")

> sub("USD+.*.(.../\\d{2})", "USD", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD"   "STANDARD LEAD USD"

I had been attempting (unsuccessfully) to get the portion within the
parens to be the replaced string. This also works, and has the side
effect of keeping the \n that I had not intended to remove from the
5th item:

> sub("(USD+.*).../\\d{2}", "\\1", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD\n" "STANDARD LEAD USD "

--
David
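
A slightly tighter variant of the capture-group idea above, as a sketch: keep
"USD" in a group and remove the whitespace and date that follow it, anchored
to the end of the string. The vector name description is made up for the
example, and the "\n" in the fifth item is an assumption that mimics the line
break picked up from the wrapped e-mail.

```r
description <- c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
                 "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD\nJul/10",
                 "STANDARD LEAD USD Jul/10")

# (USD)          keep the currency code (group 1)
# [[:space:]]+   the separator, whether a space or a line break
# [[:alpha:]]{3}/[[:digit:]]{2}$   a date such as Jul/10 at the end
cleaned <- sub("(USD)[[:space:]]+[[:alpha:]]{3}/[[:digit:]]{2}$", "\\1",
               description)

cleaned
```

Because the separator is matched outside the group, neither a trailing space
nor a trailing "\n" is left behind, and items without "USD" are not touched.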


Re: [R] data frame manipulation and regex

2010-04-28 Thread Henrique Dallazuanna

Try this:

gsub("(.*)[ \n]\\w{3}/\\d{2}", "\\1", avprix$DESCRIPTION)
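
Note that the pattern above contains no "USD" condition, so it would strip the
date from every description, not only the USD ones. A hedged sketch of one way
to restrict it follows; the names description, has_usd, all_stripped and
cleaned are made up for the example.

```r
description <- c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
                 "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
                 "STANDARD LEAD USD Jul/10")

# The pattern as posted: the date is removed from all six items.
all_stripped <- gsub("(.*)[ \n]\\w{3}/\\d{2}", "\\1", description)

# Restricting the replacement to items that contain the literal text "USD".
has_usd <- grepl("USD", description, fixed = TRUE)
cleaned <- description
cleaned[has_usd] <- sub("[ \n]\\w{3}/\\d{2}$", "", description[has_usd])

cleaned
```

Selecting the rows first with grepl() keeps the regular expression itself
simple.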


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: [R] data frame manipulation and regex

2010-04-28 Thread arnaud Gaboury

Thank you so much, David. We are getting close, but I need to keep "USD" in my
object name (i.e. "STANDARD LEAD USD").



***
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***


> -Original Message-
> From: David Winsemius [mailto:dwinsem...@comcast.net]
> Sent: Wednesday, April 28, 2010 2:25 PM
> To: arnaud Gaboury
> Cc: r-help@r-project.org
> Subject: Re: [R] data frame manipulation and regex
> 
> 
> On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:
> 
> > Dear group,
> >
> > Here is my data.frame :
> >
> > avprix <-
> > structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
> > "ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
> > "SPCL HIGH GRADE ZINC USD Jul/10", "STANDARD LEAD USD Jul/10"),
> > prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
> > quantity = c(0, -3, 8, 2, -1, 0)),
> > .Names = c("DESCRIPTION", "prix", "quantity"),
> > row.names = c(NA, -6L), class = "data.frame")
> >
> >> avprix
> >                       DESCRIPTION    prix quantity
> > 1                     CORN Jul/10    -1.5        0
> > 2                     CORN May/10 -1082.0       -3
> > 3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
> > 4                 SOYBEANS Jul/10  1983.5        2
> > 5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
> > 6        STANDARD LEAD USD Jul/10  -118.0        0
> >
> > I need to remove the date (i.e. Jul/10 in this example) from each element
> > of the DESCRIPTION column that contains the USD symbol. I am trying to do
> > this using regular expressions, but must admit I am going nowhere.
> > The elements in the DESCRIPTION column and the dates can change every day.
> 
> This searches for the pattern USD and then replaces any three
> characters , forward-slash, any two characters:
>  > sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
> [1] "CORN Jul/10""CORN May/10""ROBUSTA
> COFFEE (10) Jul/10"
> [4] "SOYBEANS Jul/10""SPCL HIGH GRADE ZINC "
> "STANDARD LEAD "
> 
> This tightens up the matching by requiring that that the characters
> after the slash be digits:
> 
>  > sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
> [1] "CORN Jul/10""CORN May/10""ROBUSTA
> COFFEE (10) Jul/10"
> [4] "SOYBEANS Jul/10""SPCL HIGH GRADE ZINC "
> "STANDARD LEAD "
> 
> -- David.
> 
> 
>  >
> >
> > TY for any help.
> >
> 
> David Winsemius, MD
> West Hartford, CT


Re: [R] data frame manipulation and regex

2010-04-28 Thread David Winsemius

On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:

Dear group,

Here is my data.frame :

avprix <-
structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
"ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
"SPCL HIGH GRADE ZINC USD Jul/10",
"STANDARD LEAD USD Jul/10"), prix = c(-1.5, -1082, 11084, 1983.5,
-2464, -118), quantity = c(0, -3, 8, 2, -1, 0)), .Names = c("DESCRIPTION",
"prix", "quantity"), row.names = c(NA, -6L), class = "data.frame")

> avprix
                      DESCRIPTION    prix quantity
1                     CORN Jul/10    -1.5        0
2                     CORN May/10 -1082.0       -3
3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
4                 SOYBEANS Jul/10  1983.5        2
5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
6        STANDARD LEAD USD Jul/10  -118.0        0

I need to remove the date (i.e. Jul/10 in this example) for each element of
the DESCRIPTION column that contains the USD symbol. I am trying to do this
using regular expressions, but must admit I am going nowhere.
My elements in the DESCRIPTION column and the dates can change every day.

This matches the pattern USD followed by anything up to and including three
characters, a forward slash, and two characters, and replaces the whole match
(USD included) with an empty string:

> sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

This tightens up the matching by requiring that the characters
after the slash be digits:

> sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

-- David.

>

TY for any help.


David Winsemius, MD
West Hartford, CT


Re: [R] data frame manipulation

2010-04-16 Thread arnaud Gaboury

Excellent!! You saved me hours and hours of turning around and around.
TY so much.





From: Ista Zahn [mailto:istaz...@gmail.com] 
Sent: Friday, April 16, 2010 1:37 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation

It works for me...

> DF <-
+ structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
+ "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
+ "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
+ "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
+ 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
+ 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
+ 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
+ 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
+ -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
+ "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
+ "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
+ "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
+ "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
+ "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
+ "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
> 
> library(plyr)
> 
> op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
+ sum(QUANITY),DATE=max(CREATED.DATE), SETTLEMENT =
+ CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
> 
> op <- unique(op)
> op
      DESCRIPTION POSITION       DATE  SETTLEMENT
1  PRIMARY NICKEL        0 2010-03-10 25,760.8600
3  PRM HGH GD ALU        0 2010-04-09  2,415.9000
5  SPCL HIGH GRAD        2 2010-04-09  2,421.0500
10 STANDARD LEAD         0 2010-04-06  2,357.1200
> 

-Ista
On Fri, Apr 16, 2010 at 7:21 AM, arnaud Gaboury 
wrote:
When I pass your command line, here is what I get :

> op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),
+ DATE=max(CREATED.DATE),SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
> op

     DESCRIPTION POSITION       DATE SETTLEMENT
1 PRIMARY NICKEL        0 2010-03-10       <NA>
2 PRM HGH GD ALU        0 2010-04-09       <NA>
3 SPCL HIGH GRAD        2 2010-04-09       <NA>
4 STANDARD LEAD         0 2010-04-06       <NA>


That is exactly what I want, but not with the NA ! the SETTLEMENT column
should show the corresponding CLOSING.PRICE for the CREATED.DATE

***
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***

From: Ista Zahn [mailto:istaz...@gmail.com]
Sent: Friday, April 16, 2010 1:05 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation

Hi,
I'm not sure I understand what you want exactly. My best guess is that you
want something like

op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE =
CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])

op <- unique(op)

Does that do it?

-Ista
On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury 
wrote:
Dear group,

Here is my data.frame :


df <-
structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
"PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
-1, 1, 1, 1, -1), CLOSING.PRICE = c("

Re: [R] data frame manipulation

2010-04-16 Thread Ista Zahn

It works for me...

> DF <-
+ structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
+ "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
+ "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
+ "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
+ "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
+ 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
+ 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
+ 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
+ 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
+ -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
+ "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
+ "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
+ "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
+ "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
+ "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
+ "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
>
> library(plyr)
>
> op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
+ sum(QUANITY),DATE=max(CREATED.DATE), SETTLEMENT =
+ CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
>
> op <- unique(op)
> op
      DESCRIPTION POSITION       DATE  SETTLEMENT
1  PRIMARY NICKEL        0 2010-03-10 25,760.8600
3  PRM HGH GD ALU        0 2010-04-09  2,415.9000
5  SPCL HIGH GRAD        2 2010-04-09  2,421.0500
10 STANDARD LEAD         0 2010-04-06  2,357.1200
>

-Ista

On Fri, Apr 16, 2010 at 7:21 AM, arnaud Gaboury wrote:

> When I pass your command line, here is what I get :
>
>
> >op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),DATE=max(CREAT
> ED.DATE),SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
> > op
>
> DESCRIPTION POSITION       DATE SETTLEMENT
> 1 PRIMARY NICKEL0 2010-03-10   
> 2 PRM HGH GD ALU0 2010-04-09   
> 3 SPCL HIGH GRAD2 2010-04-09   
> 4 STANDARD LEAD 0 2010-04-06   
>
>
> That is exactly what I want, but not with the NA ! the SETTLEMENT column
> should show the corresponding CLOSING.PRICE for the CREATED.DATE
>
> ***
> Arnaud Gaboury
> Mobile: +41 79 392 79 56
> BBM: 255B488F
> ***
>
> From: Ista Zahn [mailto:istaz...@gmail.com]
> Sent: Friday, April 16, 2010 1:05 PM
> To: arnaud Gaboury
> Cc: r-help@r-project.org
> Subject: Re: [R] data frame manipulation
>
> Hi,
> I'm not sure I understand what you want exactly. My best guess is that you
> want something like
>
> op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
> sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE =
> CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])
>
> op <- unique(op)
>
> Does that do it?
>
> -Ista
> On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury 
> wrote:
> Dear group,
>
> Here is my data.frame :
>
>
> df <-
> structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
> "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
> "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
> "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
> 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
> 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
> 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
> 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
> -

Re: [R] data frame manipulation

2010-04-16 Thread arnaud Gaboury

When I pass your command line, here is what I get :

> op=ddply(df,c("DESCRIPTION"),summarise,POSITION=sum(QUANITY),
+ DATE=max(CREATED.DATE),SETTLEMENT=CLOSING.PRICE[CREATED.DATE=max(CREATED.DATE)])
> op

     DESCRIPTION POSITION       DATE SETTLEMENT
1 PRIMARY NICKEL        0 2010-03-10       <NA>
2 PRM HGH GD ALU        0 2010-04-09       <NA>
3 SPCL HIGH GRAD        2 2010-04-09       <NA>
4 STANDARD LEAD         0 2010-04-06       <NA>


That is exactly what I want, except for the NA values: the SETTLEMENT column
should show the CLOSING.PRICE that corresponds to the latest CREATED.DATE.
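
A likely cause of the NA values is the single "=" inside the square brackets, where the suggested command used "==". The toy vectors below are a sketch for one DESCRIPTION group only (values copied from the example data), not a rerun of the full ddply call.

```r
# Toy vectors standing in for one DESCRIPTION group
CLOSING.PRICE <- c("2,355.9600", "2,357.1200")
CREATED.DATE  <- as.Date(c("2010-04-01", "2010-04-06"))

# Single "=": `[` ignores argument names, so this is CLOSING.PRICE[max(CREATED.DATE)],
# i.e. indexing by the date's underlying day count, far past the end of the vector
wrong <- CLOSING.PRICE[CREATED.DATE = max(CREATED.DATE)]

# Double "==": a logical comparison that selects the element with the latest date
right <- CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)]

wrong
right
```

With "==" the subscript is a logical vector, so only the price recorded on the latest date is returned.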

***
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***

From: Ista Zahn [mailto:istaz...@gmail.com] 
Sent: Friday, April 16, 2010 1:05 PM
To: arnaud Gaboury
Cc: r-help@r-project.org
Subject: Re: [R] data frame manipulation

Hi,
I'm not sure I understand what you want exactly. My best guess is that you
want something like

op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE =
CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])

op <- unique(op)

Does that do it?

-Ista
On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury 
wrote:
Dear group,

Here is my data.frame :


df <-
structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
"PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
"SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
-1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
"25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
"2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
"2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
"2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
"2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
"QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")

I would like to summarize it like this:

> op
    DESCRIPTION POSITION       DATE
1 PRIMARY NICKEL        0 2010-03-10
2 PRM HGH GD ALU        0 2010-04-09
3 SPCL HIGH GRAD        2 2010-04-09
4 STANDARD LEAD         0 2010-04-06



To obtain "op", I wrote the following line:

>   op=ddply(df, c("DESCRIPTION"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE))

So far, so good. But I need one more column, "CLOSING.PRICE". If I
write this line:


>   op1=ddply(c, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE))

Here is what I get:


> op1
    DESCRIPTION CLOSING.PRICE POSITION       DATE
1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
7 STANDARD LEAD     2,355.9600       -1 2010-04-01
8 STANDARD LEAD     2,357.1200        1 2010-04-06

Not exactly what I want. Can anyone help?
TY




-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] data frame manipulation

2010-04-16 Thread Ista Zahn

Hi,
I'm not sure I understand what you want exactly. My best guess is that you
want something like

op=ddply(DF, c("DESCRIPTION"), summarise, POSITION=
sum(QUANITY),DATE=max(CREATED.DATE), CLOSING.PRICE =
CLOSING.PRICE[CREATED.DATE == max(CREATED.DATE)])

op <- unique(op)

Does that do it?
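
For readers without plyr, roughly the same summary can be sketched in base R with split() and lapply(). This is an illustration on a made-up subset in the shape of the example data, not the method used in the thread: `summarise_group` is a hypothetical helper name, and keeping the first price when several rows share the latest date is an assumption that stands in for the unique() step.

```r
# Small made-up subset with the same column names as the example data
DF <- data.frame(
  DESCRIPTION   = c("PRIMARY NICKEL", "PRIMARY NICKEL",
                    "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD "),
  CREATED.DATE  = as.Date(c("2010-03-05", "2010-03-10",
                            "2010-04-01", "2010-04-01", "2010-04-06")),
  QUANITY       = c(1, -1, 1, -1, 1),
  CLOSING.PRICE = c("25,755.7100", "25,760.8600",
                    "2,355.9600", "2,355.9600", "2,357.1200"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per DESCRIPTION group
summarise_group <- function(g) {
  latest <- max(g$CREATED.DATE)
  data.frame(
    DESCRIPTION = g$DESCRIPTION[1],
    POSITION    = sum(g$QUANITY),
    DATE        = latest,
    # several rows can share the latest date; keep the first price
    SETTLEMENT  = g$CLOSING.PRICE[g$CREATED.DATE == latest][1],
    stringsAsFactors = FALSE
  )
}

op <- do.call(rbind, lapply(split(DF, DF$DESCRIPTION), summarise_group))
rownames(op) <- NULL
op
```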

-Ista

On Fri, Apr 16, 2010 at 4:16 AM, arnaud Gaboury wrote:

> Dear group,
>
> Here is my data.frame :
>
>
> df <-
> structure(list(DESCRIPTION = c("PRM HGH GD ALU", "PRM HGH GD ALU",
> "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL", "PRIMARY NICKEL",
> "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
> "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ", "STANDARD LEAD ",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD", "SPCL HIGH GRAD",
> "SPCL HIGH GRAD", "SPCL HIGH GRAD"), CREATED.DATE = structure(c(14708,
> 14708, 14672, 14673, 14678, 14678, 14700, 14700, 14700, 14700,
> 14700, 14700, 14700, 14705, 14707, 14707, 14707, 14708, 14708,
> 14708, 14708, 14708, 14622, 14634), class = "Date"), QUANITY = c(-1,
> 1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1,
> -1, 1, 1, 1, -1), CLOSING.PRICE = c("2,415.9000", "2,415.9000",
> "25,755.7100", "25,755.7100", "25,760.8600", "25,760.8600", "2,355.9600",
> "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600", "2,355.9600",
> "2,355.9600", "2,357.1200", "2,420.7300", "2,420.7300", "2,420.7300",
> "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500", "2,421.0500",
> "2,388.4300", "2,388.4300")), .Names = c("DESCRIPTION", "CREATED.DATE",
> "QUANITY", "CLOSING.PRICE"), row.names = 26:49, class = "data.frame")
>
> I am looking at summarize it in something like this :
>
> > op
>      DESCRIPTION POSITION       DATE
> 1 PRIMARY NICKEL        0 2010-03-10
> 2 PRM HGH GD ALU        0 2010-04-09
> 3 SPCL HIGH GRAD        2 2010-04-09
> 4 STANDARD LEAD         0 2010-04-06
>
>
>
> To obtain "op", I wrote this following line :
>
> >   op=ddply(df, c("DESCRIPTION"), summarise, POSITION=
> sum(QUANITY),DATE=max(CREATED.DATE)).
>
> Until there, fine. But I need to have one more column, "CLOSING.PRICE". If
> I
> write this line :
>
>
> >   op1=ddply(c, c("DESCRIPTION","CLOSING.PRICE"), summarise, POSITION=
> sum(QUANITY),DATE=max(CREATED.DATE))
>
> Here is what I get:
>
>
> > op1
>      DESCRIPTION CLOSING.PRICE POSITION       DATE
> 1 PRIMARY NICKEL   25,755.7100        0 2010-03-05
> 2 PRIMARY NICKEL   25,760.8600        0 2010-03-10
> 3 PRM HGH GD ALU    2,415.9000        0 2010-04-09
> 4 SPCL HIGH GRAD    2,388.4300        0 2010-01-25
> 5 SPCL HIGH GRAD    2,420.7300        1 2010-04-08
> 6 SPCL HIGH GRAD    2,421.0500        1 2010-04-09
> 7 STANDARD LEAD     2,355.9600       -1 2010-04-01
> 8 STANDARD LEAD     2,357.1200        1 2010-04-06
>
> Not exactly what I want. Can anyone help?
> TY
>
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: [R] Data Frame Manipulation: Time Series

2009-01-27 Thread Josip Dasovic

Hello Jim:

Yes, that's exactly what I needed!

Thank you!

Josip

- Original Message -
From: "jim holtman" 
To: "Josip Dasovic" 
Cc: r-help@r-project.org
Sent: Tuesday, January 27, 2009 4:45:31 PM GMT -08:00 US/Canada Pacific
Subject: Re: [R] Data Frame Manipulation: Time Series

Is this what you are after:

> df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7),
+ rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)),
+ "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3),
+ rep(1,4), rep(0,6), rep(1,3)))
> x <- split(df, df$country)
> do.call(rbind, lapply(x, function(.cty){
+ # create where the war starts
+ .start <- diff(c(0, .cty$war))
+ .cty[(.start == 1) & (.cty$war == 1),]
+ }))
           country year war
Angola.1    Angola 1975   1
Angola.8    Angola 1982   1
Burundi.10 Burundi 1989   1
Burundi.14 Burundi 1993   1
Chad.17       Chad 1965   1
Chad.27       Chad 1975   1


On Tue, Jan 27, 2009 at 5:45 PM, Josip Dasovic  wrote:
> Dear R Helpers:
>
> I have a data set where the unit of observation is country-year. I would like 
> to generate a new data set based on some inclusionary (exclusionary) 
> criteria. Here is an example of the type of data that I have.
>
> df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), 
> rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), 
> "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), rep(1,4), 
> rep(0,6), rep(1,3)))
>> df
>   country year war
> 1   Angola 1975   1
> 2   Angola 1976   1
> 3   Angola 1977   0
> 4   Angola 1978   0
> 5   Angola 1979   0
> 6   Angola 1980   0
> 7   Angola 1981   0
> 8   Angola 1982   1
> 9   Angola 1983   1
> 10 Burundi 1989   1
> 11 Burundi 1990   1
> 12 Burundi 1991   0
> 13 Burundi 1992   0
> 14 Burundi 1993   1
> 15 Burundi 1994   1
> 16 Burundi 1995   1
> 17Chad 1965   1
> 18Chad 1966   1
> 19Chad 1967   1
> 20Chad 1968   1
> 21Chad 1969   0
> 22Chad 1970   0
> 23Chad 1971   0
> 24Chad 1972   0
> 25Chad 1973   0
> 26Chad 1974   0
> 27Chad 1975   1
> 28Chad 1976   1
> 29Chad 1977   1
>
> What I would like to do is to create a new data frame with only those 
> observations for which a) the "war" variable value is 1, (this ie easy 
> enough) and 2) it is the first (in time) instance of war for that country for 
> that war "episode" (each of the countries above has two war episodes). Thus, 
> the new data frame should look like this:
>
>   country year war
> 1   Angola 1975   1
> 8   Angola 1982   1
> 10 Burundi 1989   1
> 14 Burundi 1993   1
> 17Chad 1965   1
> 27Chad 1975   1
>
> Any suggestions as to how this can be done?
>
> Thanks in advance,
> Josip
>
> R version 2.7.2 Patched (2008-09-20 r47259)
> Mac OSX 10.5.5
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] Data Frame Manipulation: Time Series

2009-01-27 Thread jim holtman

Is this what you are after:

> df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7),
+ rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)),
+ "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3),
+ rep(1,4), rep(0,6), rep(1,3)))
> x <- split(df, df$country)
> do.call(rbind, lapply(x, function(.cty){
+ # create where the war starts
+ .start <- diff(c(0, .cty$war))
+ .cty[(.start == 1) & (.cty$war == 1),]
+ }))
           country year war
Angola.1    Angola 1975   1
Angola.8    Angola 1982   1
Burundi.10 Burundi 1989   1
Burundi.14 Burundi 1993   1
Chad.17       Chad 1965   1
Chad.27       Chad 1975   1
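
The same idea (a war starts where the 0/1 indicator steps up by one within a country) can also be written without the explicit split and rbind, using ave(). This is a sketch under two assumptions: rows are already sorted by year within each country, and the data frame is built column by column rather than through cbind() so that year stays numeric.

```r
# Example data rebuilt without cbind(), so year and war keep their numeric types
df <- data.frame(
  country = rep(c("Angola", "Burundi", "Chad"), c(9, 7, 13)),
  year    = c(1975:1983, 1989:1995, 1965:1977),
  war     = c(rep(1, 2), rep(0, 5), rep(1, 2),
              rep(1, 2), rep(0, 2), rep(1, 3),
              rep(1, 4), rep(0, 6), rep(1, 3))
)

# Per country: difference from the previous year's indicator,
# treating the year before the first observation as peace (0)
start <- ave(df$war, df$country, FUN = function(w) diff(c(0, w)))

# A step of +1 marks the first year of a war episode
onsets <- df[start == 1, ]
onsets
```

Because ave() returns its result in the original row order, the original row names (1, 8, 10, ...) are kept in the output.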


On Tue, Jan 27, 2009 at 5:45 PM, Josip Dasovic  wrote:
> Dear R Helpers:
>
> I have a data set where the unit of observation is country-year. I would like 
> to generate a new data set based on some inclusionary (exclusionary) 
> criteria. Here is an example of the type of data that I have.
>
> df<-data.frame(cbind("country"=c(rep("Angola", 9), rep("Burundi", 7), 
> rep("Chad", 13)), "year"=c(1975:1983, 1989:1995, 1965:1977)), 
> "war"=c(rep(1,2), rep(0,5), rep(1,2), rep(1,2), rep(0,2), rep(1,3), rep(1,4), 
> rep(0,6), rep(1,3)))
> > df
>    country year war
> 1   Angola 1975   1
> 2   Angola 1976   1
> 3   Angola 1977   0
> 4   Angola 1978   0
> 5   Angola 1979   0
> 6   Angola 1980   0
> 7   Angola 1981   0
> 8   Angola 1982   1
> 9   Angola 1983   1
> 10 Burundi 1989   1
> 11 Burundi 1990   1
> 12 Burundi 1991   0
> 13 Burundi 1992   0
> 14 Burundi 1993   1
> 15 Burundi 1994   1
> 16 Burundi 1995   1
> 17    Chad 1965   1
> 18    Chad 1966   1
> 19    Chad 1967   1
> 20    Chad 1968   1
> 21    Chad 1969   0
> 22    Chad 1970   0
> 23    Chad 1971   0
> 24    Chad 1972   0
> 25    Chad 1973   0
> 26    Chad 1974   0
> 27    Chad 1975   1
> 28    Chad 1976   1
> 29    Chad 1977   1
>
> What I would like to do is to create a new data frame with only those 
> observations for which a) the "war" variable value is 1 (this is easy 
> enough) and b) it is the first (in time) instance of war for that country for 
> that war "episode" (each of the countries above has two war episodes). Thus, 
> the new data frame should look like this:
>
>    country year war
> 1   Angola 1975   1
> 8   Angola 1982   1
> 10 Burundi 1989   1
> 14 Burundi 1993   1
> 17    Chad 1965   1
> 27    Chad 1975   1
>
> Any suggestions as to how this can be done?
>
> Thanks in advance,
> Josip
>
> R version 2.7.2 Patched (2008-09-20 r47259)
> Mac OSX 10.5.5
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


Re: [R] data frame manipulation - splitting monitoring interval and assigning stage

2008-06-26 Thread Jessi Brown

I'd like to thank those who contacted me with ideas on how to solve
this little problem. I learned something from looking through each
snippet of code, even if it wasn't doing quite what I'd hoped it would
do. Mark Leeds deserves special thanks, for helping me debug my
several attempts to "improve" the function.

Here's what I've settled on, for the record:

> DFAmke<-data.frame(Check1=c(113, 148, 117, 122, 120, 154),
+                    Check2=c(148, 170, 122, 129, 154, 175),
+                    HatchDate=c(148, 148, NA, NA, 153, 153))
>
> DFAmke
  Check1 Check2 HatchDate
1    113    148       148
2    148    170       148
3    117    122        NA
4    122    129        NA
5    120    154       153
6    154    175       153
>
>
> final<-do.call(rbind, lapply(1:nrow(DFAmke), function(.index) {
+     temp <- DFAmke[.index,]
+     # what to do in case of missing values in HatchDate
+     if (is.na(temp$HatchDate)){
+         temp$Stage<-"I"
+         temp
+     # checking if entire interval is past hatch date and in Brood stage
+     } else if ( DFAmke$Check1[.index] >= DFAmke$HatchDate[.index] ) {
+         temp$Stage<-"B"
+         temp
+     # checking if entire interval is before hatch date and in Incubation
+     } else if ( DFAmke$Check1[.index] < DFAmke$HatchDate[.index] &&
+                 DFAmke$Check2[.index] <= DFAmke$HatchDate[.index] ) {
+         temp$Stage<-"I"
+         temp
+     # splitting remaining cases into two intervals
+     } else if ( DFAmke$Check1[.index] < DFAmke$HatchDate[.index] &&
+                 DFAmke$Check2[.index] > DFAmke$HatchDate[.index] ) {
+         temp<-rbind(temp,temp)
+         savecheck2<-temp$Check2[1]
+         temp$Check2[1]<- temp$HatchDate[1]
+         temp$Stage[1]<- "I"
+         temp$Check1[2]<-temp$HatchDate[2]
+         temp$Check2[2]<- savecheck2
+         temp$Stage[2]<- "B"
+         temp
+     }}))
> final
   Check1 Check2 HatchDate Stage
1     113    148       148     I
2     148    170       148     B
3     117    122        NA     I
4     122    129        NA     I
5     120    153       153     I
51    153    154       153     B
6     154    175       153     B


I'm sure there are many other ways to accomplish my goal, but the
above works just fine. Thanks again to everyone!
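
For larger tables, the row-by-row loop could probably be replaced by a vectorized version along the following lines. This is only a sketch that mirrors the rules of the code above (missing HatchDate counts as Incubation, and the first half of a split interval ends on the hatch date); `split_by_hatch` is a hypothetical name, not something from the thread.

```r
DFAmke <- data.frame(Check1    = c(113, 148, 117, 122, 120, 154),
                     Check2    = c(148, 170, 122, 129, 154, 175),
                     HatchDate = c(148, 148, NA, NA, 153, 153))

# Hypothetical helper: label each interval and split those that straddle the hatch date
split_by_hatch <- function(d) {
  hatched  <- !is.na(d$HatchDate)
  straddle <- hatched & d$Check1 < d$HatchDate & d$Check2 > d$HatchDate

  # Whole-interval labels: Brood only when the interval starts on or after hatching
  d$Stage <- ifelse(hatched & d$Check1 >= d$HatchDate, "B", "I")

  # Straddling intervals become two rows: up to the hatch date (I), then from it (B)
  before <- d[straddle, ]
  before$Check2 <- before$HatchDate
  after <- d[straddle, ]
  after$Check1 <- after$HatchDate
  after$Stage  <- rep("B", nrow(after))

  # Put the pieces back in the original record order
  out <- rbind(d[!straddle, ], before, after)
  ord <- c(which(!straddle), which(straddle), which(straddle))
  out <- out[order(ord, out$Check1), ]
  rownames(out) <- NULL
  out
}

final2 <- split_by_hatch(DFAmke)
final2
```

Any extra covariate columns (nest ID and so on) are carried along unchanged, since whole rows are copied.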

cheers, Jessi Brown

> On Wed, Jun 25, 2008 at 1:29 PM, Jessi Brown <[EMAIL PROTECTED]> wrote:
>> Hello, everyone.
>>
>> I'm hoping to prevent myself from doing a lot of pointing and clicking
>> in Excel. I have a dataframe of bird nest check observations, in which
>> I know the date of the first check, the date of the second check (both
>> currently in Julian date format), the status of the nest at the second
>> check (alive or failed), and the date that the nest hatched (i.e.
>> changed from Incubation stages to Brood-rearing stage). Many nests
>> have more than one record, as there were several nest checks
>> throughout the duration of the nesting attempt.
>>
>> What I want to do is assign a nest Stage variable, either Incubation
>> or Brood-rearing. It's very easy to do so when the second nest check
>> was before the hatch date (incubation), or when the first nest check
>> was after the hatch date (brood-rearing). But I can't figure out a
>> quick way to split the interval when it contained both incubation and
>> brood-rearing activities.
>>
>> I'd like to go from:
>>
>> Check1 Check2 HatchDate
>> 101   121   110
>>
>> to:
>>
>> Check1 Check2 HatchDate Stage
>> 101    109    110       I
>> 110    121    110       B
>>
>> because even though the nest wasn't actually checked on the day of
>> hatching, we know that it transitioned to the next stage on hatch day.
>>
>> There's other covariates as well as the unique nest ID which need to
>> be carried along too, just like HatchDate.
>>
>> If anyone who is good at dataframe manipulation could suggest some
>> code to perform these actions, I would really appreciate it. Thanks in
>> advance.
>>
>>
>> cheers, Jessi Brown
>> Ecology, Evolution, and Conservation Biology
>> University of Nevada, Reno
>>
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>


Re: [R] data frame manipulation - splitting monitoring interval and assigning stage

2008-06-25 Thread jim holtman

Is this what you want:

> x <- read.table(textConnection("Check1 Check2 HatchDate
+ 101   121   110
+ 130  150 140
+ 140  150  160"), header=TRUE)
> closeAllConnections()
> x
  Check1 Check2 HatchDate
1    101    121       110
2    130    150       140
3    140    150       160
> do.call(rbind, apply(x,1,function(.row){
+ if (.row[3] < .row[2]){
+ # split out the data
+ data.frame(Check1=c(.row[1], .row[3]),
+Check2=c(.row[3] - 1, .row[2]),
+Check3=c(.row[3], .row[3]),
+Stage=c("I", "B"))
+ }
+ else {
+ # normal; just copy over
+ data.frame(Check1=.row[1], Check2=.row[2], Check3=.row[3], Stage="X")
+ }
+ }))
           Check1 Check2 Check3 Stage
Check1        101    109    110     I
HatchDate     110    121    110     B
Check11       130    139    140     I
HatchDate1    140    150    140     B
Check12       140    150    160     X


On Wed, Jun 25, 2008 at 1:29 PM, Jessi Brown <[EMAIL PROTECTED]> wrote:
> Hello, everyone.
>
> I'm hoping to prevent myself from doing a lot of pointing and clicking
> in Excel. I have a dataframe of bird nest check observations, in which
> I know the date of the first check, the date of the second check (both
> currently in Julian date format), the status of the nest at the second
> check (alive or failed), and the date that the nest hatched (i.e.
> changed from Incubation stages to Brood-rearing stage). Many nests
> have more than one record, as there were several nest checks
> throughout the duration of the nesting attempt.
>
> What I want to do is assign a nest Stage variable, either Incubation
> or Brood-rearing. It's very easy to do so when the second nest check
> was before the hatch date (incubation), or when the first nest check
> was after the hatch date (brood-rearing). But I can't figure out a
> quick way to split the interval when it contained both incubation and
> brood-rearing activities.
>
> I'd like to go from:
>
> Check1 Check2 HatchDate
> 101   121   110
>
> to:
>
> Check1 Check2 HatchDate Stage
> 101    109    110       I
> 110    121    110       B
>
> because even though the nest wasn't actually checked on the day of
> hatching, we know that it transitioned to the next stage on hatch day.
>
> There's other covariates as well as the unique nest ID which need to
> be carried along too, just like HatchDate.
>
> If anyone who is good at dataframe manipulation could suggest some
> code to perform these actions, I would really appreciate it. Thanks in
> advance.
>
>
> cheers, Jessi Brown
> Ecology, Evolution, and Conservation Biology
> University of Nevada, Reno
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Re: [R] Data frame manipulation - newbie question

2008-01-06 Thread jim holtman

There are a number of different ways you could manipulate your data to
do what you want, and it is useful to learn some of these techniques.
Here, I think, is the set of actions that you want to perform.

> x <- read.table(textConnection("row  k.idx  step.forwd  pt.num  model  prev  value  abs.error
+ 1   200  0  1  lm   09    10.5  1.5
+ 2   200  0  2  lm   11    10.5  1.5
+ 3   201  1  1  lm   10    12    2.0
+ 4   201  1  2  lm   12    12    2.0
+ 5   202  2  1  lm   12    12.1  0.1
+ 6   202  2  2  lm   12    12.1  0.1
+ 7   200  0  1  rlm  10.1  10.5  0.4
+ 8   200  0  2  rlm  10.3  10.5  0.2
+ 9   201  1  1  rlm  11.6  12    0.4
+ 10  201  1  2  rlm  11.4  12    0.6
+ 11  202  2  1  rlm  11.8  12.1  0.1
+ 12  202  2  2  rlm  11.9  12.1  0.2"), header=TRUE)
> closeAllConnections()
>
> # split the data by the grouping factors
> x.split <- split(x, list(x$k.idx, x$step.forwd, x$model), drop=TRUE)
> x.split
$`200.0.lm`
  row k.idx step.forwd pt.num model prev value abs.error
1   1   200          0      1    lm    9  10.5       1.5
2   2   200          0      2    lm   11  10.5       1.5

$`201.1.lm`
  row k.idx step.forwd pt.num model prev value abs.error
3   3   201          1      1    lm   10    12         2
4   4   201          1      2    lm   12    12         2

$`202.2.lm`
  row k.idx step.forwd pt.num model prev value abs.error
5   5   202          2      1    lm   12  12.1       0.1
6   6   202          2      2    lm   12  12.1       0.1

$`200.0.rlm`
  row k.idx step.forwd pt.num model prev value abs.error
7   7   200          0      1   rlm 10.1  10.5       0.4
8   8   200          0      2   rlm 10.3  10.5       0.2

$`201.1.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
9    9   201          1      1   rlm 11.6    12       0.4
10  10   201          1      2   rlm 11.4    12       0.6

$`202.2.rlm`
   row k.idx step.forwd pt.num model prev value abs.error
11  11   202          2      1   rlm 11.8  12.1       0.1
12  12   202          2      2   rlm 11.9  12.1       0.2

>
> # now take the means of given columns
> x.mean <- lapply(x.split, function(.grp) colMeans(.grp[, c('prev', 'value',
+     'abs.error')]))
>
> # put back into a matrix
> (x.mean <- do.call(rbind, x.mean))
           prev value abs.error
200.0.lm  10.00  10.5      1.50
201.1.lm  11.00  12.0      2.00
202.2.lm  12.00  12.1      0.10
200.0.rlm 10.20  10.5      0.30
201.1.rlm 11.50  12.0      0.50
202.2.rlm 11.85  12.1      0.15
>
> #boxplot
> boxplot(abs.error ~ k.idx, data=x)
>
> # create a table with average of the abs.error for each 'model'
> cbind(x, abs.error.mean=ave(x$abs.error, x$model))
   row k.idx step.forwd pt.num model prev value abs.error abs.error.mean
1    1   200          0      1    lm  9.0  10.5       1.5          1.200
2    2   200          0      2    lm 11.0  10.5       1.5          1.200
3    3   201          1      1    lm 10.0  12.0       2.0          1.200
4    4   201          1      2    lm 12.0  12.0       2.0          1.200
5    5   202          2      1    lm 12.0  12.1       0.1          1.200
6    6   202          2      2    lm 12.0  12.1       0.1          1.200
7    7   200          0      1   rlm 10.1  10.5       0.4          0.317
8    8   200          0      2   rlm 10.3  10.5       0.2          0.317
9    9   201          1      1   rlm 11.6  12.0       0.4          0.317
10  10   201          1      2   rlm 11.4  12.0       0.6          0.317
11  11   202          2      1   rlm 11.8  12.1       0.1          0.317
12  12   202          2      2   rlm 11.9  12.1       0.2          0.317
>
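
If you would rather keep the grouping factors as columns instead of encoded in row names such as 200.0.lm, aggregate() may be a shorter route to the same group means. A minimal sketch, assuming the data are as posted (rebuilt inline here with data.frame() so it runs on its own; the name x.agg is just illustrative):

```r
# the thread's data, rebuilt inline so the sketch is self-contained
x <- data.frame(
  k.idx      = rep(c(200, 201, 202), each = 2, times = 2),
  step.forwd = rep(c(0, 1, 2), each = 2, times = 2),
  pt.num     = rep(1:2, times = 6),
  model      = rep(c("lm", "rlm"), each = 6),
  prev       = c(9, 11, 10, 12, 12, 12, 10.1, 10.3, 11.6, 11.4, 11.8, 11.9),
  value      = rep(c(10.5, 12, 12.1), each = 2, times = 2),
  abs.error  = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# one call: mean of the three numeric columns for each
# k.idx / step.forwd / model combination
x.agg <- aggregate(cbind(prev, value, abs.error) ~ k.idx + step.forwd + model,
                   data = x, FUN = mean)
print(x.agg)
```

The result is an ordinary data frame with one row per group, so it can be passed on to subset(), merge() or plotting functions without having to parse the row names.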


On Jan 6, 2008 10:50 AM, Rense Nieuwenhuis <[EMAIL PROTECTED]> wrote:
> Hi,
>
> you may want to use the apply / tapply functions. Some find them a bit
> hard to grasp at first, but they will help you many times in many
> situations once you get the hang of them.
>
> Maybe you can get some information on my site:
> http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/
>
>
> Hope this helps,
>
> Rense Nieuwenhuis
>
>
>
> On Jan 3, 2008, at 11:53 , José Augusto M. de Andrade Junior wrote:
>
> > Hi all,
> >
> > Could someone please explain how I can efficiently query a data frame
> > with several factors, as shown below:
> >
> > ---------------------
> > Data frame: pt.knn
> > ---------------------

Re: [R] Data frame manipulation - newbie question

2008-01-06 Thread Rense Nieuwenhuis

Hi,

you may want to use the apply / tapply functions. Some find them a bit
hard to grasp at first, but they will help you many times in many
situations once you get the hang of them.
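
As a quick illustration of what tapply does, here is a small sketch on data shaped like the question quoted below. The data frame is a cut-down stand-in typed in by hand (only k.idx, model and abs.error, with the abs.error values as posted), and the result names by.model and by.model.k are made up for the example.

```r
# cut-down stand-in for the pt.knn data from the question
pt.knn <- data.frame(
  k.idx     = factor(rep(c(200, 201, 202), each = 2, times = 2)),
  model     = factor(rep(c("lm", "rlm"), each = 6)),
  abs.error = c(1.5, 1.5, 2, 2, 0.1, 0.1, 0.4, 0.2, 0.4, 0.6, 0.1, 0.2)
)

# overall mean abs.error per model: one value per level of 'model'
by.model <- tapply(pt.knn$abs.error, pt.knn$model, mean)
print(by.model)

# mean abs.error for every model / k.idx combination:
# a matrix with models as rows and k.idx values as columns
by.model.k <- tapply(pt.knn$abs.error,
                     list(pt.knn$model, pt.knn$k.idx), mean)
print(by.model.k)
```

The first argument is the vector to summarise, the second is the grouping factor (or a list of factors), and the third is the function applied to each group.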

Maybe you can get some information on my site:
http://www.rensenieuwenhuis.nl/r-project/manual/basics/tables/


Hope this helps,

Rense Nieuwenhuis



On Jan 3, 2008, at 11:53 , José Augusto M. de Andrade Junior wrote:

> Hi all,
>
> Could someone please explain how I can efficiently query a data frame
> with several factors, as shown below:
>
> ---------------------------------------------------------------------
> Data frame: pt.knn
> ---------------------------------------------------------------------
> row | k.idx | step.forwd | pt.num | model | prev | value | abs.error
> 1     200     0            1        lm      09     10.5    1.5
> 2     200     0            2        lm      11     10.5    1.5
> 3     201     1            1        lm      10     12      2.0
> 4     201     1            2        lm      12     12      2.0
> 5     202     2            1        lm      12     12.1    0.1
> 6     202     2            2        lm      12     12.1    0.1
> 7     200     0            1        rlm     10.1   10.5    0.4
> 8     200     0            2        rlm     10.3   10.5    0.2
> 9     201     1            1        rlm     11.6   12      0.4
> 10    201     1            2        rlm     11.4   12      0.6
> 11    202     2            1        rlm     11.8   12.1    0.1
> 12    202     2            2        rlm     11.9   12.1    0.2
> ---------------------------------------------------------------------
>
> k.idx, step.forwd, pt.num and model columns are FACTORS.
> prev, value, abs.error are numeric
>
> I need to take the mean value of the numeric columns (prev, value and
> abs.error) for each k.idx, step.forwd and model. So rows 1 and 2,
> 3 and 4, 5 and 6, 7 and 8, 9 and 10, 11 and 12 must be grouped
> together.
>
> Next, I need to plot a boxplot of the mean(abs.error) of each model
> for each k.idx.
> I need to compare the abs.error of the two models for each step and
> the mean overall abs.error of each model. And so on.
>
> I read the manuals, but the examples there are too simple. I know how
> to do this manipulation in a "brute force" manner, but I wish to learn
> how to work the right way with R.
>
> Could someone help me?
> Thanks in advance.
>
> José Augusto
> Undergraduate student
> University of São Paulo
> Business Administration Faculty
>
>
