[R] Banner using R

2019-01-15 Thread Luca Meyer
Hi,

I am a bit rusty with R programming and I would appreciate some assistance with 
the following.

I have a dataset like:

Data <- data.frame(v1 = c('A', 'B', 'B', 'A', 'B'),
                   v2 = c('A', 'B', 'A', 'A', 'B'),
                   v3 = c('A', 'A', 'A', 'A', 'A'))

How can I get a banner like the following?

Count   v1  v2  v3  TOT
A   2   3   5   10
B   3   2   0   5

I have tried with xtabs and expss but I do not seem to get what I need...

Thanks,

Luca

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
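
A minimal sketch of one way to build such a banner in base R, using the sample data from the post. The names lev and banner are illustrative, and this is only one of several possible approaches (it does not use xtabs or expss).

```r
# Sample data from the post
Data <- data.frame(v1 = c("A", "B", "B", "A", "B"),
                   v2 = c("A", "B", "A", "A", "B"),
                   v3 = c("A", "A", "A", "A", "A"))

# Use a common set of levels so that a category absent from a column counts as 0
lev <- sort(unique(unlist(lapply(Data, as.character))))
banner <- sapply(Data, function(x) table(factor(x, levels = lev)))

# Append the row totals as a TOT column
banner <- cbind(banner, TOT = rowSums(banner))
print(banner)
```

Forcing the same factor levels on every column is what keeps the B row for v3, where B never occurs.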


Re: [R] How to visualise what code is processed within a for loop

2018-04-30 Thread Luca Meyer
Thank you both for your replies, Don & Rui.

The real issue here is that a search needs to be done within a text field,
and I agree with Rui's later comment that regexpr might indeed be the
time-consuming piece of code.

I might try to optimise this piece of code later on, but for the time being
I am working on the next part, building a neural network to try to
classify some text.

Again, thanks,

Luca
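
If the search itself is the bottleneck, one thing that might help is skipping the regular-expression engine. The sketch below assumes the search terms are plain strings rather than patterns (the thread does not say), and it fills a pre-allocated matrix as Don suggests below. The contents of d0 and d1 are made up for illustration.

```r
# Made-up stand-ins for the real d0 and d1, which the thread does not show
d0 <- data.frame(X0 = c("milk price rises", "cheese exports", "milk and cheese"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(term = c("milk", "cheese"), stringsAsFactors = FALSE)

# Pre-allocate an integer matrix with one column per search term
m <- matrix(0L, nrow = nrow(d0), ncol = nrow(d1),
            dimnames = list(NULL, paste0("V", seq_len(nrow(d1)))))

for (i in seq_len(nrow(d1))) {
  # fixed = TRUE treats the term as a literal string instead of a regex
  m[, i] <- as.integer(grepl(d1[i, 1], d0$X0, fixed = TRUE))
}

# Attach the indicator columns to the data frame in one step
d0 <- cbind(d0, m)
print(d0)
```

Whether this saves a meaningful amount of time on the real data would need to be measured.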

2018-04-30 17:25 GMT+02:00 MacQueen, Don :

> Luca,
>
>
>
> If speed is important, you might improve performance by making d0 into a
> true matrix, rather than a data frame (assuming d0 is indeed a data frame
> at this point). Although data frames may look like matrices, they aren’t,
> and they have some overhead that matrices don’t.  I don’t think you would
> be able to use the [[nm]] syntax with a matrix, but [ , nm] should work,
> provided the matrix has column names. Or you could perhaps index by column
> number.
>
>
>
> I had a project some years ago in which I reduced calculation time a lot
> by extracting the numeric columns of a data frame and working with them,
> then recombining them with the character columns. R’s performance working
> with data frames has improved since then, so I really don’t know if it
> would make a difference for your task.
>
>
>
> -Don
>
>
>
> --
>
> Don MacQueen
>
> Lawrence Livermore National Laboratory
>
> 7000 East Ave., L-627
>
> Livermore, CA 94550
>
> 925-423-1062
>
> Lab cell 925-724-7509

Re: [R] How to visualise what code is processed within a for loop

2018-04-30 Thread Luca Meyer
Hi Rui

Thank you for your suggestion,

I have tested the code you suggested against the one supplied by Don in
terms of timing, and the results are very much aligned: to populate a 5954x899
0/1 matrix on my machine, your procedure took 79 secs, while the one with
ifelse took 80 secs. Unfortunately, then, there is no significant
time saving.

Nevertheless thank you for your contribution.

Kind regards,

Luca
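
A small sketch of how the two variants could be compared, on made-up data. Both share the same regexpr() call, so if the search dominates the run time (an assumption, not something measured in this thread), near-identical timings are what one would expect.

```r
# Made-up data: 20,000 strings, roughly half of which contain "milk"
set.seed(1)
x <- sample(c("milk price", "cheese exports"), 20000, replace = TRUE)

# Time both variants; each one runs the same regexpr() search
t_ifelse  <- system.time(a <- ifelse(regexpr("milk", x) > 0, 1, 0))
t_integer <- system.time(b <- as.integer(regexpr("milk", x) > 0))

# Same 0/1 values; only the storage type differs (double vs integer)
print(all(a == b))
print(c(ifelse = t_ifelse[["elapsed"]], as.integer = t_integer[["elapsed"]]))
```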

2018-04-28 23:18 GMT+02:00 Rui Barradas :

> I forgot to explain the reason for my suggestion.
>
> The logical condition returns FALSE/TRUE, which R codes as 0/1,
> so all you have to do is coerce to integer.
>
> This works because ifelse would return a 1 or a 0 depending on the
> condition, meaning exactly the same values. It is also more efficient, since
> ifelse creates both vectors, the true part and the false part, and then
> indexes those vectors in order to return the appropriate values. This is
> double the trouble and uses a great deal of memory.
>
> Rui Barradas
>
> On 4/28/2018 10:12 PM, Rui Barradas wrote:
>
>> Hello,
>>
>> instead of ifelse, the following is exactly the same and much more
>> efficient.
>>
>> d0[[nm]] <- as.integer(regexpr(d1[i,1], d0$X0) > 0)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas

Re: [R] How to visualise what code is processed within a for loop

2018-04-28 Thread Luca Meyer
Thanks Don,

for (i in 1:10){
  nm <- paste0("V", i)
  d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
}

is exactly what I needed.

Best regards,

Luca


2018-04-25 23:03 GMT+02:00 MacQueen, Don :

> Your code doesn't make sense to me in a couple of ways.
>
> Inside the loop, the first line assigns a value to an object named "t".
> Then, the second line does the same thing, assigns a value to an object
> named "t".
>
> The value of the object named "t" after the second line will be the output
> of the ifelse() expression, whatever that is. This has the effect of making
> the first line irrelevant. Whatever value t has after the first line is
> replaced by whatever it gets from the second line.
>
> It looks like the first line inside the loop is constructing the name of a
> data frame column, and storing that name as a character string. However,
> the second line doesn't use that name at all. If your goal is to update the
> contents of a column, you need to assign something to that column in the
> next line. Instead you assign it to the object named "t".
>
> What you're looking for will be more along the lines of this:
>
> for (i in 1:10){
>   nm <- paste0("V", i)
>   d0[[nm]] <- ifelse( regexpr(d1[i,1], d0$X0) > 0, 1, 0)
> }
>
> This may not be a complete solution, since I have no idea what the contents
> or structure of d1 are, or what the regexpr() is expected to return.
>
> And notice the use of double brackets, [[ and ]]. This is one way to
> reference a column of a  data frame when you have the column's name stored
> in a variable. Another way is d0[ , nm]
>
>
> A couple of additional comments:
>
>  "t" is a poor choice of object name, because it is one of R's built-in
> functions (immediately after starting a fresh session of R, with nothing
> left over from any previous session, type help("t") and see what you get).
>
>  ifelse() is intended for use on vectors, not scalars, and it looks like
> maybe you're using it on a scalar (can't be sure about this, though)
>
> For example, ifelse() is designed for this kind of usage:
> > ifelse( c(TRUE, FALSE, TRUE) , 1:3, 11:13)
> [1]  1 12  3
>
> Although it works ok for these
> > ifelse(TRUE, 3, 4)
> [1] 3
> > ifelse(FALSE, 3, 4)
> [1] 4
> They are not really what it is intended for.
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
> Lab cell 925-724-7509
>



Re: [R] How to visualise what code is processed within a for loop

2018-04-24 Thread Luca Meyer
Hi Bob,

Thank you for your suggestion. Actually d0 is a dataframe, does that change
something in the code you propose?

Kind regards,

Luca
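
A sketch of how the sapply()/grepl() idea quoted below might be adapted when d0 is a data frame. The data are made up, and hits is an illustrative name. The main difference is that the result is a matrix, so it is assigned to several columns at once rather than to a single d0$V.

```r
# Made-up stand-ins for d0 and d1
d0 <- data.frame(X0 = c("milk price rises", "cheese exports", "milk and cheese"),
                 stringsAsFactors = FALSE)
d1 <- data.frame(term = c("milk", "cheese"), stringsAsFactors = FALSE)

# sapply() returns a logical matrix: one row per d0 row, one column per term
hits <- sapply(d1[, 1], grepl, x = d0$X0)

# Adding 0L turns TRUE/FALSE into 1/0; assigning to several column
# names at once creates V1, V2, ... in the data frame
d0[paste0("V", seq_len(ncol(hits)))] <- as.data.frame(hits + 0L)
print(d0)
```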

2018-04-24 10:19 GMT+02:00 Bob O'Hara :

> The loop never assigns anything to d0, only t. The first line makes t
> a character string "d0$V1" (or "d0$V2" etc.). The second line assigns
> either 0 or 1 to t.
>
> Looking at this, I don't think you've got into the R psychology (bad
> news if you want to use R, good news in many other ways). I assume d0
> is a list, so could you put the V's into a vector, and then just use
> this:
>
> d0$V <- sapply(d1[1:10,1], grepl, d0$X0)
>
> (I haven't checked it, but it looks like it will do the trick. It
> returns a logical vector, so if you need integers, then use an
> as.numeric() around the right hand side. Or hope that R does type
> conversion for you when you need it.)
>
> HTH
>
> Bob
>
> --
> Bob O'Hara
> NOTE NEW ADDRESS!!!
> Institutt for matematiske fag
> NTNU
> 7491 Trondheim
> Norway
>
> Mobile: +49 1515 888 5440
> Journal of Negative Results - EEB: www.jnr-eeb.org
>



[R] How to visualise what code is processed within a for loop

2018-04-24 Thread Luca Meyer
Hi,

I am trying to debug the following code:

for (i in 1:10){
  t <- paste("d0$V",i,sep="")
  t <- ifelse(regexpr(d1[i,1],d0$X0)>0,1,0)
}

and I would like to see what code R is actually processing. How can I do
that?

More to the point, I am trying to update my variables d0$V1 to d0$V10
according to the presence or absence of some text (contained in the file
d1) within the d0$X0 variable.

The code seems to run OK: if I add print(table(t)) within the loop, I can see
that the ifelse procedure is working and that in some cases within the d0$V1 to
d0$V10 variable range a 1 is assigned. But when I check my d0$V1 to d0$V10
after the for loop, they are all still equal to zero...

Thanks,

Luca
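
A sketch of what seems to be going wrong, with made-up d0 and d1. In the loop above, paste() only produces the text "d0$V1"; R never treats that string as a reference to the column, and the second line then overwrites t. Indexing with [[nm]] updates the column itself, and a message() call on each pass shows what the loop is doing.

```r
# Made-up stand-ins for d0 and d1
d0 <- data.frame(X0 = c("milk price rises", "cheese exports", "milk and cheese"),
                 V1 = 0, V2 = 0, stringsAsFactors = FALSE)
d1 <- data.frame(term = c("milk", "cheese"), stringsAsFactors = FALSE)

for (i in seq_len(nrow(d1))) {
  # Build the column name as a string ...
  nm <- paste0("V", i)
  # ... and use [[nm]] to assign to the column with that name
  d0[[nm]] <- as.integer(regexpr(d1[i, 1], d0$X0) > 0)
  # Report progress on each pass
  message("i = ", i, ", column = ", nm, ", matches = ", sum(d0[[nm]]))
}
print(d0)
```

For a closer look, a browser() call placed inside the loop pauses execution so that objects can be inspected interactively.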



[R] How to dynamically add variables to a dataframe

2018-04-22 Thread Luca Meyer
Hi,

I am a bit rusty with R programming and cannot seem to find a way to
add a number of variables to my existing dataframe. Basically I need to add
n=dim(d1)[1] variables to my d0 dataframe and I would like them to be named
V1, V2, V3, ... , V[dim(d1)[1]]

When running the following code:

for (t in 1:dim(d1)[1]){
  d0$V[t] <- 0
}

all I get is a V variable populated with zeros...

I am sure there is fairly straightforward code to accomplish what I need.
Any suggestions?

Thank you,

Luca
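
A minimal sketch with made-up d0 and d1. In the loop above, d0$V[t] refers to element t of a single column named V, which is why only V appears. Building the names first and assigning to all of them at once is one way to create V1 to Vn.

```r
# Made-up stand-ins: d0 has 3 rows, d1 has 4 rows, so 4 columns are added
d0 <- data.frame(id = 1:3)
d1 <- data.frame(term = c("a", "b", "c", "d"))

# Build the names V1 ... Vn, where n is the number of rows of d1
new_names <- paste0("V", seq_len(nrow(d1)))

# Assigning one value to several new column names creates them all
d0[new_names] <- 0
str(d0)
```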



[R] How to integrate a dynamic code within a R script?

2018-03-24 Thread Luca Meyer
Hi,

I am working on a script which should include a dynamic listing, i.e.

# SCRIPT BEGINS

# some R procedures here

# DYNAMIC PART BEGINS
d1$X5 <-f1("AAA")
d1$X5 <-f1("AAa")
d1$X5 <-f1("ABa")
# etc...
d1$X6 <-f2("AAA")
d1$X6 <-f2("AAs")
d1$X6 <-f2("ABs")
# etc...
# DYNAMIC PART ENDS

# other procedures here

# SCRIPT ENDS

Basically I have an Excel page with a quite long listing of "AAA", "AAa",
"ABa", "ccc", "Ded", etc, one entry on each line. The listing is likely to
change over time and the script will run at least once a day.

My initial plan was to do something like

f1 <- read.xlsx("LIST.xlsx",1, startRow=2, colNames = F)
f1$X2 <- paste('d1$X5 <-f1("',f1$X1,'")', sep='')
f1$X3 <- paste('d1$X6 <-f2("',f1$X1,'")', sep='')

and I obtain something like
   X1                X2                X3
1 AAA d1$X5 <-f1("AAA") d1$X6 <-f2("AAA")
2 AAa d1$X5 <-f1("AAa") d1$X6 <-f2("AAs")
3 ABa d1$X5 <-f1("ABa") d1$X6 <-f2("ABs")

How can I integrate the above in the DYNAMIC PART of my script? I am
sure there is a pretty simple solution, but I cannot seem to find it.

Thanks,

Luca
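
One possible approach is to loop over the entries instead of generating lines of code as text. In the sketch below, f1() and f2() are hypothetical stand-ins (the post does not show the real ones), d1 is made up, and keys stands for the first column read from LIST.xlsx.

```r
# Made-up data; X5 and X6 start at 0
d1 <- data.frame(X0 = c("xxAAAxx", "yyABayy", "zz"), X5 = 0, X6 = 0,
                 stringsAsFactors = FALSE)

# Hypothetical f1(): flag rows whose X0 contains the key (case-sensitive)
f1 <- function(key) ifelse(grepl(key, d1$X0, fixed = TRUE), 1, d1$X5)
# Hypothetical f2(): same idea, ignoring case
f2 <- function(key) {
  ifelse(grepl(tolower(key), tolower(d1$X0), fixed = TRUE), 1, d1$X6)
}

# In the real script this vector would come from the Excel listing
keys <- c("AAA", "AAa", "ABa")

# The dynamic part: one pass per entry, however long the listing is
for (key in keys) {
  d1$X5 <- f1(key)
  d1$X6 <- f2(key)
}
print(d1)
```

If the generated strings really have to be run as they are, eval(parse(text = ...)) can execute them, but a plain loop is usually easier to read and debug.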



[R] Writing text files out of a dataset

2017-12-29 Thread Luca Meyer
Hello,

I am trying to run the following syntax for all cases within the dataframe
"data"

d1 <- data[1,c("material")]
fileConn<-file("TESTI/d1.txt")
writeLines(d1, fileConn)
close(fileConn)

I am trying to use the for function:

 for (i in 1:nrow(data)){
  d[i] <- data[i,c("material")]
  fileConn<-file("TESTI/d[i].txt")
  writeLines(d[i], fileConn)
  close(fileConn)
}

but I get the error:

Object "d" not found

Any suggestion on how I can solve the above?

Thanks,

Luca
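
A sketch of one way the loop could be written, with made-up data and a temporary folder standing in for TESTI. Inside quotes, "d[i]" is just literal text, so the file name has to be built with paste0(); and d[i] <- ... fails because no object d exists before the loop.

```r
# Made-up data; the real data frame has a "material" text column
data <- data.frame(material = c("first text", "second text", "third text"),
                   stringsAsFactors = FALSE)

# A temporary folder stands in for the "TESTI" directory of the post
out_dir <- file.path(tempdir(), "TESTI")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(data))) {
  # Build the file name from the row number: d1.txt, d2.txt, ...
  path <- file.path(out_dir, paste0("d", i, ".txt"))
  # Given a file name, writeLines() opens and closes the file itself
  writeLines(data[i, "material"], path)
}
print(list.files(out_dir))
```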



Re: [R] R and Supervised learning

2017-10-02 Thread Luca Meyer
Hi Bert,

Thank you for your useful suggestions. I will follow them and come back to
this list with any specific R code issue I might have.

Kind regards,

Luca

2017-10-02 16:57 GMT+02:00 Bert Gunter :

> Luca:
>
> 1. We are not a consulting service. We *help* with R programming issues.
> Users are typically expected to make an effort by providing R code and, if
> appropriate, small data sets that illustrate their difficulties.
>
> 2. SEARCH! e.g. on "text processing R" or some such; or try Rseek.org with
> such searches. R has extensive text processing capabilities, e.g. via
> regex's.
>
> 3. "Supervised Learning algorithm" is far too vague to be useful.
>
> 4. See this CRAN task view:
> https://cran.r-project.org/web/views/MachineLearning.html
>
> 5. The answer to your query is almost certainly yes, but you may have to
> do some reading to clarify your thinking. As this involves primarily
> statistical issues, you may wish to post on a statistical site like
> http://stats.stackexchange.com/  to get advice. R-help site helps with R
> programming primarily, not statistical methodology (although they do
> sometimes intersect).
>
> Cheers,
> Bert
>
>
>



[R] R and Supervised learning

2017-10-02 Thread Luca Meyer
Hi,

I currently find myself manually selecting, among several hundred
Google Alerts (GA) texts, those that are indeed relevant for my research vs
those which are not (even though they are triggered by some relevant search
keywords).

Basically, each week I get several hundred GA emails such as:

https://www.dropbox.com/s/u7rp0ez1tamq001/Alerte%20Google%C2%A0-%20laitier%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

and

https://www.dropbox.com/s/1ubx5enw6tc90hj/Google%20Alert%20-%20latte%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

From such emails I create a file such as:

https://www.dropbox.com/s/y5yqcsxp1zcmnhc/test_sample.xlsx?dl=0

And this is really becoming a time-consuming procedure, hence my decision
to try applying artificial intelligence solutions to such a case.

What I would really need are 2 separate steps:

(1) A procedure that reads the GA email and creates a file such as the
excel I have shared here (only first 3 columns)

(2) Some sort of supervised learning algorithm that can learn by example
from my choices and decide on my behalf (see column 4 in the attached
file). That is: taking the output from step (1) above, I can classify a few
hundred cases and then let the algorithm learn and classify
future/additional data. I plan to regularly review such a classification,
correct misclassifications and train the algorithm again with the
objective of improving its ability to correctly classify the GA texts.

Is my explanation clear enough? Can all the above be done within R? If so,
are there any packages/procedures I should be using?

Thank you in advance for any suggestion you might have.

Luca



[R] How do I best create a R procedure from a R file?

2017-02-08 Thread Luca Meyer
Hi,

I am working on the following file:

> str(elencositi)
'data.frame': 641 obs. of  2 variables:
 $ indirizzo.sito: chr  "10ahora.com.ar" "abceconomia.co" "accmag.com" "actu.orange.fr" ...
 $ nome.sito     : chr  "10ahora" "ABC economia" "Acc Magazine" "Orange Actu" ...

> head(elencositi)
          indirizzo.sito    nome.sito
1         10ahora.com.ar      10ahora
2         abceconomia.co ABC economia
3             accmag.com Acc Magazine
4         actu.orange.fr  Orange Actu
5   affaires.lapresse.ca    La Presse
6 agipapress.blogspot.it   Agigapress

This file is regularly updated, and I consequently need to update a procedure
that uses the elencositi data to update dati$FONTE as indicated below:

dati$FONTE <- ifelse(dati$FONTE=='10ahora.com.ar','10ahora',dati$FONTE)
dati$FONTE <- ifelse(dati$FONTE=='abceconomia.co','ABC economia',dati$FONTE)
dati$FONTE <- ifelse(dati$FONTE=='accmag.com','Acc Magazine',dati$FONTE)

Currently I am using a time consuming procedure involving Excel to update
that, but how can I make that automatic?

Thank you in advance,

Luca
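
A sketch of how the chain of ifelse() lines might be replaced by a single lookup with match(), so that nothing needs editing when elencositi changes. The dati values here are made up, and only the first three rows of elencositi are reproduced.

```r
# First three rows of the lookup table shown in the post
elencositi <- data.frame(
  indirizzo.sito = c("10ahora.com.ar", "abceconomia.co", "accmag.com"),
  nome.sito      = c("10ahora", "ABC economia", "Acc Magazine"),
  stringsAsFactors = FALSE)

# Made-up dati; "unknown.example" has no entry in elencositi
dati <- data.frame(FONTE = c("accmag.com", "unknown.example", "10ahora.com.ar"),
                   stringsAsFactors = FALSE)

# Position of each FONTE value in the lookup table (NA when there is no match)
idx <- match(dati$FONTE, elencositi$indirizzo.sito)

# Replace matched values with the site name; leave unmatched values untouched
dati$FONTE <- ifelse(is.na(idx), dati$FONTE, elencositi$nome.sito[idx])
print(dati)
```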



Re: [R] Script/function/procedure with loop

2016-07-11 Thread Luca Meyer
Thanks Sarah,

The code works just fine.

Luca

2016-07-11 22:43 GMT+02:00 Sarah Goslee :

> Taking your question at face value, except for the factors in your
> original data frame, you can output anything you'd like to text
> onscreen using cat(). Output can also be saved to text files with
> sink() or using batch files, etc and so forth.
>
>
>
> date <- c("07-jul-16","07-jul-16","07-jul-16","08-jul-16","08-jul-16",
>           "08-jul-16","09-jul-16","09-jul-16")
> varA <- c("text A1","text A2","text A3","text A4","text A5","text A6",
>           "text A7","text A8")
> varB <- c("link B1","link B2","link B3","link B4","link B5","link B6",
>           "link B7","link B8")
> mydf <- data.frame(date, varA, varB, stringsAsFactors=FALSE)
>
> for(i in sort(unique(mydf$date))) {
>   thisdate <- subset(mydf, date==i)
>   cat(i, "\n\n")
>   for(j in seq_len(nrow(thisdate))) {
>     cat(thisdate$varA[j], "\n")
>     cat(thisdate$varB[j], "\n\n")
>   }
> }
>
> This code prints to screen:
>
> 07-jul-16
>
> text A1
> link B1
>
> text A2
> link B2
>
> text A3
> link B3
>
> 08-jul-16
>
> text A4
> link B4
>
> text A5
> link B5
>
> text A6
> link B6
>
> 09-jul-16
>
> text A7
> link B7
>
> text A8
> link B8
>
>
> > [[alternative HTML version deleted]]
>
> Please post in plain text.
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>



[R] Script/function/procedure with loop

2016-07-11 Thread Luca Meyer
Can anyone point me to an R script/function/procedure which, starting from
the following sample data

#sample data
#NB: nrow(df) is variable

date = c("07-jul-16","07-jul-16","07-jul-16","08-jul-16","08-jul-16",
         "08-jul-16","09-jul-16","09-jul-16")
varA = c("text A1","text A2","text A3","text A4","text A5","text A6",
         "text A7","text A8")
varB = c("link B1","link B2","link B3","link B4","link B5","link B6",
         "link B7","link B8")
df = data.frame(date, varA, varB)

allows me to obtain a text output such as:

> 07-jul-16

text A1
link B1

text A2
link B2

text A3
link B3

> 08-jul-16

text A4
link B4

text A5
link B5

text A6
link B6

> 09-jul-16

text A7
link B7

text A8
link B8

etc...

Thanks,

Luca



Re: [R] Assistance with httr package with R version 3.3.0

2016-05-10 Thread Luca Meyer
Hi Jim,

Thank you for your suggestion. I have actually tried loading XML and xml2,
but nothing changed... Any other suggestions?

Kind regards,

Luca

> rm(list=ls())
> library(httr)
> library(XML)
> library(xml2)
>
> # load the data from Google Spreadsheets
> url <- "https://docs.google.com/spreadsheets/d/102-jJ7x1YfIe4Kkvb9olQ4chQ_TS90jxoU0vAbFZewc/pubhtml?gid=0&single=true"
> readSpreadsheet <- function(url, sheet = 1){
+   r <- GET(url)
+   html <- content(r)
+   sheets <- readHTMLTable(html, header=FALSE, stringsAsFactors=FALSE)
+   df <- sheets[[sheet]]
+   dfClean <- function(df){
+     nms <- t(df[1,])
+     names(df) <- nms
+     df <- df[-1,-1]
+     row.names(df) <- seq(1,nrow(df))
+     df
+   }
+   dfClean(df)
+ }
> dati <- readSpreadsheet(url)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘readHTMLTable’ for signature ‘"xml_document"’
> rm(readSpreadsheet,url)

2016-05-10 8:52 GMT+02:00 Jim Lemon :

> Hi Luca,
> The function readHTMLTable is in the XML package, not httr. Perhaps
> that is the problem as I don't see a dependency in httr for XML
> (although xml2 is suggested).
>
> Jim
>


[R] Assistance with httr package with R version 3.3.0

2016-05-09 Thread Luca Meyer
Hello,

I am trying to run a code I have been using for a few years now after
downloading the new R version 3.3.0 and I get the following error:

> rm(list=ls())
> library(httr)
>
> # load the data from Google Spreadsheets
> url <- "https://docs.google.com/spreadsheets/d/102-jJ7x1YfIe4Kkvb9olQ4chQ_TS90jxoU0vAbFZewc/pubhtml?gid=0&single=true"
> readSpreadsheet <- function(url, sheet = 1){
+   r <- GET(url)
+   html <- content(r)
+   sheets <- readHTMLTable(html, header=FALSE, stringsAsFactors=FALSE)
+   df <- sheets[[sheet]]
+   dfClean <- function(df){
+     nms <- t(df[1,])
+     names(df) <- nms
+     df <- df[-1,-1]
+     row.names(df) <- seq(1,nrow(df))
+     df
+   }
+   dfClean(df)
+ }
> dati <- readSpreadsheet(url)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘readHTMLTable’ for
signature ‘"xml_document"’
> rm(readSpreadsheet,url)

Can anyone suggest a solution to it?
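
One possible workaround, offered as a sketch rather than a confirmed fix: the error message suggests that content(r) now returns an xml_document, for which readHTMLTable has no method. Asking httr for the raw text and parsing it with the XML package sidesteps that. The helper name parse_sheet_html and the inline HTML sample are illustrative assumptions, not part of the original code.

```r
# Hypothetical helper: parse HTML text with the XML package and return one table.
parse_sheet_html <- function(html_text, sheet = 1) {
  doc <- XML::htmlParse(html_text, asText = TRUE)
  tabs <- XML::readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)
  tabs[[sheet]]
}

# With httr the text would come from the response, e.g.
#   r <- httr::GET(url)
#   html_text <- httr::content(r, as = "text", encoding = "UTF-8")
# A small inline sample stands in for the spreadsheet page here.
sample_html <- "<html><body><table>
  <tr><td>id</td><td>name</td></tr>
  <tr><td>1</td><td>alpha</td></tr>
  <tr><td>2</td><td>beta</td></tr>
</table></body></html>"

# Only run the demo when the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  df <- parse_sheet_html(sample_html)
  print(df)
}
```

Inside readSpreadsheet this would amount to replacing html <- content(r) with the as = "text" call shown in the comment; the dfClean step should not need to change.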

Thanks,

Luca


Re: [R] [FORGED] How to remove the grid around the plot(ca(...)) function?

2015-10-09 Thread Luca Meyer
That worked just fine.

Thanks Paul!

Luca

2015-10-09 0:11 GMT+02:00 Paul Murrell :

> Hi
>
> The plot.ca() function contains explicit calls to axis(), box(), and
> abline(), so, for example, ...
>
>  plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="", axes=FALSE)
>
> ... does not work.
>
> One option is draw-it-yourself (as suggested by David Carlson), another
> option is to copy the function source and write your own version that has
> those axis(), box(), and abline() calls removed (not recommended for a
> number of reasons), and another option is like this ...
>
> # Draw original plot
> plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")
> # Generate 'grid' version of the plot
> library(gridGraphics)
> grid.echo()
> # What has been drawn?
> grid.ls()
> # Remove whichever bits you want
> grid.remove("axis", grep=TRUE, global=TRUE)
> grid.remove("box", grep=TRUE)
> grid.remove("abline", grep=TRUE, global=TRUE)
>
> Paul
>
> On 09/10/15 07:06, Luca Meyer wrote:
>
>> Hello R-experts,
>>
>> Could anyone suggest how I can remove the grid coming out of the
>> plot(ca(...)) function?
>>
>> For instance I have:
>>
>> library(ca)
>> v1 <- c(10,15,20,15,25)
>> v2 <- c(23,4,7,12,2)
>> v3 <- c(10,70,2,3,7)
>> d1 <- data.frame(v1,v2,v3)
>> rownames(d1) <- c("B1","B2","B3","B4","B5")
>> plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")
>>
>> As you can see, I could remove the X and Y axis labels, but basically I am
>> looking for a chart containing only the data points - with relative
>> inertia represented by their size - and labels, with no extra lines or
>> numbers. Any clue on how I can do that?
>>
>> Thank you,
>>
>> Luca
>>
> --
> Dr Paul Murrell
> Department of Statistics
> The University of Auckland
> Private Bag 92019
> Auckland
> New Zealand
> 64 9 3737599 x85392
> p...@stat.auckland.ac.nz
> http://www.stat.auckland.ac.nz/~paul/
>



[R] How to remove the grid around the plot(ca(...)) function?

2015-10-08 Thread Luca Meyer
Hello R-experts,

Could anyone suggest how I can remove the grid coming out of the
plot(ca(...)) function?

For instance I have:

library(ca)
v1 <- c(10,15,20,15,25)
v2 <- c(23,4,7,12,2)
v3 <- c(10,70,2,3,7)
d1 <- data.frame(v1,v2,v3)
rownames(d1) <- c("B1","B2","B3","B4","B5")
plot(ca(d1), mass = c(TRUE,FALSE), xlab="", ylab="")

As you can see, I could remove the X and Y axis labels, but basically I am
looking for a chart containing only the data points - with relative inertia
represented by their size - and labels, with no extra lines or numbers. Any
clue on how I can do that?

Thank you,

Luca



Re: [R] Joining two datasets - recursive procedure?

2015-03-23 Thread Luca Meyer
Hi David, hello R-experts

Thank you for your input. I have tried the syntax you suggested but
unfortunately the marginal distributions v1xv2 change after the
manipulation. Please see below or
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 for the
syntax.

> rm(list=ls())
>
> # this is (an extract of) the usual INPUT file I have:
> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
+ "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
+ "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
+ "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806, 0.00273, 1.42917,
+ 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
+ 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame",
+ row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
>
> #first I order the file such that I have 6 distinct v1xv2 combinations
> f1 <- f1[order(f1$v1,f1$v2),]
>
> # then I compute (manually) the relative importance of each v1xv2 combination:
> tAA <-
(18.18530+1.42917)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=A
> tAB <-
(3.43806+1.05786)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=B
> tAC <-
(0.00273+0.00042)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=C
> tBA <-
(2.37232+1.13430)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=A
> tBB <-
(3.01835+0.92872)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=B
> tBC <-
(0.0+0.0)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=C
> # and just to make sure I have not made mistakes, the following should be equal to 1
> tAA+tAB+tAC+tBA+tBB+tBC
[1] 1
>
> # procedure suggested by David Winsemius
> lookarr <- array(NA,
dim=c(length(unique(f1$v1)),length(unique(f1$v2)),length(unique(f1$v3)) ) ,
dimnames=list( unique(f1$v1), unique(f1$v2), unique(f1$v3) ) )
> lookarr[] <- c(tAA,tAA,tAB,tAB,tAC,tAC,tBA,tBA,tBB,tBB,tBC,tBC)
> lookarr["A","B","C"]
[1] 0.1250369
> lookarr[ with(f1, cbind(v1, v2, v3)) ]
 [1] 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01 9.978703e-05
 [6] 0.00e+00 6.213554e-01 1.110842e-01 1.424236e-01 1.250369e-01
[11] 9.978703e-05 0.00e+00
> f1$v4mod <- f1$v4*lookarr[ with(f1, cbind(v1,v2,v3)) ]
>
> # I compare original vs modified marginal distributions
> aggregate(v4~v1*v2,f1,sum)
  v1 v2   v4
1  A  A 19.61447
2  B  A  3.50662
3  A  B  4.49592
4  B  B  3.94707
5  A  C  0.00315
6  B  C  0.0
> aggregate(v4mod~v1*v2,f1,sum)
  v1 v2        v4mod
1  A  A 1.145829e+01
2  B  A 1.600057e+00
3  A  B 6.219326e-01
4  B  B 5.460087e-01
5  A  C 2.724186e-07
6  B  C 0.00e+00
> aggregate(v4~v3,f1,sum)
  v3   v4
1  B 27.01676
2  C  4.55047
> aggregate(v4mod~v3,f1,sum)
  v3  v4mod
1  B 13.6931347
2  C  0.5331569
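
A possible explanation, sketched under the assumption that the goal is one share per v1xv2 cell repeated across v3: R fills arrays with the first index varying fastest, so the vector c(tAA,tAA,tAB,...) assigned to lookarr[] does not land where its labels suggest (lookarr["A","B","C"] above equals tBB, not tAB). Building the array from a v1 x v2 table avoids the ordering problem. The names shares, v3_levels, lookarr2 and row_share are illustrative.

```r
# Same extract as f1 above, rebuilt compactly
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# v1 x v2 shares of the grand total (a 2 x 3 table, stored column-major)
shares <- prop.table(xtabs(v4 ~ v1 + v2, data = f1))

# Recycle the 2 x 3 block along v3, so every v3 slice holds the same shares
v3_levels <- sort(unique(f1$v3))
lookarr2 <- array(shares,
                  dim = c(dim(shares), length(v3_levels)),
                  dimnames = c(dimnames(shares), list(v3 = v3_levels)))

lookarr2["A", "B", "C"]   # share of the v1=A, v2=B cell

# One share per row of f1, looked up with a character index matrix
row_share <- lookarr2[with(f1, cbind(v1, v2, v3))]
```

Note also that multiplying v4 by the share rescales every value, which by itself cannot keep the v1xv2 totals fixed; an additive adjustment seems closer to what the earlier manual example does.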

Any suggestion on how this can be fixed? Remember, I am searching for a
solution whereby aggregate(v4~v1*v2,f1,sum)==aggregate(v4mod~v1*v2,f1,sum)
while aggregate(v4~v3,f1,sum)!=aggregate(v4mod~v3,f1,sum) by specified
amounts (see my earlier example).

Thank you very much,

Luca


2015-03-22 22:11 GMT+01:00 David Winsemius :

>
> On Mar 22, 2015, at 1:12 PM, Luca Meyer wrote:
>
> > Hi Bert,
> >
> > Maybe I did not explain myself clearly enough. But let me show you with a
> > manual example that indeed what I would like to do is feasible.
> >
> > The following is also available for download from
> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0
> >
> > rm(list=ls())
> >
> > This is usual (an extract of) the INPUT file I have:
> >
> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
>

Re: [R] Joining two datasets - recursive procedure?

2015-03-23 Thread Luca Meyer
Dear All,

I think I have found a fix by developing the draft syntax I provided
yesterday; see below or
https://www.dropbox.com/s/pbz9dcgxu6ljj8x/sample_code_1.txt?dl=0.

The only remaining improvement I would like concerns the block where I
compute the modified v4 (lines 46-60 in the attached file). Given that the
real data are of dimension 8x13x13 (v1xv2xv3), is there any way to write
that block in an automated way? I recall some function that could do that,
but I can't remember which one...

Thanks to everybody, and especially to Bert and David, for trying to assist
me with this one. And apologies for not being so clear upfront, but I was
trying to figure it out myself too...

Kind regards,

Luca

===

rm(list=ls())

# this is (an extract of) the usual INPUT file I have:
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917,
1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

#first I order the file such that I have 6 distinct v1xv2 combinations
f1 <- f1[order(f1$v1,f1$v2),]

#I compute the relative importance of each v1xv2 automatically
t1 <- aggregate(v4~1,f1,sum)
tXX <- aggregate(v4~v1*v2,f1,sum)
tAA <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="A"]/t1)
tAB <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="B"]/t1)
tAC <- as.numeric(tXX$v4[tXX$v1=="A"&tXX$v2=="C"]/t1)
tBA <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="A"]/t1)
tBB <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="B"]/t1)
tBC <- as.numeric(tXX$v4[tXX$v1=="B"&tXX$v2=="C"]/t1)
tAA+tAB+tAC+tBA+tBB+tBC
rm(t1)

# Next, I compute the difference needed for each v3 category
(t1 <- aggregate(v4~v3,f1,sum)) # this is the actual distribution
(t2 <- structure(list(v3 = c("B", "C"), v4 = c(29, 2.56723)), .Names =
c("v3", "v4"), row.names = c(NA, -2L), class = "data.frame")) # this is the target distribution

# I verify t1 & t2 total is the same
aggregate(v4~1,t1,sum)
aggregate(v4~1,t2,sum)

# I determine the value to be added/subtracted to each instance of v3
t1 <- merge(t1,t2,by="v3")
t1$dif <- t1$v4.y-t1$v4.x
t1 <- t1[,c("v3","dif")]
t1

# I merge the t1 file with the f1
f1 <- merge (f1,t1,by="v3")
f1
rm(t1,t2)

# I compute the modified v4 value
f1$v4mod <- f1$v4
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="B",
f1$v4+(tAA*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="C",
f1$v4+(tAA*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="B",
f1$v4+(tAB*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="C",
f1$v4+(tAB*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="B",
f1$v4+(tAC*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="C",
f1$v4+(tAC*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="B",
f1$v4+(tBA*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="C",
f1$v4+(tBA*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="B",
f1$v4+(tBB*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="C",
f1$v4+(tBB*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="B",
f1$v4+(tBC*f1$dif), f1$v4mod)
f1$v4mod <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="C",
f1$v4+(tBC*f1$dif), f1$v4mod)
f1

# I compare original vs modified marginal distributions
aggregate(v4~v1*v2,f1,sum)
aggregate(v4mod~v1*v2,f1,sum)
aggregate(v4~v3,f1,sum)
aggregate(v4mod~v3,f1,sum)
aggregate(v4~1,f1,sum)
aggregate(v4mod~1,f1,sum)

rm(list=ls())
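
One way the block of ifelse() statements might be automated, sketched in base R with ave() and offered as a suggestion rather than a tested replacement: compute the v1xv2 share on every row in one step, compute the shift per v3 level from the target totals, and add share times shift. The names target, share, current and dif are illustrative, and the sketch assumes the target totals sum to the current grand total.

```r
# Same extract as f1 above, rebuilt compactly
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)

# Target v3 totals (values taken from the thread)
target <- c(B = 29, C = 2.56723)

# Share of each v1 x v2 cell in the grand total, repeated on every row of the cell
share <- ave(f1$v4, f1$v1, f1$v2, FUN = sum) / sum(f1$v4)

# Amount to add (or subtract) per v3 level
current <- sapply(split(f1$v4, f1$v3), sum)
dif <- target[names(current)] - current

# Proportional adjustment, one line for any number of categories
f1$v4mod <- f1$v4 + share * unname(dif[f1$v3])

# Compare original vs modified marginal distributions
aggregate(cbind(v4, v4mod) ~ v1 + v2, f1, sum)
aggregate(cbind(v4, v4mod) ~ v3, f1, sum)
```

Because the shifts in dif sum to zero, every v1xv2 total is unchanged, while the v3 totals move to the targets. Nothing in the sketch depends on the number of levels, so it should carry over to the 8x13x13 case as long as target has one entry per v3 level.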



2015-0

Re: [R] Fwd: Joining two datasets - recursive procedure?

2015-03-22 Thread Luca Meyer
Hi Bert,

Maybe I did not explain myself clearly enough. But let me show you with a
manual example that indeed what I would like to do is feasible.

The following is also available for download from
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0

rm(list=ls())

This is (an extract of) the usual INPUT file I have:

f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917,
1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

These are the initial marginal distributions:

aggregate(v4~v1*v2,f1,sum)
aggregate(v4~v3,f1,sum)

First I order the file such that I have nicely listed 6 distinct v1xv2
combinations.

f1 <- f1[order(f1$v1,f1$v2),]

Then I compute (manually) the relative importance of each v1xv2 combination:

tAA <-
(18.18530+1.42917)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=A
tAB <-
(3.43806+1.05786)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=B
tAC <-
(0.00273+0.00042)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=A & v2=C
tBA <-
(2.37232+1.13430)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=A
tBB <-
(3.01835+0.92872)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=B
tBC <-
(0.0+0.0)/(18.18530+1.42917+3.43806+1.05786+0.00273+0.00042+2.37232+1.13430+3.01835+0.92872+0.0+0.0)
# this is for combination v1=B & v2=C
# and just to make sure I have not made mistakes, the following should be equal to 1
tAA+tAB+tAC+tBA+tBB+tBC

Next, I know I need to increase v4 any time v3=B, and the total increase I
need over the whole dataset is 29-27.01676=1.98324. In turn, I need to
diminish v4 any time v3=C by the same amount (4.55047-2.56723=1.98324).
This aspect was perhaps not clear at first: I need to move v4 across v3
categories, but the totals will always remain unchanged.

Since I want the data alteration to be proportional to the v1xv2
combinations I do the following:

f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="B", f1$v4+(tAA*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="A" & f1$v3=="C", f1$v4-(tAA*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="B", f1$v4+(tAB*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="B" & f1$v3=="C", f1$v4-(tAB*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="B", f1$v4+(tAC*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="A" & f1$v2=="C" & f1$v3=="C", f1$v4-(tAC*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="B", f1$v4+(tBA*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="A" & f1$v3=="C", f1$v4-(tBA*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="B", f1$v4+(tBB*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="B" & f1$v3=="C", f1$v4-(tBB*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="B", f1$v4+(tBC*1.98324),
f1$v4)
f1$v4 <- ifelse (f1$v1=="B" & f1$v2=="C" & f1$v3=="C", f1$v4-(tBC*1.98324),
f1$v4)
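
The reasoning behind the block above can be written out as follows; the symbols $s_{ij}$, $d_k$ and $T$ are notation introduced here for illustration, not taken from the thread. Let $v_{ijk}$ be the value of v4 for v1 $= i$, v2 $= j$, v3 $= k$, let $T = \sum_{i,j,k} v_{ijk}$, and let $s_{ij} = \sum_k v_{ijk} / T$ be the share of each v1xv2 combination (tAA, tAB, ...). With shifts $d_B = +1.98324$ and $d_C = -1.98324$, the adjustment is

\[
v'_{ijk} = v_{ijk} + s_{ij}\, d_k , \qquad \sum_k d_k = 0 .
\]

The v1xv2 totals are unchanged because the shifts cancel within each combination:

\[
\sum_k v'_{ijk} = \sum_k v_{ijk} + s_{ij} \sum_k d_k = \sum_k v_{ijk} ,
\]

while each v3 total moves by exactly $d_k$ because the shares sum to one:

\[
\sum_{i,j} v'_{ijk} = \sum_{i,j} v_{ijk} + d_k \sum_{i,j} s_{ij} = \sum_{i,j} v_{ijk} + d_k .
\]

The same argument holds for more than two v3 categories, provided the shifts $d_k$ sum to zero.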

These are the final marginal distributions:

aggregate(v4~v1*v2,f1,sum)
aggregate(v4~v3,f1,sum)

Can this procedure be made programmatic so that I can run it on the
(8x13x13) categories matrix? If so, how would you do it? I am having a
really hard time doing it with some (semi)automatic procedure.

Thank you very much indeed once more :)

Luca


2015-03-22 18:32 GMT+01:00 Bert Gunter :

> Nonsense. You are not telling us somet

[R] Fwd: Joining two datasets - recursive procedure?

2015-03-22 Thread Luca Meyer
Sorry forgot to keep the rest of the group in the loop - Luca
-- Forwarded message --
From: Luca Meyer 
Date: 2015-03-22 16:27 GMT+01:00
Subject: Re: [R] Joining two datasets - recursive procedure?
To: Bert Gunter 


Hi Bert,

That is exactly what I am trying to achieve. Please notice that negative v4
values are allowed. I have done a similar task in the past manually, by
recursively altering the v4 distribution across v3 categories within each
fixed v1&v2 combination, so I am quite positive it can be achieved. But
honestly, it took me forever to do it manually, and since this is likely to
be an exercise I need to repeat from time to time, I wish I could learn how
to do it programmatically.

Thanks again for any further suggestion you might have,

Luca


2015-03-22 16:05 GMT+01:00 Bert Gunter :

> Oh, wait a minute ...
>
> You still want the marginals for the other columns to be as originally?
>
> If so, then this is impossible in general as the sum of all the values
> must be what they were originally and you cannot therefore choose your
> values for V3 arbitrarily.
>
> Or at least, that seems to be what you are trying to do.
>
> -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
>
>
> On Sun, Mar 22, 2015 at 7:55 AM, Bert Gunter  wrote:
> > I would have thought that this is straightforward given my previous
> email...
> >
> > Just set z to what you want -- e,g, all B values to 29/number of B's,
> > and all C values to 2.567/number of C's (etc. for more categories).
> >
> > A slick but sort of cheat way to do this programmatically -- in the
> > sense that it relies on the implementation of factor() rather than its
> > API -- is:
> >
> > y <- f1$v3  ## to simplify the notation; could be done using with()
> > z <- (c(29,2.567)/table(y))[c(y)]
> >
> > Then proceed to z1 as I previously described
> >
> > -- Bert
> >
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > (650) 467-7374
> >
> > "Data is not information. Information is not knowledge. And knowledge
> > is certainly not wisdom."
> > Clifford Stoll
> >
> >
> >
> >
> > On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer  wrote:
> >> Hi Bert, hello R-experts,
> >>
> >> I am close to a solution but I still need one hint w.r.t. the following
> >> procedure (available also from
> >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0)
> >>
> >> rm(list=ls())
> >>
> >> # this is (an extract of) the INPUT file I have:
> >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
> >> "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A",
> >> "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C",
> >> "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042,
> 2.37232,
> >> 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"),
> class
> >> = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L,
> 167L,
> >> 197L, 204L, 206L))
> >>
> >> # this is the procedure that Bert suggested (slightly adjusted):
> >> z <- rnorm(nrow(f1)) ## or anything you want
> >> z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5)
> >> aggregate(v4~v1*v2,f1,sum)
> >> aggregate(z1~v1*v2,f1,sum)
> >> aggregate(v4~v3,f1,sum)
> >> aggregate(z1~v3,f1,sum)
> >>
> >> My question to you is: how can I set z so that I can obtain specific
> values
> >> for z1-v4 in the v3 aggregation?
> >> In other words, how can I configure the procedure so that e.g. B=29 and
> >> C=2.56723 after running the procedure:
> >> aggregate(z1~v3,f1,sum)
> >>
> >> Thank you,
> >>
> >> Luca
> >>
> >> PS: to avoid any doubts you might have about who I am the following is
> my
> >> web page: http://lucameyer.wordpress.com/
> >>
> >>
> >> 2015-03-21 18:13 GMT+01:00 Bert Gunter :
> >

Re: [R] Joining two datasets - recursive procedure?

2015-03-22 Thread Luca Meyer
Hi Bert,

Thanks again for your assistance.

Unfortunately when I apply the additional code you suggest I get B=40.23326
& C=-8.66603 and not  B=29 & C=2.56723. Any idea why that might be
happening?

Please see below or on
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0 the code I
am running:

rm(list=ls())

# this is (an extract of) the INPUT file I have:
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917,
1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

# this is the procedure that Bert suggested (slightly adjusted):

y <- f1$v3  ## to simplify the notation; could be done using with()
z <- (c(29,2.567)/table(y))[c(y)]
# z <- rnorm(nrow(f1)) ## or anything you want
z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5)
aggregate(v4~v1*v2,f1,sum)
aggregate(z1~v1*v2,f1,sum)
aggregate(v4~v3,f1,sum)
aggregate(z1~v3,f1,sum)
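
A possible reading of those numbers, sketched in base R and not confirmed elsewhere in the thread: the term z - ave(z,v1,v2) only adds each row's deviation from its v1xv2 cell mean. With one B row and one C row per cell, a constant z per v3 level shifts the B total by (number of B rows) * (z_B - z_C) / 2, regardless of the intended targets. The names n_B, old_tot, new_tot, d, z_fix and z1_fix are illustrative.

```r
# Same extract as f1 above, rebuilt compactly
f1 <- data.frame(
  v1 = rep(c("A", "B"), each = 6),
  v2 = rep(c("A", "B", "C"), times = 4),
  v3 = rep(rep(c("B", "C"), each = 3), times = 2),
  v4 = c(18.18530, 3.43806, 0.00273, 1.42917, 1.05786, 0.00042,
         2.37232, 3.01835, 0, 1.13430, 0.92872, 0),
  stringsAsFactors = FALSE
)
n_B <- sum(f1$v3 == "B")   # 6 rows per v3 level in this extract

# z as built in the code above: a constant per v3 level
z <- ifelse(f1$v3 == "B", 29 / n_B, 2.567 / n_B)
z1 <- f1$v4 + z - ave(z, f1$v1, f1$v2)
old_tot <- tapply(f1$v4, f1$v3, sum)
new_tot <- tapply(z1, f1$v3, sum)
new_tot - old_tot          # the totals move by +/- n_B * (z_B - z_C) / 2

# Choosing z from the desired shift instead of from the target totals
d <- 29 - old_tot[["B"]]
z_fix <- ifelse(f1$v3 == "B", d / n_B, -d / n_B)
z1_fix <- f1$v4 + z_fix - ave(z_fix, f1$v1, f1$v2)
tapply(z1_fix, f1$v3, sum)
```

This version spreads the shift evenly over the rows rather than in proportion to the v1xv2 totals, so it reaches the target v3 totals but is not the proportional allocation.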

Thanks again,

Luca


2015-03-22 15:55 GMT+01:00 Bert Gunter :

> I would have thought that this is straightforward given my previous
> email...
>
> Just set z to what you want -- e,g, all B values to 29/number of B's,
> and all C values to 2.567/number of C's (etc. for more categories).
>
> A slick but sort of cheat way to do this programmatically -- in the
> sense that it relies on the implementation of factor() rather than its
> API -- is:
>
> y <- f1$v3  ## to simplify the notation; could be done using with()
> z <- (c(29,2.567)/table(y))[c(y)]
>
> Then proceed to z1 as I previously described
>
> -- Bert
>
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
>
>
> On Sun, Mar 22, 2015 at 2:00 AM, Luca Meyer  wrote:
> > Hi Bert, hello R-experts,
> >
> > I am close to a solution but I still need one hint w.r.t. the following
> > procedure (available also from
> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0)
> >
> > rm(list=ls())
> >
> > # this is (an extract of) the INPUT file I have:
> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
> > "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A",
> > "B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C",
> > "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042,
> 2.37232,
> > 3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"),
> class
> > = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L,
> 167L,
> > 197L, 204L, 206L))
> >
> > # this is the procedure that Bert suggested (slightly adjusted):
> > z <- rnorm(nrow(f1)) ## or anything you want
> > z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5)
> > aggregate(v4~v1*v2,f1,sum)
> > aggregate(z1~v1*v2,f1,sum)
> > aggregate(v4~v3,f1,sum)
> > aggregate(z1~v3,f1,sum)
> >
> > My question to you is: how can I set z so that I can obtain specific
> values
> > for z1-v4 in the v3 aggregation?
> > In other words, how can I configure the procedure so that e.g. B=29 and
> > C=2.56723 after running the procedure:
> > aggregate(z1~v3,f1,sum)
> >
> > Thank you,
> >
> > Luca
> >
> > PS: to avoid any doubts you might have about who I am the following is my
> > web page: http://lucameyer.wordpress.com/
> >
> >
> > 2015-03-21 18:13 GMT+01:00 Bert Gunter :
> >>
> >> ... or cleaner:
> >>
> >> z1 <- with(f1,v4 + z -

Re: [R] Joining two datasets - recursive procedure?

2015-03-22 Thread Luca Meyer
Hi Bert, hello R-experts,

I am close to a solution but I still need one hint w.r.t. the following
procedure (available also from
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0)

rm(list=ls())

# this is (an extract of) the INPUT file I have:
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A",
"B", "C"), v3 = c("B", "B", "B", "C", "C", "C", "B", "B", "B", "C", "C",
"C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917, 1.05786, 0.00042, 2.37232,
3.01835, 0, 1.13430, 0.92872, 0)), .Names = c("v1", "v2", "v3", "v4"),
class = "data.frame", row.names = c(2L, 9L, 11L, 41L, 48L, 50L, 158L, 165L,
167L, 197L, 204L, 206L))

# this is the procedure that Bert suggested (slightly adjusted):
z <- rnorm(nrow(f1)) ## or anything you want
z1 <- round(with(f1,v4 + z -ave(z,v1,v2,FUN=mean)), digits=5)
aggregate(v4~v1*v2,f1,sum)
aggregate(z1~v1*v2,f1,sum)
aggregate(v4~v3,f1,sum)
aggregate(z1~v3,f1,sum)

My question to you is: how can I set z so that I can obtain specific values
for z1-v4 in the v3 aggregation?
In other words, how can I configure the procedure so that e.g. B=29 and
C=2.56723 after running the procedure:
aggregate(z1~v3,f1,sum)

Thank you,

Luca

PS: to avoid any doubts you might have about who I am the following is my
web page: http://lucameyer.wordpress.com/


2015-03-21 18:13 GMT+01:00 Bert Gunter :

> ... or cleaner:
>
> z1 <- with(f1,v4 + z -ave(z,v1,v2,FUN=mean))
>
>
> Just for curiosity, was this homework? (in which case I should
> probably have not provided you an answer -- that is, assuming that I
> HAVE provided an answer).
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
>
>
> On Sat, Mar 21, 2015 at 7:53 AM, Bert Gunter  wrote:
> > z <- rnorm(nrow(f1)) ## or anything you want
> > z1 <- f1$v4 + z - with(f1,ave(z,v1,v2,FUN=mean))
> >
> >
> > aggregate(v4~v1,f1,sum)
> > aggregate(z1~v1,f1,sum)
> > aggregate(v4~v2,f1,sum)
> > aggregate(z1~v2,f1,sum)
> > aggregate(v4~v3,f1,sum)
> > aggregate(z1~v3,f1,sum)
> >
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > (650) 467-7374
> >
> > "Data is not information. Information is not knowledge. And knowledge
> > is certainly not wisdom."
> > Clifford Stoll
> >
> >
> >
> >
> > On Sat, Mar 21, 2015 at 6:49 AM, Luca Meyer  wrote:
> >> Hi Bert,
> >>
> >> Thank you for your message. I am looking into ave() and tapply() as you
> >> suggested but at the same time I have prepared a example of input and
> output
> >> files, just in case you or someone else would like to make an attempt to
> >> generate a code that goes from input to output.
> >>
> >> Please see below or download it from
> >> https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0
> >>
> >> # this is (an extract of) the INPUT file I have:
> >> f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> >> "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
> >> "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
> >> "B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273,
> 1.42917,
> >> 1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
> >> 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame",
> row.names =
> >> c(2L,
> >> 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
> >>
> >> # this is (an extract of) the OUTPUT file I would like to obtain:
> >> f2 <- structure(list(v1 = c("A", "A", "A", "A", "A

Re: [R] Joining two datasets - recursive procedure?

2015-03-21 Thread Luca Meyer
Hi Bert,

Thank you for your message. I am looking into ave() and tapply() as you
suggested, but at the same time I have prepared an example of input and
output files, just in case you or someone else would like to make an
attempt to write code that goes from input to output.

Please see below or download it from
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0

# this is (an extract of) the INPUT file I have:
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806,0.00273, 1.42917,
1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

# this is (an extract of) the OUTPUT file I would like to obtain:
f2 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(17.83529, 3.43806,0.00295, 1.77918,
1.05786, 0.0002, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

# please notice that while the aggregated v4 on v3 has changed …
aggregate(f1[,c("v4")],list(f1$v3),sum)
aggregate(f2[,c("v4")],list(f2$v3),sum)

# … the aggregated v4 over v1xv2 has remained unchanged:
aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)
aggregate(f2[,c("v4")],list(f2$v1,f2$v2),sum)

Thank you very much in advance for your assistance.

Luca

2015-03-21 13:18 GMT+01:00 Bert Gunter :

> 1. Still not sure what you mean, but maybe look at ?ave and ?tapply,
> for which ave() is a wrapper.
>
> 2. You still need to heed the rest of Jeff's advice.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
>
>
> On Sat, Mar 21, 2015 at 4:53 AM, Luca Meyer  wrote:
> > Hi Jeff & other R-experts,
> >
> > Thank you for your note. I have tried myself to solve the issue without
> > success.
> >
> > Following your suggestion, I am providing a sample of the dataset I am
> > using below (also downloadble in plain text from
> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0):
> >
> > #this is an extract of the overall dataset (n=1200 cases)
> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
> > "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
> > "B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, 3.43806581506388,
> > 0.002733567617055, 1.42917483425029, 1.05786640463504,
> > 0.000420548864162308,
> > 2.37232740842861, 3.01835841813241, 0, 1.13430282139936,
> 0.928725667117666,
> > 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names
> =
> > c(2L,
> > 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
> >
> > I need to find a automated procedure that allows me to adjust v3
> marginals
> > while maintaining v1xv2 marginals unchanged.
> >
> > That is: modify the v4 values you can find by running:
> >
> &g

Re: [R] Joining two datasets - recursive procedure?

2015-03-21 Thread Luca Meyer
Hi Jeff & other R-experts,

Thank you for your note. I have tried to solve the issue myself, without
success.

Following your suggestion, I am providing a sample of the dataset I am
using below (also downloadable in plain text from
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0):

#this is an extract of the overall dataset (n=1200 cases)
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, 3.43806581506388,
0.002733567617055, 1.42917483425029, 1.05786640463504,
0.000420548864162308,
2.37232740842861, 3.01835841813241, 0, 1.13430282139936, 0.928725667117666,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names =
c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

I need to find an automated procedure that allows me to adjust v3 marginals
while maintaining v1xv2 marginals unchanged.

That is: modify the v4 values you can find by running:

aggregate(f1[,c("v4")],list(f1$v3),sum)

while keeping constant the values you can find by running:

aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)

Now does it make sense?

Please note that I have tried to write some code that modifies the values
within each v1xv2 combination by computing the sum of v4 and the row
percentages in terms of v4, and that is where I am stuck. I am not really
sure how I should proceed. Any suggestions?

Thanks,

Luca
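
One possible way to approach the adjustment described above is iterative
proportional fitting (raking): alternately rescale v4 so that the v3 totals
hit their targets, then rescale within each v1xv2 cell so that its original
total is restored. The sketch below is only an illustration under stated
assumptions: the toy data, the target totals in target_v3 and the helper
name rake_v3 are all made up, the targets must add up to sum(v4), and the
method assumes non-negative values and non-zero v3 totals (it is not meant
to handle the negative cells mentioned earlier in the thread).

```r
# Toy data in the same shape as f1 (values are made up for illustration)
f <- data.frame(
  v1 = rep(c("A", "B"), each = 4),
  v2 = rep(rep(c("A", "B"), each = 2), times = 2),
  v3 = rep(c("B", "C"), times = 4),
  v4 = c(18, 1.5, 3.5, 1, 2.5, 1, 3, 1.5),
  stringsAsFactors = FALSE
)

# Hypothetical target totals for v3; they must add up to sum(f$v4)
target_v3 <- c(B = 24, C = 8)

rake_v3 <- function(d, target, iters = 200) {
  cell <- paste(d$v1, d$v2, sep = "|")     # one label per v1 x v2 cell
  v3   <- as.character(d$v3)
  cell_tot <- tapply(d$v4, cell, sum)      # totals that must be preserved
  for (i in seq_len(iters)) {
    # Step 1: scale every row so that the v3 totals hit their targets
    v3_tot <- tapply(d$v4, v3, sum)
    d$v4 <- d$v4 * unname(target[v3] / v3_tot[v3])
    # Step 2: scale within each v1 x v2 cell to restore its original total
    cur <- tapply(d$v4, cell, sum)
    ratio <- ifelse(cur == 0, 1, cell_tot / cur)
    d$v4 <- d$v4 * unname(ratio[cell])
  }
  d
}

f_adj <- rake_v3(f, target_v3)

# Compare the margins before and after
aggregate(f_adj$v4, list(f_adj$v3), sum)
aggregate(f_adj$v4, list(f_adj$v1, f_adj$v2), sum)
```

Because the loop ends on step 2, the v1xv2 totals are reproduced exactly
(up to floating point), while the v3 totals are matched only as closely as
the iteration has converged.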


2015-03-19 2:38 GMT+01:00 Jeff Newmiller :

> I don't understand your description. The standard practice on this list is
> to provide a reproducible R example [1] of the kind of data you are working
> with (and any code you have tried) to go along with your description. In
> this case, that would be two dputs of your input data frames and a dput of
> an output data frame (generated by hand from your input data frame).
> (Probably best to not use the full number of input values just to keep the
> size down.) We could then make an attempt to generate code that goes from
> input to output.
>
> Of course, if you post that hard work using HTML then it will get
> corrupted (much like the text below from your earlier emails) and we won't
> be able to use it. Please learn to post from your email software using
> plain text when corresponding with this mailing list.
>
> [1]
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On March 18, 2015 9:05:37 AM PDT, Luca Meyer  wrote:
> >Thanks for your input Michael,
> >
> >The continuous variable I have measures quantities (down to the 3rd
> >decimal level) so unfortunately are not frequencies.
> >
> >Any more specific suggestions on how that could be tackled?
> >
> >Thanks & kind regards,
> >
> >Luca
> >
> >
> >===
> >
> >Michael Friendly wrote:
> >I'm not sure I understand completely what you want to do, but
> >if the data were frequencies, it sounds like task for fitting a
> >loglinear model with the model formula
> >
> >~ V1*V2 + V3
> >
> >On 3/18/2015 2:17 AM, Luca Meyer wrote:
> >> Hello,
> >>
> >> I am facing a quite challenging task (at least to me) and I was wondering
> >> if someone could advise how R could assist me to speed the task up.
> >>
> >> I am dealing with a dataset with 3 discrete variables and one continuous
> >> variable. The discrete variables are:
> >>
> >> V1: 8 modalities
> >> V2: 13 modalities
> >> V3: 13 modalities
> >>
> >> The continuous variable V4 is a decimal number always greater than zero in
> >> the marginals of each of the 3 variables but it is

[R] Joining two datasets - recursive procedure?

2015-03-18 Thread Luca Meyer
Thanks for your input Michael,

The continuous variable I have measures quantities (down to the 3rd
decimal place), so unfortunately the values are not frequencies.

Any more specific suggestions on how that could be tackled?

Thanks & kind regards,

Luca


===

Michael Friendly wrote:
I'm not sure I understand completely what you want to do, but
if the data were frequencies, it sounds like a task for fitting a
loglinear model with the model formula

~ V1*V2 + V3

On 3/18/2015 2:17 AM, Luca Meyer wrote:
> Hello,
>
> I am facing a quite challenging task (at least to me) and I was wondering
> if someone could advise how R could assist me to speed the task up.
>
> I am dealing with a dataset with 3 discrete variables and one continuous
> variable. The discrete variables are:
>
> V1: 8 modalities
> V2: 13 modalities
> V3: 13 modalities
>
> The continuous variable V4 is a decimal number always greater than zero in
> the marginals of each of the 3 variables but it is sometimes equal to zero
> (and sometimes negative) in the joint tables.
>
> I have got 2 files:
>
> => one with distribution of all possible combinations of V1xV2 (some of
> which are zero or negative) and
> => one with the marginal distribution of V3.
>
> I am trying to build the long and narrow dataset V1xV2xV3 in such a way
> that each V1xV2 cell does not get modified and V3 fits as closely as
> possible to its marginal distribution. Does it make sense?
>
> To be even more specific, my 2 input files look like the following.
>
> FILE 1
> V1,V2,V4
> A, A, 24.251
> A, B, 1.065
> (...)
> B, C, 0.294
> B, D, 2.731
> (...)
> H, L, 0.345
> H, M, 0.000
>
> FILE 2
> V3, V4
> A, 1.575
> B, 4.294
> C, 10.044
> (...)
> L, 5.123
> M, 3.334
>
> What I need to achieve is a file such as the following
>
> FILE 3
> V1, V2, V3, V4
> A, A, A, ???
> A, A, B, ???
> (...)
> D, D, E, ???
> D, D, F, ???
> (...)
> H, M, L, ???
> H, M, M, ???
>
> Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
> recover exactly FILE 1 and that if I aggregate on V3 I can recover a file
> as close as possible to FILE 2 (ideally the same file).
>
> Can anyone suggest how I could do that with R?
>
> Thank you very much indeed for any assistance you are able to provide.
>
> Kind regards,
>
> Luca



[R] Joining two datasets - recursive procedure?

2015-03-17 Thread Luca Meyer
Hello,

I am facing a quite challenging task (at least to me) and I was wondering
if someone could advise how R could assist me to speed the task up.

I am dealing with a dataset with 3 discrete variables and one continuous
variable. The discrete variables are:

V1: 8 modalities
V2: 13 modalities
V3: 13 modalities

The continuous variable V4 is a decimal number always greater than zero in
the marginals of each of the 3 variables but it is sometimes equal to zero
(and sometimes negative) in the joint tables.

I have got 2 files:

=> one with distribution of all possible combinations of V1xV2 (some of
which are zero or negative) and
=> one with the marginal distribution of V3.

I am trying to build the long and narrow dataset V1xV2xV3 in such a way
that each V1xV2 cell does not get modified and V3 fits as closely as
possible to its marginal distribution. Does it make sense?

To be even more specific, my 2 input files look like the following.

FILE 1
V1,V2,V4
A, A, 24.251
A, B, 1.065
(...)
B, C, 0.294
B, D, 2.731
(...)
H, L, 0.345
H, M, 0.000

FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
(...)
L, 5.123
M, 3.334

What I need to achieve is a file such as the following

FILE 3
V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
(...)
D, D, E, ???
D, D, F, ???
(...)
H, M, L, ???
H, M, M, ???

Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
recover exactly FILE 1 and that if I aggregate on V3 I can recover a file
as close as possible to FILE 2 (ideally the same file).

Can anyone suggest how I could do that with R?

Thank you very much indeed for any assistance you are able to provide.

Kind regards,

Luca
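
As a starting point only, and not necessarily the allocation wanted here,
the simplest FILE 3 that satisfies both conditions spreads every V1xV2 value
over the V3 categories in proportion to the V3 marginal shares. The sketch
below assumes the two files have the same grand total; the first three
FILE 1 values are taken from the example above, while the V3 totals and the
column names V12 and V3tot are made up for illustration.

```r
# Cut-down stand-ins for FILE 1 and FILE 2
file1 <- data.frame(V1 = c("A", "A", "B"),
                    V2 = c("A", "B", "C"),
                    V12 = c(24.251, 1.065, 0.294),
                    stringsAsFactors = FALSE)
file2 <- data.frame(V3 = c("A", "B"),
                    V3tot = c(10, 15.61),      # hypothetical, same grand total
                    stringsAsFactors = FALSE)

# by = NULL makes merge() return every combination of the rows (cross join)
file3 <- merge(file1, file2, by = NULL)

# Split each V1 x V2 value according to the share of each V3 category
file3$V4 <- file3$V12 * file3$V3tot / sum(file2$V3tot)
file3 <- file3[order(file3$V1, file3$V2, file3$V3), c("V1", "V2", "V3", "V4")]

# Checks: the first should reproduce FILE 1, the second FILE 2
aggregate(V4 ~ V1 + V2, data = file3, FUN = sum)
aggregate(V4 ~ V3, data = file3, FUN = sum)
```

This construction treats V3 as independent of V1xV2. If the grand totals of
the two files differ, the V1xV2 values are still reproduced exactly and the
V3 totals are off by the ratio of the two grand totals.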



Re: [R] How to verify char variables contain at least one value

2014-01-02 Thread Luca Meyer
Hi Jim,

Thank you, it works indeed :)

Luca


2014/1/2 Jim Lemon 

> On 01/02/2014 05:17 PM, Luca Meyer wrote:
>
>> Happy new year fellows,
>>
>> I am trying to do something I believe should be fairly straightforward but
>> I cannot find my way out.
>>
>> My dataset d2 is 26 rows by 245 columns, exclusively char variables. I
>> would like to check whether at least one column from V13 till V239 (they
>> are in numerical sequence) has been filled in, so I try
>>
>> d2$check<- c(d2$V13:d2$V239)
>>
>> and/or
>>
>> d2$check<- paste(d2$V13:d2$V239,sep="")
>>
>> but I get (translated from Italian):
>>
>> Error in d2$V13:d2$V239 : argument NA/NaN
>>
>> I have tried nchar but the same error occurs. I have also tried to run the
>> above functions on a smaller variable subset (V13, V14, V15, see below for
>> details) just to double check in case some variable would erroneously be in
>> another format, but the same occurs.
>>
>> > d2$V13
>>  [1] "" "" "" "" "" "" "da -5.1% a -10%" ""
>>  [9] "" "" "" "" "" "" "" ""
>> [17] "" "" "" "" "" "" "" ""
>> [25] "" ""
>> > d2$V14
>>  [1] "" "" "" "" "" "" "da -10.1% a -15%" ""
>>  [9] "" "" "" "" "" "" "" ""
>> [17] "" "" "" "" "" "" "" ""
>> [25] "" ""
>> > d2$V15
>>  [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
>> "" "" ""
>>
>> Can anyone suggest an alternative function for me to create a variable that
>> checks whether there is at least one value for each of the 26 records I
>> need to analyze?
>>
> Hi Luca,
> Perhaps you are looking for something like this:
>
> d2check<-apply(as.matrix(d2[,paste("V",13:239,sep="")]),1,nchar)
> # to test for any non empty rows
> any(d2check > 0)
>
> Jim
>
>



[R] How to verify char variables contain at least one value

2014-01-01 Thread Luca Meyer
Happy new year fellows,

I am trying to do something I believe should be fairly straightforward but
I cannot find my way out.

My dataset d2 is 26 rows by 245 columns, exclusively char variables. I
would like to check whether at least one column from V13 till V239 (they
are in numerical sequence) has been filled in, so I try

d2$check <- c(d2$V13:d2$V239)

and/or

d2$check <- paste(d2$V13:d2$V239,sep="")

but I get (translated from Italian):

Error in d2$V13:d2$V239 : argument NA/NaN

I have tried nchar but the same error occurs. I have also tried to run the
above functions on a smaller variable subset (V13, V14, V15, see below for
details) just to double check in case some variable would erroneously be in
another format, but the same occurs.

> d2$V13
 [1] "" "" "" "" "" "" "da -5.1% a -10%" ""
 [9] "" "" "" "" "" "" "" ""
[17] "" "" "" "" "" "" "" ""
[25] "" ""
> d2$V14
 [1] "" "" "" "" "" "" "da -10.1% a -15%" ""
 [9] "" "" "" "" "" "" "" ""
[17] "" "" "" "" "" "" "" ""
[25] "" ""
> d2$V15
 [1] "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ""
"" "" ""

Can anyone suggest an alternative function for me to create a variable that
checks whether there is at least one value for each of the 26 records I
need to analyze?

Thank you in advance,

Luca
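
For context, d2$V13:d2$V239 fails because the `:` operator builds a numeric
sequence from its two end points; it does not select a range of columns. A
minimal sketch of a per-row check follows, assuming the columns are
character and that "filled in" means neither NA nor an empty string. The
toy data frame, with only four columns, is made up for illustration.

```r
# Toy stand-in for d2: 4 rows, character columns V13..V16 (made-up values)
d2 <- data.frame(
  V13 = c("", "da -5.1% a -10%", "", ""),
  V14 = c("", "", "", "da -10.1% a -15%"),
  V15 = c("", "", "", ""),
  V16 = c("", "x", NA, ""),
  stringsAsFactors = FALSE
)

# Select the columns by name; for the real data use paste0("V", 13:239)
cols <- paste0("V", 13:16)

# TRUE where a cell holds a non-missing, non-empty string
filled <- !is.na(d2[cols]) & d2[cols] != ""

# One logical per row: is at least one of the selected columns filled in?
d2$check <- rowSums(filled) > 0
d2$check
```

rowSums() counts the filled cells in each row, so the same matrix can also
be used to report how many answers each record contains.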



Re: [R] sourcing from 2 different computers R code

2013-11-12 Thread Luca Meyer
Hi,

Thanks for the advice. I solved it by using an option in the drop-down
menu that I had not noticed earlier, which produced:

source("c:\\Users\\...\\filename.R")

Thank you for the prompt reply,

Luca
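
For reference, the error arises because a backslash starts an escape
sequence inside an R string, so "\U" is read as the beginning of a Unicode
escape. The sketch below shows a few equivalent ways of writing a Windows
path; the path itself is hypothetical and stands in for the real location
of the script.

```r
# Hypothetical location; substitute the real path to the script
p_forward <- "C:/Users/me/Dropbox/script.R"      # forward slashes work on Windows too
p_escaped <- "C:\\Users\\me\\Dropbox\\script.R"  # every backslash doubled
p_built   <- file.path("C:", "Users", "me", "Dropbox", "script.R")
p_raw     <- r"(C:\Users\me\Dropbox\script.R)"   # raw string, needs R >= 4.0.0

# All four spell the same file once backslashes are turned into slashes
to_forward <- function(p) gsub("\\", "/", p, fixed = TRUE)
same <- c(to_forward(p_escaped), p_built, to_forward(p_raw)) == p_forward
same

# source(p_forward)   # left commented out because the file is hypothetical
```

Forward slashes have the advantage that the same style of path works on
both the Mac and the Windows machine.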


2013/11/12 Pascal Oettli 

> Hello,
>
> What is the result when you use source("C:/Users/...R")?
>
> Regards,
> Pascal
>
>
> On 12 November 2013 15:13, Luca Meyer  wrote:
>
>> Hi,
>>
>> I have a piece of code sitting on a dropbox directory and have installed R
>> 3.0.2 on 2 machines: one MacBook Pro and one Sony Vaio pc.
>>
>> Now, when I use
>>
>> source("/Users/R")
>>
>> to call the script from the Mac no problems, but when I use
>>
>> source("C:\Users\...R")
>>
>> to call the script from the Sony Vaio I get the following:
>>
>> Error: '\U' used without hex digits in character string starting "'C:\U"
>>
>> What am I doing wrong?
>>
>> Thanks in advance,
>>
>> Luca
>>
>>
>
>
>
> --
> Pascal Oettli
> Project Scientist
> JAMSTEC
> Yokohama, Japan
>



[R] sourcing from 2 different computers R code

2013-11-11 Thread Luca Meyer
Hi,

I have a piece of code sitting on a dropbox directory and have installed R
3.0.2 on 2 machines: one MacBook Pro and one Sony Vaio pc.

Now, when I use

source("/Users/R")

to call the script from the Mac no problems, but when I use

source("C:\Users\...R")

to call the script from the Sony Vaio I get the following:

Error: '\U' used without hex digits in character string starting "'C:\U"

What am I doing wrong?

Thanks in advance,

Luca



Re: [R] Uploading Google Spreadsheet data into R

2013-11-08 Thread Luca Meyer
It does indeed.
Thank you David,
Luca


2013/11/8 David Carlson 

> Stripping down to the bare essentials seems to get it. In
> particular making the query just "select *" instead of "select *
> where B!=''" works. You don't need the processing that the more
> complicated Guardian web page requires. After loading the RCurl
> package and creating the gsqAPI function:
>
> > tmp=gsqAPI("0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE","select *", 0)
> > str(tmp)
> 'data.frame':   9 obs. of  3 variables:
>  $ COL1: chr  "25/10/2013" "25/10/2013" "31/10/2013" "31/10/2013" ...
>  $ COL2: int  50 10 16 18 25 34 56 47 50
>  $ COL3: chr  "TEXT" "TEXT TEXT" "TEXT" "TEXT" ...
> > tmp
>         COL1 COL2      COL3
> 1 25/10/2013   50      TEXT
> 2 25/10/2013   10 TEXT TEXT
> 3 31/10/2013   16      TEXT
> 4 31/10/2013   18      TEXT
> 5 31/10/2013   25 TEXT TEXT
> 6 31/10/2013   34      TEXT
> 7 31/10/2013   56      TEXT
> 8 31/10/2013   47      TEXT
> 9 31/10/2013   50      TEXT
>
> -----
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Luca Meyer
> Sent: Friday, November 8, 2013 1:33 AM
> To: r-help@r-project.org
> Subject: [R] Uploading Google Spreadsheet data into R
>
> Hello,
>
> I am trying to upload data I have on a Google Spreadsheet within R to
> perform some analysis. I regularly update such data and need to perform
> data analysis in the quickest possible way - i.e. without need to publish
> the data, so I was wondering how to make this piece of code work (source
> http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/)
> with my dataset (see
> https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0
> ):
>
> library(RCurl)
> gsqAPI = function(key,query,gid=0){
>   tmp=getURL( paste( sep="",'https://spreadsheets.google.com/tq?',
> 'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid),
> ssl.verifypeer = FALSE )
>   return( read.csv( textConnection( tmp ),  stringsAsFactors=F ) )
> }
> handler=function(key,i){
>   tmp=gsqAPI(key,"select * where B!=''", i)
>   subject=sub(".Rank",'',colnames(tmp)[1])
>   colnames(tmp)[1]="Subject.Rank"
>   tmp$subject=subject
>   tmp
> }
> key='0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
> gdata=handler(key,0)
>
> The code is currently returning  the following:
>
> Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") :
>   replacement has 1 row, data has 0
>
> Thank you in advance,
> Luca
>
>
>



[R] Uploading Google Spreadsheet data into R

2013-11-07 Thread Luca Meyer
Hello,

I am trying to upload data I have on a Google Spreadsheet within R to
perform some analysis. I regularly update such data and need to perform
data analysis in the quickest possible way - i.e. without the need to publish
the data, so I was wondering how to make this piece of code work (source
http://www.r-bloggers.com/datagrabbing-commonly-formatted-sheets-from-a-google-spreadsheet-guardian-2014-university-guide-data/)
with my dataset (see
https://docs.google.com/spreadsheet/ccc?key=0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE#gid=0
):

library(RCurl)
gsqAPI = function(key,query,gid=0){
  tmp=getURL( paste( sep="",'https://spreadsheets.google.com/tq?',
'tqx=out:csv','&tq=', curlEscape(query), '&key=', key, '&gid=', gid),
ssl.verifypeer = FALSE )
  return( read.csv( textConnection( tmp ),  stringsAsFactors=F ) )
}
handler=function(key,i){
  tmp=gsqAPI(key,"select * where B!=''", i)
  subject=sub(".Rank",'',colnames(tmp)[1])
  colnames(tmp)[1]="Subject.Rank"
  tmp$subject=subject
  tmp
}
key='0AkvLBhzbLcz5dHljNGhUdmNJZ0dOdGJLTVRjTkRhTkE'
gdata=handler(key,0)

The code is currently returning  the following:

Error in `$<-.data.frame`(`*tmp*`, "subject", value = "COL1") :
  replacement has 1 row, data has 0

Thank you in advance,
Luca
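
The error message itself can be reproduced without any network access: it
appears whenever a length-one value is assigned as a new column of a data
frame that has zero rows, which suggests the query returned the column
headers but no data. The sketch below imitates such an empty result with a
header-only CSV string; the helper name add_subject is made up for
illustration and is not part of the original code.

```r
# Stand-in for a query result that has headers but no rows
tmp <- read.csv(text = "COL1,COL2,COL3", stringsAsFactors = FALSE)
nrow(tmp)

# This is the assignment that fails inside handler()
res <- tryCatch({ tmp$subject <- "COL1"; tmp }, error = function(e) e)
inherits(res, "error")

# A guarded version: rep() gives a zero-length vector when there are no
# rows, so the assignment is valid for empty results as well
add_subject <- function(tmp, subject) {
  tmp$subject <- rep(subject, nrow(tmp))
  tmp
}
empty <- add_subject(tmp, "COL1")
str(empty)
```

The guard only avoids the error; an empty result still means that the query
or the spreadsheet key should be checked.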



Re: [R] Reading .gsheet within R

2012-12-01 Thread Luca Meyer
Thank you Henrik & the others that have commented. Accessing the actual online 
data is what I would need, but apparently this is not yet feasible…
Luca


On 1 Dec 2012, at 04:53, Henrik Bengtsson wrote:

> On Fri, Nov 30, 2012 at 9:43 AM, Luca Meyer  wrote:
>> Hello R-experts,
>> 
>> I would like to know if there is a solution to read files with extension 
>> .gsheet directly into R - see http://www.fileinfo.com/extension/gsheet for 
>> more info on this file format.
> 
> AFAIK, those files (*.gsheet, *.gdoc, *.gslides) are just tiny JSON
> files containing references to the online/cloud resource (specifying
> the "url" and the "resource_id").  There are several packages on CRAN
> for parsing JSON files.  Accessing the actual online data is a
> different story...
> 
> My $0.02
> 
> /Henrik
> 
>> 
>> Thank you,
>> Luca
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com
>> R 2.15.1
>> Mac OS X 10.8.2
>> 
>> 
>> 
>> 
>> 
>> 
>> 



[R] Reading .gsheet within R

2012-11-30 Thread Luca Meyer
Hello R-experts,

I would like to know if there is a solution to read files with extension 
.gsheet directly into R - see http://www.fileinfo.com/extension/gsheet for more 
info on this file format.

Thank you,
Luca

Mr. Luca Meyer
www.lucameyer.com
R 2.15.1
Mac OS X 10.8.2









[R] Leading plus in numeric fields

2012-08-29 Thread Luca Meyer
Hello R experts,

I have got this data frame:

'data.frame':   1 obs. of  20 variables:
 $ Anno  : chr "PREVISIONI VS TARGET"
 $ OreTot: num 41
 $ GioTot: logi NA
 $ OrGTot: logi NA
 $ OreCli: num 99
 $ GioCli: logi NA
 $ OrGCli: logi NA
 $ OreFor: num -27
 $ GioFor: logi NA
 $ OrGFor: logi NA
 $ OreOrt: num -18
 $ GioOrt: logi NA
 $ OrGOrt: logi NA
 $ OreSpo: num -6
 $ GioSpo: logi NA
 $ OrGSpo: logi NA
 $ OreUff: num -7
 $ GioUff: logi NA
 $ OrGUff: logi NA
 $ temp  : num 0

Is there any way I can format the numeric fields so that I get a leading "+" 
whenever the value is > 0? In the specific case I would need something like:

'data.frame':   1 obs. of  20 variables:
 $ Anno  : chr "PREVISIONI VS TARGET"
 $ OreTot: num +41
 $ GioTot: logi NA
 $ OrGTot: logi NA
 $ OreCli: num +99
 $ GioCli: logi NA
 $ OrGCli: logi NA
 $ OreFor: num -27
 $ GioFor: logi NA
 $ OrGFor: logi NA
 $ OreOrt: num -18
 $ GioOrt: logi NA
 $ OrGOrt: logi NA
 $ OreSpo: num -6
 $ GioSpo: logi NA
 $ OrGSpo: logi NA
 $ OreUff: num -7
 $ GioUff: logi NA
 $ OrGUff: logi NA
 $ temp  : num 0

Thank you in advance,

Luca
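
A numeric value cannot carry a "+" by itself, so one option is to convert
the numeric columns to character strings for display. The sketch below uses
a cut-down, made-up version of the data frame above; the helper name
with_plus is an assumption. Note that sprintf("%+d", ...) would also put a
"+" in front of zero, which is why ifelse() is used instead.

```r
# Cut-down stand-in for the data frame shown above
df <- data.frame(Anno = "PREVISIONI VS TARGET",
                 OreTot = 41, GioTot = NA, OreCli = 99,
                 OreFor = -27, temp = 0,
                 stringsAsFactors = FALSE)

# Prefix "+" only when the value is strictly positive; NA stays NA
with_plus <- function(x) ifelse(x > 0, paste0("+", x), as.character(x))

# Apply the formatter to the numeric columns only
num <- vapply(df, is.numeric, logical(1))
out <- df
out[num] <- lapply(df[num], with_plus)
out
```

The formatted columns are character, so this is best done on a copy that is
used only for printing or reporting.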

Mr. Luca Meyer
www.lucameyer.com
R version 2.15.1
Mac OS X 10.8









Re: [R] regexpr with accents

2012-08-06 Thread Luca Meyer
Thanks Arun,

It works all right, I just found out that my problem was not with accents but 
with the correct spelling of "some text".

Kind regards,

Luca

On 6 Aug 2012, at 15:01, arun wrote:

> 
> 
> Hi,
> 
> Here, the string with in the quotes are read exactly like that.  So, you may 
> have to use the symbol instead of "friendly" or "numeric" from the link.  Or 
> you have to convert those.
> 
> d1 <- data.frame(V1 = 1:4,
> V2 = c("some text = 9", "some tèxt = 9", "some tèxt = 9", "some 
> tèxt = 9"))
> 
> d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9
>  d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9
> d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9
> 
> d1
>   V1      V2
> 1  1   some text = 9
> 2  9 some tèxt = 9
> 3  9   some tèxt = 9
> 4  9   some tèxt = 9
> 
> A.K.
> 
> 
> - Original Message -
> From: Luca Meyer 
> To: r-help@r-project.org
> Cc: 
> Sent: Monday, August 6, 2012 8:25 AM
> Subject: [R]  regexpr with accents
> 
> Sorry but my previous email did not go through properly. Instead of the ? you 
> should really read an &egrave; or &#232; according to 
> http://www.lookuptables.com/.
> 
> So there are extended ASCII characters I need to deal with.
> 
> I have tried
> 
> d1$V1[regexpr("some t&egrave;xt = 9",d1$V2)>0] <- 9
> and 
> 
> d1$V1[regexpr("some t&#232;xt = 9",d1$V2)>0] <- 9
> 
> without success...
> 
> Thanks,
> Luca
> 
> 
> 
> 
> 



[R] regexpr with accents

2012-08-06 Thread Luca Meyer
Sorry but my previous email did not go through properly. Instead of the ? you 
should really read an &egrave; or &#232; according to 
http://www.lookuptables.com/.

So there are extended ASCII characters I need to deal with.

I have tried

d1$V1[regexpr("some t&egrave;xt = 9",d1$V2)>0] <- 9
and 

d1$V1[regexpr("some t&#232;xt = 9",d1$V2)>0] <- 9

without success...

Thanks,
Luca






[R] regexpr with accents

2012-08-05 Thread Luca Meyer
Hello,

I have built a syntax to find out if a given substring is included in a larger 
string that works like this:

d1$V1[regexpr("some text = 9",d1$V2)>0] <- 9

and this works all right as long as "some text" contains only standard ASCII 
characters. However, it does not work when accents are included, as in the 
following:

d1$V1[regexpr("some tèxt = 9",d1$V2)>0] <- 9

I have tried to substitute "è" with several wildcards but it did not work. Can 
anyone suggest how to have the syntax parse the string ignoring the accent?

Thank you in advance,

Luca
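
Two hedged options for this kind of matching: search for the accented text
literally with fixed = TRUE, or strip the accent from the data first so
that a plain ASCII pattern matches both spellings. In the sketch below the
toy data are made up, and the accented letter is written as the escape
\u00e8 so that the example does not depend on the encoding of the script
file.

```r
# Toy data: one plain and one accented spelling (made-up values)
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e8xt = 9", "other"),
                 stringsAsFactors = FALSE)

# Option 1: literal match of the accented string, no regular expression
hit_exact <- grepl("some t\u00e8xt = 9", d1$V2, fixed = TRUE)

# Option 2: accent-insensitive match, replacing the accented letter first
plain <- gsub("\u00e8", "e", d1$V2, fixed = TRUE)
hit_any <- grepl("some text = 9", plain, fixed = TRUE)

d1$V1[hit_exact] <- 9
d1
```

For many different accented letters, iconv(x, to = "ASCII//TRANSLIT") is
sometimes used in place of gsub(), but its result depends on the platform,
so it is worth checking the output before relying on it.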



Re: [R] Multiple lines for each record: how do I handle that

2012-02-22 Thread Luca Meyer
Hi Jorge,

The method you suggest is indeed working fine on the small sample data set. 
When I apply it to a larger dataset (714 rows by 160 columns), it transforms 
some variables from "factor" to "list". How can I change them back to their 
original class in an automatic way?

Thanks,
Luca
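
A likely reason for the "list" columns is that apply() returns a list
whenever the filtered columns end up with different lengths, for example
when a respondent has two non-empty values, or none, in some column. One
way to keep the column classes is to reduce each column to a single value
per id without going through apply(). The sketch below is one such variant,
not the method suggested in the thread: it keeps the first non-missing,
non-empty value, and the helper name first_filled is made up.

```r
# Sample data from the thread
d0 <- data.frame(id = c(1, 1, 2, 2, 2, 3),
                 v1 = c(NA, 1, NA, 1, NA, 1),
                 v2 = c("yes", "", "no", "", "", "yes"),
                 v3 = as.factor(c(NA, 1, NA, NA, 3, 2)),
                 stringsAsFactors = FALSE)

# First non-missing, non-empty value of a column, or NA of the same type
first_filled <- function(x) {
  keep <- !is.na(x) & as.character(x) != ""
  if (any(keep)) x[keep][1] else x[NA_integer_]
}

# Collapse each respondent's rows into one row, column by column
d1 <- do.call(rbind, lapply(split(d0, d0$id), function(g) {
  as.data.frame(lapply(g, first_filled), stringsAsFactors = FALSE)
}))

d1
sapply(d1, class)
```

Because each column is subset rather than rebuilt, numeric columns stay
numeric and factors keep their levels. If a respondent has several
different non-empty values in one column, only the first is kept.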

On 22 Feb 2012, at 21:05, Jorge I Velez wrote:

> Hi Luca,
> 
> Thank you for the example.  Here is one way of doing what you want (of course 
> there are many of them!):
> 
> # data
> d0 <- structure(list(id = c(1, 1, 2, 2, 2, 3), v1 = c(NA, 1, NA, 1, 
> NA, 1), v2 = structure(c(3L, 1L, 2L, 1L, 1L, 3L), .Label = c("", 
> "no", "yes"), class = "factor"), v3 = structure(c(NA, 1L, NA, 
> NA, 3L, 2L), .Label = c("1", "2", "3"), class = "factor")), .Names = c("id", 
> "v1", "v2", "v3"), row.names = c(NA, -6L), class = "data.frame")
> 
> # processing
> out <- lapply(split(d0, d0$id), function(l) apply(l[,-1], 2, function(x) 
> x[!is.na(x) & x != ""]))
> out <- data.frame(do.call(rbind, out))
> 
> # output
> cbind(id = unique(d0$id), out)
> 
> Perhaps plyr would be a better way ;-)
> 
> HTH,
> Jorge.-
> 
> 
> On Wed, Feb 22, 2012 at 2:49 PM, Luca Meyer <> wrote:
> Sure, I am sorry I have not done that in the first place.
> 
> The dataset I have looks like:
> 
> id <- c(1,1,2,2,2,3)
> v1 <- c(NA,1,NA,1,NA,1)
> v2 <- as.character(c("yes","","no","","","yes"))
> v3 <- as.factor(c(NA,1,NA,NA,3,2))
> d0 <- data.frame(id,v1,v2,v3)
> d0
> 
> What I would need is to derive a dataset that looks like:
> 
> id <- c(1,2,3)
> v1 <- c(1,1,1)
> v2 <- as.character(c("yes","no","yes"))
> v3 <- as.factor(c(1,3,2))
> d1 <- data.frame(id,v1,v2,v3)
> d1
> 
> The issue is related to the need to have an automated procedure that reads in 
> the different variable types and aggregates them accordingly as every dataset 
> will be different from the previous in terms of number of variables and 
> records involved.
> 
> Thank you,
> Luca
> 
> On 22 Feb 2012, at 20:26, Sarah Goslee wrote:
> 
> > If you provide a small reproducible example of your data format and
> > expected output, I'm sure someone here can offer a useful solution.
> >
> > Without knowing what your data look like, not so easy.
> >
> > Sarah
> >
> > On Wed, Feb 22, 2012 at 2:22 PM, Luca Meyer <> wrote:
> >> Hi Folks,
> >>
> >> I just discovered that my dataset (coming from QuestionPro platform) has 
> >> got multiple lines for each respondent id, but what I would really need is 
> >> a "regular" data matrix where each respondent's data is shown on a single 
> >> line.
> >>
> >> Has anyone already developed a procedure that automatically takes the 
> >> multiple lines and aggregates them into a single line?
> >>
> >> Thank you in advance,
> >> Luca
> >>
> >> Mr. Luca Meyer
> >> www.lucameyer.com
> >> R version 2.14.1 (2011-12-22)
> >> Mac OS X 10.6.8
> >>
> >>
> > --
> > Sarah Goslee
> > http://www.functionaldiversity.org
> 
> 




Re: [R] Multiple lines for each record: how do I handle that

2012-02-22 Thread Luca Meyer
Sure, I am sorry I have not done that in the first place.

The dataset I have looks like:

id <- c(1,1,2,2,2,3)
v1 <- c(NA,1,NA,1,NA,1)
v2 <- as.character(c("yes","","no","","","yes"))
v3 <- as.factor(c(NA,1,NA,NA,3,2))
d0 <- data.frame(id,v1,v2,v3)
d0

What I would need is to derive a dataset that looks like:

id <- c(1,2,3)
v1 <- c(1,1,1)
v2 <- as.character(c("yes","no","yes"))
v3 <- as.factor(c(1,3,2))
d1 <- data.frame(id,v1,v2,v3)
d1

The issue is related to the need to have an automated procedure that reads in 
the different variable types and aggregates them accordingly as every dataset 
will be different from the previous in terms of number of variables and records 
involved.

Thank you,
Luca

On 22 Feb 2012, at 20:26, Sarah Goslee wrote:

> If you provide a small reproducible example of your data format and
> expected output, I'm sure someone here can offer a useful solution.
> 
> Without knowing what your data look like, not so easy.
> 
> Sarah
> 
> On Wed, Feb 22, 2012 at 2:22 PM, Luca Meyer  wrote:
>> Hi Folks,
>> 
>> I just discovered that my dataset (coming from QuestionPro platform) has got 
>> multiple lines for each respondent id, but what I would really need is a 
>> "regular" data matrix where each respondent's data is shown on a single line.
>> 
>> Has anyone already developed a procedure that automatically takes the 
>> multiple lines and aggregates them into a single line?
>> 
>> Thank you in advance,
>> Luca
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com
>> R version 2.14.1 (2011-12-22)
>> Mac OS X 10.6.8
>> 
>> 
> -- 
> Sarah Goslee
> http://www.functionaldiversity.org



[R] Multiple lines for each record: how do I handle that

2012-02-22 Thread Luca Meyer
Hi Folks,

I just discovered that my dataset (coming from QuestionPro platform) has got 
multiple lines for each respondent id, but what I would really need is a 
"regular" data matrix where each respondent's data is shown on a single line.

Has anyone already developed a procedure that automatically takes the 
multiple lines and aggregates them into a single line?

Thank you in advance,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.14.1 (2011-12-22)
Mac OS X 10.6.8


[R] Row percentage labels within mosaic graph

2011-09-21 Thread Luca Meyer
Hi, 

I have the following sample data 

x <- c(1,2,1,1,2,3,1,2,3,2,3,1,2,2,1,3,2,3,1,3)
y <- c(1,2,1,2,2,2,1,2,1,1,1,1,2,2,1,2,2,1,2,1)
w <- c(1, 1, 1.5, 1, 1.2, 0.8, 0.9, 1.7,  1, 1.3, 1, 1, 0.7, 0.8, 1.4, 1.3, 1, 
1, 0.9, 0.7)
d1 <- data.frame(x,y,w)

and I wish to build an x,y mosaic graph that shows as labels both row 
percentages and number of cases. So far I have been using this script:

require(gmodels)
require(vcd)

d2 <- xtabs(~ x + y, data=d1)
mosaic  (
d2,
gp = shading_max,
labeling_args = list(
gp_labels = gpar(fontsize = 10, fontface = 1),
rot_labels = c(0,90,90,0),
gp_varnames = gpar(fontsize = 0, fontface = 2)
),
main= "title", main_gp = gpar(fontsize = 16, fontface = 2),
pop=FALSE
)
d3 <- CrossTable(d1$x,d1$y)
etichette <- ifelse(d2 < 5, "<5", paste(round(d3$prop.row*100, 
digits=1),"%\n(n=",d2,")", sep=""))
labeling_cells(text = etichette, clip = FALSE, gp_text=gpar(fontsize=10))(d2)

This works just fine but now I have to apply the weight w to the computation. I 
have modified the first part of the above script to

d2 <-round(xtabs(w ~ x + y, data=d1), digits=0)
mosaic  (
d2,
gp = shading_max,
labeling_args = list(
gp_labels = gpar(fontsize = 10, fontface = 1),
rot_labels = c(0,90,90,0),
gp_varnames = gpar(fontsize = 0, fontface = 2)
),
main= "title", main_gp = gpar(fontsize = 16, fontface = 2),
pop=FALSE
)

but I have some difficulty with the labeling part. I can show the number of 
observations in the labels using:

d3 <- as.list(round(xtabs(w ~ x + y, data=d1), digits=0))
etichette <- ifelse(d2 < 5, "<5", paste("(n=",d3,")", sep=""))
labeling_cells(text = etichette, clip = FALSE, gp_text=gpar(fontsize=10))(d2)

but I would need to show row proportions, do you know how I can do that?

Thanks,
Luca
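
A possible sketch for the weighted row percentages, assuming they should be computed from the unrounded weighted table: prop.table() with margin = 1 divides each cell by its row total. Only base R is used here; the resulting character matrix has the same shape as the table, so it should be usable as the text argument of labeling_cells() in the script above.

```r
# Sample data from the message above
x <- c(1,2,1,1,2,3,1,2,3,2,3,1,2,2,1,3,2,3,1,3)
y <- c(1,2,1,2,2,2,1,2,1,1,1,1,2,2,1,2,2,1,2,1)
w <- c(1, 1, 1.5, 1, 1.2, 0.8, 0.9, 1.7, 1, 1.3, 1, 1, 0.7, 0.8, 1.4, 1.3, 1,
       1, 0.9, 0.7)
d1 <- data.frame(x, y, w)

wtab    <- xtabs(w ~ x + y, data = d1)           # weighted counts, not rounded
row_pct <- prop.table(wtab, margin = 1) * 100    # weighted row percentages
d2      <- round(wtab)                           # rounded counts shown as n

# ifelse() keeps the dimensions of its test, so the labels stay a 3 x 2 table
etichette <- ifelse(d2 < 5, "<5",
                    paste(round(row_pct, 1), "%\n(n=", d2, ")", sep = ""))
etichette
```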

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8


[R] ordering rows within CrossTable

2011-09-10 Thread Luca Meyer
Hi,

I am running the following -masked- code:

set.seed(23)
city <- sample(c("C1","C2"),size=100,replace=T)
reason <- sample(c("R1","R2","R3","R4"),size=100,replace=T)
df <- data.frame(city,reason)
library(gmodels)
CrossTable(df$reason,df$city,prop.r=F,prop.c=F,prop.t=F,prop.chisq=F)

And I get the following output:

 | df$city 
   df$reason |C1 |C2 | Row Total | 
-|---|---|---|
  R1 | 4 |13 |17 | 
-|---|---|---|
  R2 |19 |10 |29 | 
-|---|---|---|
  R3 |12 |13 |25 | 
-|---|---|---|
  R4 |11 |18 |29 | 
-|---|---|---|
Column Total |46 |54 |   100 | 
-|---|---|---|

I would like to have the df$reason sorted by decreasing count on the Row Total 
- that is showing R2, R4, R3 and finally R1 - how can I do that?

Thanks,

Luca
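
One way this could be approached, sketched on the assumption that CrossTable() prints rows in the order of the factor levels: reorder the levels of reason by decreasing frequency before building the table. The check at the end uses base table() only; the exact counts depend on the random number generator of the R version in use, so no output is shown.

```r
set.seed(23)
city   <- sample(c("C1", "C2"), size = 100, replace = TRUE)
reason <- sample(c("R1", "R2", "R3", "R4"), size = 100, replace = TRUE)
df <- data.frame(city, reason)

# Levels sorted by decreasing count become the new row order
counts <- table(df$reason)
df$reason <- factor(df$reason, levels = names(sort(counts, decreasing = TRUE)))

# Rows now appear from the most to the least frequent reason
tab <- table(df$reason, df$city)
cbind(tab, "Row Total" = rowSums(tab))
```

With gmodels loaded, the same call as before, CrossTable(df$reason, df$city, ...), should then follow the new level order.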


Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8


Re: [R] Showing zero frequencies with xtabs

2011-08-30 Thread Luca Meyer
Thanks Peter & Petr,

It was indeed an issue of having some character variables in there. Now it 
works just fine.

Cheers,
Luca
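
A small sketch of the factor-based fix applied to a weighted banner. The data frame d, its column names and the level set lev are invented for this illustration. Because every table is built from the same factor, all of them have the same rows (including the empty level "c") and can be combined with cbind().

```r
# Hypothetical data: answer "c" never occurs in the sample
d <- data.frame(
  answer = c("a", "b", "a", "a"),
  group1 = c("x", "x", "y", "y"),
  group2 = c("p", "p", "p", "q"),
  w      = c(1, 2, 0.5, 1.5)
)

# Declare all possible levels so that empty ones are kept by xtabs()
lev <- c("a", "b", "c")
d$answer <- factor(d$answer, levels = lev)

t1 <- xtabs(w ~ answer + group1, data = d)   # weighted counts by group1
t2 <- xtabs(w ~ answer + group2, data = d)   # weighted counts by group2

# Same rows in both tables, so they can be bound side by side
banner <- cbind(t1, t2, TOT = rowSums(t1))
banner
```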

Il giorno 30/ago/2011, alle ore 10.15, peter dalgaard ha scritto:

> 
> On Aug 30, 2011, at 10:04 , Luca Meyer wrote:
> 
>> Hi,
>> 
>> Does anyone know how to show zero frequencies variable levels with the xtabs 
>> command? They show with the table(x,y) command but I need to apply weight to 
>> frequency tables and I also need to cbind several tables together, which 
>> implies that they all need to show the same number of rows. 
> 
> Are you sure you are doing the same thing as with table()? I'd expect it to 
> work if you ensure that the variables are factors:
> 
>> library(ISwR)
>> xtabs(~sex+menarche,data=juul)
>   menarche
> sex   1   2
>  2 369 335
> 
>> juul$sex <- factor(juul$sex,levels=1:2)
>> xtabs(~sex+menarche,data=juul)
>   menarche
> sex   1   2
>  1   0   0
>  2 369 335
> 
> 
> 
>> 
>> Alternatively, do you know how to column-bind tables with different numbers 
>> of rows? I cannot use merge as it requires a data.frame, and that modifies the 
>> look of the banner table I am trying to create...
>> 
>> Thanks,
>> Luca
>> 
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com
>> R version 2.13.1 (2011-07-08)
>> Mac OS X 10.6.8
> 
> -- 
> Peter Dalgaard, Professor
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
> 



[R] Showing zero frequencies with xtabs

2011-08-30 Thread Luca Meyer
Hi,

Does anyone know how to show zero frequencies variable levels with the xtabs 
command? They show with the table(x,y) command but I need to apply weight to 
frequency tables and I also need to cbind several tables together, which 
implies that they all need to show the same number of rows. 

Alternatively, do you know how to column-bind tables with different numbers of 
rows? I cannot use merge as it requires a data.frame, and that modifies the look 
of the banner table I am trying to create...

Thanks,
Luca


Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8


Re: [R] How do I get a weighted frequency table?

2011-08-29 Thread Luca Meyer
Thank you, that works just fine.
Luca

Il giorno 29/ago/2011, alle ore 23.48, H. T. Reynolds ha scritto:

> Hi,
> 
> I use xtabs with the weight variable on the left hand side of the formula as 
> in
> 
>   xtabs(weight ~ opinion + gender + ...)



Re: [R] How do I get a weighted frequency table?

2011-08-29 Thread Luca Meyer
Hi David,

Unfortunately I need the frequencies that "should have been" observed if the 
sample corresponded perfectly to the population in terms of some "reference" 
variables. 

That is, if in my sample I observe V1_R1=10%, V1_R2=50%, V1_R3=40% while the 
known population distribution is V1_R1=20%, V1_R2=30%, V1_R3=50%, then I 
would like to see what V2*V3, V2*V4, ... , V2*VN, V3*V4, ... , VN-1*VN would 
have been had the sample perfectly reflected the population in terms of V1.

I hope that clarifies what I am trying to achieve...

Thanks,
Luca
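
The situation described above can be sketched as a simple post-stratification: each case gets the weight population share / sample share of its V1 category, and the weight then goes on the left-hand side of the xtabs() formula. The ten-row data frame below is invented so that V1 has the 10/50/40 sample split from the message; V2 and V3 are placeholder variables.

```r
# Hypothetical sample: V1 is 10% R1, 50% R2, 40% R3
d <- data.frame(
  V1 = c("R1", rep("R2", 5), rep("R3", 4)),
  V2 = c("a", "a", "b", "a", "b", "b", "a", "a", "b", "b"),
  V3 = c("u", "v", "u", "v", "u", "v", "u", "v", "u", "v")
)

pop_share    <- c(R1 = 0.20, R2 = 0.30, R3 = 0.50)   # known population distribution
sample_share <- prop.table(table(d$V1))              # observed distribution

# Weight = population share / sample share of the case's V1 category
key <- as.character(d$V1)
d$w <- as.numeric(pop_share[key] / sample_share[key])

# After weighting, V1 reproduces the population distribution
wt_V1 <- prop.table(xtabs(w ~ V1, data = d))
wt_V1

# Any other crosstab is then run with the same weight
xtabs(w ~ V2 + V3, data = d)
```

For anything beyond a single reference variable (raking, variance estimation), the survey package mentioned below in the thread is the more appropriate tool.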

Il giorno 29/ago/2011, alle ore 16.29, David L Carlson ha scritto:

> If you are talking about weights that are the frequencies in each cell, you
> can use xtabs():
> 
> df <- data.frame(Var1=c("Absent", "Present", "Absent", "Present"), 
> Var2=c("Absent", "Absent", "Present", "Present"), Freq=c(17, 6, 3, 12))
> df
> xtabs(Freq~Var1+Var2, data=df)
> 
> --
> David L Carlson
> Associate Professor of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
> 
> 
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Leandro Marino
> Sent: Sunday, August 28, 2011 12:15 PM
> To: Luca Meyer
> Cc: r-help@r-project.org
> Subject: Re: [R] How do I get a weighted frequency table?
> 
> *Luca,
> *
> 
> 
> you may use the survey package. You have to declare the design with the design
> function (svydesign) and then you can use the svytotal, svyby and svymean
> functions to do your tabulations.
> 
> Regards,
> Leandro
> 
> 
> 
> Atenciosamente,
> Leandro Marino
> http://www.leandromarino.com.br (Fotógrafo)
> http://est.leandromarino.com.br/Blog (Estatístico)
> Cel.: + 55 21 9845-7707
> Cel.: + 55 21 8777-7907
> 
> 
> 
> 2011/8/28 Luca Meyer 
> 
>> Hello,
>> 
>> I have to run a set of crosstabulations to which I need to apply some 
>> weights. I am currently doing an unweighted version of such crosstabs 
>> using table(x,y).
>> 
>> In SPSS I am used to creating a weighting variable and using WEIGHT 
>> BY VAR before running CTABLES. Is there a similar procedure in R?
>> 
>> Thanks,
>> Luca
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com
>> R version 2.13.1 (2011-07-08)
>> Mac OS X 10.6.8



[R] How do I get a weighted frequency table?

2011-08-28 Thread Luca Meyer
Hello,

I have to run a set of crosstabulations to which I need to apply some weights. 
I am currently doing an unweighted version of such crosstabs using table(x,y). 

In SPSS I am used to creating a weighting variable and using WEIGHT BY VAR 
before running CTABLES. Is there a similar procedure in R? 

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
R version 2.13.1 (2011-07-08)
Mac OS X 10.6.8


Re: [R] Adding labels into lattice's barchart

2011-02-14 Thread Luca Meyer
Thanks Deepayan,

What you suggest is quite fine, but provides the overall number of cases for 
the entire dataset split into V2 levels. 

What if I need to show panel-specific values? For instance, I want to 
show not the total number of Female but the total number of Female in 1st Class.

In other words, take your example and suppose I have:

barchart(V2 ~ Freq | V1, data = tdf, groups = V3, layout=c(1,4), stack=TRUE,
   ylim = sprintf("%s (n=%g)", names(numByV2), numByV2))

and now what I would like to show is the result of

with(tdf, tapply(Freq, list(V2,V1), sum))

next to each stacked bar. 

In the previous example, I would need to show Female (n=23) in the Crew panel, 
Female (n=196) in the 3rd Class panel, etc...

Can I do that?

Thanks,
Luca
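
One possible sketch, not necessarily what the lattice author would recommend: a custom panel function that first draws the stacked bars and then writes each bar's total at its end. Inside a panel, x and y only contain the data of that panel, so the totals are panel specific. The xlim padding factor of 1.25 is an arbitrary choice to leave room for the text.

```r
library(lattice)

# Same approximation of the data as in the quoted reply below
tdf <- as.data.frame.table(apply(Titanic, c(1, 2, 4), sum))
names(tdf)[1:3] <- paste("V", 1:3, sep = "")

# Panel-specific totals, for reference (rows = V2, columns = V1)
numByV2V1 <- with(tdf, tapply(Freq, list(V2, V1), sum))
numByV2V1

p <- barchart(V2 ~ Freq | V1, data = tdf, groups = V3, stack = TRUE,
              layout = c(1, 4),
              xlim = c(0, max(numByV2V1) * 1.25),   # room for the labels
              panel = function(x, y, ...) {
                panel.barchart(x, y, ...)
                # total length of each stacked bar within this panel
                totals <- tapply(x, as.numeric(y), sum)
                panel.text(x = totals, y = as.numeric(names(totals)),
                           labels = sprintf("n=%g", totals), pos = 4)
              })
print(p)
```

Putting the counts into the axis labels themselves, as in the original request, would need per-panel tick labels, which this sketch does not attempt.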



Il giorno 14/feb/2011, alle ore 11.43, Deepayan Sarkar ha scritto:

> On Wed, Feb 9, 2011 at 11:04 PM, Luca Meyer  wrote:
>> *** APOLOGIES TO THOSE READING THE LIST THROUGH NABBLE: THIS WAS ALREADY 
>> POSTED THERE BUT NOT FORWARDED TO THE LIST FOR SOME UNKNOWN REASON ***
>> 
>> I have a dataset that looks like:
>> 
>> $ V1: factor with 4 levels
>> $ V2: factor with 4 levels
>> $ V3: factor with 2 levels
>> $ V4: num (summing up to 100 within V3 levels)
>> $ V5: num (nr of cases for each unique combination of V1*V2*V3 levels)
>> 
>> Quite new to lattice - I've started reading Deepayan's book a few days ago - 
>> I have written the following:
>> 
>> barchart(V2 ~ V4 | V1,
>>          data=d1,
>>          groups=V3,
>>          stack=TRUE,
>>          auto.key= list(space="top"),
>>          layout = c(1,4),
>>          xlab=" "
>>          )
>> 
>> which works just fine as a stacked bar chart with bars adding up to 100%. 
>> Now what I would like to see is the number of cases showing next to the 4 
>> x-axis's labels - i.e. V2_L1, ... V2_L4.
>> 
>> In other words now I see something like:
>> 
>> *** V1_L1 ***
>> V2_L4 AAAVVV
>> V2_L3 AA
>> V2_L2 AV
>> V2_L1 AA
>> *** V1_L2 ***
>> V2_L4 AA
>> V2_L3 AV
>> etc...
>> 
>> But what I am looking for is something like:
>> *** V1_L1 ***
>> V2_L4 (n=60) AAAVVV
>> V2_L3 (n=10) AA
>> V2_L2 (n=52) AV
>> V2_L1 (n=15) AA
>> *** V1_L2 ***
>> V2_L4 (n=18) AA
>> V2_L3 (n=74) AV
>> etc...
>> 
>> How can I do that? I have tried:
>> 
>> V6 <- paste(V2," (n",V5,")")
> 
> What you really want is to compute the total sum of V5 per level of V2
> (and add that to the labels of V2). There are many ways of doing so,
> one is tapply().
> 
> In the absence of a reproducible example, here is an approximation:
> 
> tdf <- as.data.frame.table(apply(Titanic, c(1, 2, 4), sum))
> names(tdf)[1:3] <- paste("V", 1:3, sep = "")
> 
> str(tdf)
> 
> barchart(V2 ~ Freq | V1, data=tdf, groups=V3, stack=TRUE)
> 
> with(tdf, tapply(Freq, V2, sum))
> 
> numByV2 <- with(tdf, tapply(Freq, V2, sum))
> 
> barchart(V2 ~ Freq | V1, data = tdf, groups = V3, stack=TRUE,
>          ylim = sprintf("%s (n=%g)", names(numByV2), numByV2))
> 
> ## or
> 
> levels(tdf$V2) <- sprintf("%s (n=%g)", levels(tdf$V2), numByV2)
> barchart(V2 ~ Freq | V1, data=tdf, groups=V3, stack=TRUE)
> 
> -Deepayan
> 
>> 
>> but what I get when I run
>> 
>> barchart(V6 ~ V4 | V1,
>>          data=d1,
>>          groups=V3,
>>          stack=TRUE,
>>          auto.key= list(space="top"),
>>          layout = c(1,4),
>>          xlab=" "
>>          )
>> 
>> is a bunch of empty bars, because the number of unique combinations has 
>> increased.
>> 
>> Any help would be appreciated.
>> 
>> Thanks,
>> Luca
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com



[R] Adding labels into lattice's barchart

2011-02-09 Thread Luca Meyer
*** APOLOGIES TO THOSE READING THE LIST THROUGH NABBLE: THIS WAS ALREADY 
POSTED THERE BUT NOT FORWARDED TO THE LIST FOR SOME UNKNOWN REASON ***

I have a dataset that looks like: 

$ V1: factor with 4 levels 
$ V2: factor with 4 levels 
$ V3: factor with 2 levels 
$ V4: num (summing up to 100 within V3 levels) 
$ V5: num (nr of cases for each unique combination of V1*V2*V3 levels) 

Quite new to lattice - I've started reading Deepayan's book a few days ago - I 
have written the following: 

barchart(V2 ~ V4 | V1, 
 data=d1, 
 groups=V3, 
 stack=TRUE, 
 auto.key= list(space="top"), 
 layout = c(1,4), 
 xlab=" " 
 ) 

which works just fine as a stacked bar chart with bars adding up to 100%. Now 
what I would like to see is the number of cases showing next to the 4 x-axis's 
labels - i.e. V2_L1, ... V2_L4. 

In other words now I see something like: 

*** V1_L1 *** 
V2_L4 AAAVVV 
V2_L3 AA 
V2_L2 AV 
V2_L1 AA 
*** V1_L2 *** 
V2_L4 AA 
V2_L3 AV 
etc... 

But what I am looking for is something like: 
*** V1_L1 *** 
V2_L4 (n=60) AAAVVV 
V2_L3 (n=10) AA 
V2_L2 (n=52) AV 
V2_L1 (n=15) AA 
*** V1_L2 *** 
V2_L4 (n=18) AA 
V2_L3 (n=74) AV 
etc... 

How can I do that? I have tried: 

V6 <- paste(V2," (n",V5,")") 

but what I get when I run 

barchart(V6 ~ V4 | V1, 
 data=d1, 
 groups=V3, 
 stack=TRUE, 
 auto.key= list(space="top"), 
 layout = c(1,4), 
 xlab=" " 
 ) 

is a bunch of empty bars, because the number of unique combinations has 
increased. 

Any help would be appreciated. 

Thanks, 
Luca 

Mr. Luca Meyer 
www.lucameyer.com  


[R] Panel title: mfrow() or ?

2011-01-10 Thread Luca Meyer
Hi,

I am trying to build a 3 rows by 2 columns panel using 

par(mfrow=c(3,2))

The 6 graphs are coming out quite all right, but now I would like to put a 
title on top of the page - i.e. something that is common for all 6 graphs - how 
can I do that?

Thanks,
Luca
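
A common base-graphics approach, sketched here with placeholder plots: reserve some outer margin at the top with the oma parameter and write the shared title there with mtext(outer = TRUE). The title text and the six plots are made up for the example.

```r
# oma = c(bottom, left, top, right): keep 3 lines of outer margin at the top
op <- par(mfrow = c(3, 2), oma = c(0, 0, 3, 0))

# Six placeholder graphs
for (i in 1:6) {
  plot(1:10, (1:10)^(i / 3), main = paste("Graph", i), xlab = "x", ylab = "y")
}

# Title common to all six graphs, drawn in the outer margin
mtext("Common title for all six graphs", side = 3, outer = TRUE,
      line = 1, cex = 1.2, font = 2)

par(op)   # restore the previous settings
```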


Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0



[R] Getting total bar's label & value labels in a barplot

2011-01-09 Thread Luca Meyer
Hi,

I have been trying to get the label under the total column - i.e. a mean value 
of columns 2 to 6 - in a barplot I generate with this script:

t1 <- tapply(A, B, sum)
t1[8] <- mean(t1[2:6])
t1 <- as.table(t1)
barplot(t1, ylim=c(0,3000))
mtext("Var1", side = 1, line = 3)
mtext("Var2", side = 2, line = 3)

I have been trying to use

axis(1, at=1:8, labels=c("1","2","3","4","5","6","7","8"))

but the labels do not line up underneath the columns... can someone help me out 
on this one?

Also, I would like to plot onto each bar the corresponding numerical value - 
e.g. "1824" on the first bar, etc.

Please notice that str(t1) would look like:

 Named num [1:8] 1824 2339 2492 2130 2360 ...
 - attr(*, "names")= chr [1:8] "1" "2" "3" "4" ...

Thanks,
Luca
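
A sketch of one way to handle both points, relying on the fact that barplot() returns the x coordinates of the bar midpoints. With the default bar width and spacing these are not 1, 2, ..., 8, which would explain why axis(1, at = 1:8, ...) does not line up. Only the first five values below come from the str() output above; the last two values and the label "Mean 2-6" are invented for the example.

```r
t1 <- c(1824, 2339, 2492, 2130, 2360, 2200, 1950)  # last two values are made up
names(t1) <- 1:7
t1["Mean 2-6"] <- mean(t1[2:6])                    # eighth bar: mean of bars 2 to 6

# barplot() returns the midpoint of each bar; names(t1) are used as bar labels
mids <- barplot(t1, ylim = c(0, 3000))

# Numerical value just above each bar, positioned at the midpoints
text(x = mids, y = t1, labels = round(t1), pos = 3)

mtext("Var1", side = 1, line = 3)
mtext("Var2", side = 2, line = 3)
```

If custom labels are needed, axis(1, at = mids, labels = ...) places them under the bars in the same way.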


Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0



Re: [R] error in calling source(): invalid multibyte character in parser

2011-01-04 Thread Luca Meyer
How would I go about doing that? I have tried with:

source("file.R", encoding="it_IT.UTF-8")

But I get

Error in file(file, "r", encoding = encoding) : 
  unsupported conversion from 'it_IT.UTF-8' to ''

Thanks,
Luca

PS:  "it_IT.UTF-8" is what I get under locale when I run sessionInfo() 
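
A note with a sketch: "it_IT.UTF-8" is a locale name, whereas the encoding argument expects the name of the file's own encoding, for example source("file.R", encoding = "latin1") as suggested in the quoted reply below. An alternative is to re-encode the script once and keep it in UTF-8. The helper to_utf8_file() below is hypothetical (not part of base R) and assumes the script was saved as Latin-1.

```r
# Hypothetical helper: rewrite a Latin-1 encoded script as UTF-8
to_utf8_file <- function(infile, outfile, from = "latin1") {
  lines <- readLines(infile, warn = FALSE)           # bytes are read unchanged
  lines <- iconv(lines, from = from, to = "UTF-8")   # re-encode each line
  writeLines(lines, outfile, useBytes = TRUE)        # write the bytes as they are
  invisible(outfile)
}

# Demo: a one-line script whose comment ends with byte 0xE0 (a-grave in Latin-1)
src <- tempfile(fileext = ".R")
dst <- tempfile(fileext = ".R")
writeBin(c(charToRaw("x <- 1 # citt"), as.raw(0xE0), charToRaw("\n")), src)

to_utf8_file(src, dst)

# The accented letter is now the two-byte UTF-8 sequence c3 a0
out_bytes <- readBin(dst, what = "raw", n = 100)
out_bytes
```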

Il giorno 03/gen/2011, alle ore 09.48, Prof Brian Ripley ha scritto:

> On Mon, 3 Jan 2011, peter dalgaard wrote:
> 
>> 
>> On Jan 3, 2011, at 08:32 , Luca Meyer wrote:
>> 
>>> Being Italians, when writing comments/instructions we use accented letters 
>>> like à, ò, è, etc. When running R scripts containing such characters I get 
>>> an error saying:
>>> 
>>> invalid multibyte character in parser
>>> 
>>> I have been looking at the help and searched the r-help archives but I 
>>> haven't found anything that I could intelligibly apply to my case.
>>> 
>>> Can anyone suggest a fix for this error?
>> 
>> The most likely cause is that your scripts are written in an "8 bit ASCII" 
>> encoding (Latin-1 or -9, most likely), while R is running in a UTF8 locale. 
>> If that is the cause, the fix is to standardize things to use the same 
>> locale. You can convert the encoding of your source file using the iconv 
>> utility (in a Terminal window).
> 
> Or use the 'encoding' argument of source() to tell R what the encoding is, 
> e.g. encoding="latin1" or "latin-9" (the inconsistency being in the iconv 
> used on Macs, not in R).
> 
>> 
>> -pd
>> 
>>> 
>>> Thanks,
>>> Luca
>>> 
>>> Mr. Luca Meyer
>>> www.lucameyer.com
>>> IBM SPSS Statistics release 19.0.0
>>> R version 2.12.1 (2010-12-16)
>>> Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
>>> 
>> 
>> -- 
>> Peter Dalgaard
>> Center for Statistics, Copenhagen Business School
>> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
>> Phone: (+45)38153501
>> Email: pd@cbs.dk  Priv: pda...@gmail.com
>> 
> 
> -- 
> Brian D. Ripley,  rip...@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel:  +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] error in calling source(): invalid multibyte character in parser

2011-01-04 Thread Luca Meyer
It works fine, thanks. 
I was just wondering if there is any way to automatically include the command 
you suggest as a default when I open R.
Thanks,
Luca
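
One option, as a sketch: R runs the code in a user profile file (typically ~/.Rprofile) at the start of each session, so the command can be placed there. Whether forcing the "C" locale for every session is desirable is a separate question, since it can affect how non-ASCII text is displayed and sorted.

```r
# Contents of ~/.Rprofile (executed at the start of every R session)
# Assumption: the "C" locale is acceptable for all locale categories
invisible(Sys.setlocale("LC_ALL", "C"))
```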

Il giorno 03/gen/2011, alle ore 08.36, Phil Spector ha scritto:

> Luca -
>   What happens when you type
> 
> Sys.setlocale('LC_ALL','C')
> 
>   before issuing the source command?
> 
>   - Phil Spector
>Statistical Computing Facility
>Department of Statistics
>UC Berkeley
>    spec...@stat.berkeley.edu
> 
> 
> On Mon, 3 Jan 2011, Luca Meyer wrote:
> 
>> Being Italians, when writing comments/instructions we use accented letters 
>> like à, ò, è, etc. When running R scripts containing such characters I get an 
>> error saying:
>> 
>> invalid multibyte character in parser
>> 
>> I have been looking at the help and searched the r-help archives but I 
>> haven't found anything that I could intelligibly apply to my case.
>> 
>> Can anyone suggest a fix for this error?
>> 
>> Thanks,
>> Luca
>> 
>> Mr. Luca Meyer
>> www.lucameyer.com
>> IBM SPSS Statistics release 19.0.0
>> R version 2.12.1 (2010-12-16)
>> Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
>> 



[R] error in calling source(): invalid multibyte character in parser

2011-01-02 Thread Luca Meyer
Being Italians, when writing comments/instructions we use accented letters 
like à, ò, è, etc. When running R scripts containing such characters I get an 
error saying:

invalid multibyte character in parser

I have been looking at the help and searched the r-help archives but I haven't 
found anything that I could intelligibly apply to my case.

Can anyone suggest a fix for this error? 

Thanks,
Luca

Mr. Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0



Re: [R] Passing parameter to a function

2010-12-20 Thread Luca Meyer
Hi Duncan,

Yes, A and B are columns in D. Having said that, I am trying to avoid 

tab(D$A,D$B)

and I would prefer:

tab(A,B)

Unfortunately the syntax you suggest is giving me the same error:

Error in eval(expr, envir, enclos) : object "A" not found

I have tried to add some deparse() but I have got the error over again. The 
last version I have tried:

function(x,y){
z <- substitute(time ~ x + y, list(x = deparse(substitute(x)), y = 
deparse(substitute(y))))
xtabs(z, data=D)
}

gives me another error:

Error in terms.formula(formula, data = data) : 
  formula models not valid in ExtractVars 

Any idea on how I should modify the function to make it work?

Thanks,
Luca
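
A sketch of one variant that works in current versions of R: the inner substitute() calls capture the unevaluated argument names, the outer substitute() splices them into the formula expression, and eval() turns that expression into an actual formula object before it is passed to xtabs(). The data frame D below is invented for the example, and the data argument is an addition for convenience.

```r
# Hypothetical data with the column names used in the thread
D <- data.frame(
  A    = c("a", "a", "b", "b"),
  B    = c("x", "y", "x", "y"),
  time = c(1, 2, 3, 4)
)

tab <- function(x, y, data = D) {
  # Build the call time ~ A + B from the unevaluated arguments, then
  # evaluate it so that xtabs() receives an object of class "formula"
  fla <- eval(substitute(time ~ x + y,
                         list(x = substitute(x), y = substitute(y))))
  xtabs(fla, data = data)
}

res <- tab(A, B)
res
```

An equivalent route is to build the formula from strings, e.g. as.formula(paste("time ~", deparse(substitute(x)), "+", deparse(substitute(y)))).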


Il giorno 20/dic/2010, alle ore 19.28, Duncan Murdoch ha scritto:

> On 20/12/2010 1:13 PM, Luca Meyer wrote:
>> I am trying to pass a couple of variable names to a xtabs formula:
>> 
>> >  tab<- function(x,y){
>> xtabs(time~x+y, data=D)
>> }
>> 
>> But when I run:
>> 
>> >  tab(A,B)
>> 
>> I get:
>> 
>> Error in eval(expr, envir, enclos) : object "A" not found
>> 
>> I am quite sure that there is some easy way out, but I have tried with 
>> different combinations of deparse(), substitute(), eval(), etc without 
>> success, can someone help?
> 
> I assume that A and B are columns in D?  If so, you could use
> 
> tab(D$A, D$B)
> 
> to get what you want.  If you really want tab(A,B) to work, you'll need to do 
> messy work with substitute, e.g. in the tab function, something like
> 
> fla <- substitute(time ~ x + y, list(x = substitute(x), y = substitute(y)))
> xtabs(fla, data=D)
> 
> Duncan Murdoch



Re: [R] Alternative to extended recode sintax? Bug?

2010-12-20 Thread Luca Meyer
Yes, I am seeing that at the end of 2010-beginning 2011. Try:

weekdays(as.POSIXct("2010-12-25")+(0:20)*24*60*60)
week(as.POSIXct("2010-12-25")+(0:20)*24*60*60)

Week 1 (2011) is made up of 6 days

Luca
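
For comparison, a base R sketch of calendar-based week numbers, which do not depend on which weekday the year starts on. It uses the format() conversions "%V" (ISO 8601 week, weeks start on Monday) and "%W" (week of the year with Monday as first day, days before the first Monday are week 00). Support for "%V" on output is platform dependent according to ?strptime, so treat this as an assumption to check on your system.

```r
# Two weeks around the turn of the year 2010/2011
days <- as.Date("2010-12-27") + 0:13

weeks <- data.frame(
  day         = days,
  weekday     = weekdays(days),        # names depend on the locale
  iso_week    = format(days, "%V"),    # ISO 8601 week number
  monday_week = format(days, "%W")     # Monday-based week of the year
)
weeks
```

Under both conventions the week number changes between Sunday and Monday, so every week has seven days except where "%W" restarts at 00 on 1 January.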

Il giorno 20/dic/2010, alle ore 17.54, David Winsemius ha scritto:

> 
> On Dec 20, 2010, at 10:58 AM, Luca Meyer wrote:
> 
>> Right, I appreciate the first day of the year start date. I am just 
>> wondering why then the cut off day is not the same for the rest of the 
>> year...but it's all right to use other packages.
> 
> Are you saying it shifts within the year? I am not seeing that:
> 
> require(lubridate)
> 
> > weekdays(as.POSIXct("2010-01-01")+(0:8)*24*60*60)
> [1] "Friday""Saturday"  "Sunday""Monday""Tuesday"   "Wednesday"
> [7] "Thursday"  "Friday""Saturday"
> > week(as.POSIXct("2010-01-01")+(0:8)*24*60*60)
> [1] 1 1 1 1 1 1 2 2 2
> 
> Looks to be incrementing weeks between Wed and Thurs at the beginning of the 
> year just as it did in your example. I admit that I thought that it should be 
> shifting at the Thursday - Friday divide, but setting a zero point can be 
> ambiguous. I thought if it were  Midnight Thursday-Friday that all of 
> Thursdays would be in week 1. But at least it appears consistent.
> 
> 
>> Thanks,
>> Luca
>> 
>> Il giorno 20/dic/2010, alle ore 14.16, David Winsemius ha scritto:
>> 
>>> 
>>> On Dec 20, 2010, at 12:54 AM, Luca Meyer wrote:
>>> 
>>>> All right, I get it now: lubridate's week() defines weeks from Thursday 
>>>> till the following Wednesday. You'd probably agree with me that it's a bit 
>>>> strange what it is going to do over the turn of the year:
>>>> 
>>>>> y <- 
>>>>> as.POSIXct(c("2010-12-27","2010-12-28","2010-12-29","2010-12-30","2010-12-31","2011-01-01","2011-01-02","2011-01-03","2011-01-04","2011-01-05","2011-01-06","2011-01-07","2011-01-08","2011-01-09","2011-01-10","2011-01-11","2010-01-12","2010-01-13","2010-01-14"))
>>>>> week(y)
>>>> [1] 52 52 52 53 53  1  1  1  1  1  1  2  2  2  2  2  2  2  3
>>>> 
>>>> Why would the first week of the year be made of 6 days, with the turn
>>>> from week 1 to week 2 on the night between Thursday and Friday and not
>>>> between Wednesday and Thursday like every other week?
>>> 
>>> weeks in lubridate start on whatever day of the week is the first of that 
>>> year.
>>> 
>>> If you want a Monday starting day (or the option to change to another 
>>> starting day), then package chron has such facilities.
>>> 
>>> 
>>>> 
>>>> Cheers,
>>>> Luca
>>>> 
>>>> 
>>>> 
>>>> On 19 Dec 2010, at 18:14, Uwe Ligges wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On 19.12.2010 13:20, David Winsemius wrote:
>>>>>> 
>>>>>> On Dec 19, 2010, at 5:11 AM, Luca Meyer wrote:
>>>>>> 
>>>>>>> Something goes wrong with the week function of the lubridate package:
>>>>>>> 
>>>>>>>> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
>>>>>>> + "2010-12-15 17:32:34",
>>>>>>> + "2010-12-15 18:48:39",
>>>>>>> + "2010-12-15 19:25:00",
>>>>>>> + "2010-12-16 08:00:00",
>>>>>>> + "2010-12-16 08:25:49",
>>>>>>> + "2010-12-16 09:00:00")))
>>>>>>>> require(lubridate)
>>>>>> 
>>>>>>>> weekdays(x)
>>>>>>> [1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì"
>>>>>>> "Giovedì" "Giovedì"
>>>>>>>> week(x)
>>>>>>> [1] 50 50 50 50 51 51 51
>>>>>> 
>>>>>> But 2010-12-15 is a Wednesday and 2010-12-16 is a Thursday.
>>>>>> 
>>>>> 
>>>>> 
>>>>> Together with the description of ?week this shows that lubridate's week() 
>>>>> function works as documented rather than as expected by Luca Meyer.
>>>>> 
>>>>> Uwe Ligges
>>>> 
>>> 
>>> David Winsemius, MD
>>> West Hartford, CT
>>> 
>> 
> 
> David Winsemius, MD
> West Hartford, CT
> 



[R] Passing parameter to a function

2010-12-20 Thread Luca Meyer
I am trying to pass a couple of variable names to an xtabs formula:

> tab <- function(x,y){
xtabs(time~x+y, data=D)
}

But when I run:

> tab(A,B)

I get:

Error in eval(expr, envir, enclos) : object "A" not found

I am quite sure that there is some easy way out, but I have tried different
combinations of deparse(), substitute(), eval(), etc. without success.
Can someone help?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
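
One possible way around this, sketched under assumptions: pass the column names as character strings and build the formula with reformulate() from the stats package. The data frame D below is a made-up stand-in for the one in the question, and the extra data argument is an addition for illustration.

```r
# Hypothetical stand-in for the data frame D from the question.
D <- data.frame(
  A    = c("a1", "a1", "a2", "a2"),
  B    = c("b1", "b2", "b1", "b1"),
  time = c(10, 20, 30, 40)
)

# x and y are column names given as character strings.
tab <- function(x, y, data = D) {
  # For x = "A" and y = "B" this builds the formula time ~ A + B.
  f <- reformulate(c(x, y), response = "time")
  xtabs(f, data = data)
}

res <- tab("A", "B")
res
```

Note the quotes in tab("A", "B"): the original call tab(A, B) fails because A and B are looked up as objects before the formula is ever built.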



Re: [R] Alternative to extended recode sintax? Bug?

2010-12-20 Thread Luca Meyer
Right, I appreciate the first day of the year start date. I am just wondering 
why then the cut off day is not the same for the rest of the year...but it's 
all right to use other packages.
Thanks,
Luca

On 20 Dec 2010, at 14:16, David Winsemius wrote:

> 
> On Dec 20, 2010, at 12:54 AM, Luca Meyer wrote:
> 
>> All right, I get it now: lubridate's week() defines weeks from Thursday till
>> the following Wednesday. You'd probably agree with me that what it does
>> over the turn of the year is a bit strange:
>> 
>>> y <- 
>>> as.POSIXct(c("2010-12-27","2010-12-28","2010-12-29","2010-12-30","2010-12-31","2011-01-01","2011-01-02","2011-01-03","2011-01-04","2011-01-05","2011-01-06","2011-01-07","2011-01-08","2011-01-09","2011-01-10","2011-01-11","2010-01-12","2010-01-13","2010-01-14"))
>>> week(y)
>> [1] 52 52 52 53 53  1  1  1  1  1  1  2  2  2  2  2  2  2  3
>> 
>> Why would the first week of the year be made of 6 days, with the turn from
>> week 1 to week 2 on the night between Thursday and Friday and not between
>> Wednesday and Thursday like every other week?
> 
> weeks in lubridate start on whatever day of the week is the first of that 
> year.
> 
> If you want a Monday starting day (or the option to change to another 
> starting day), then package chron has such facilities.
> 
> 
>> 
>> Cheers,
>> Luca
>> 
>> 
>> 
>> On 19 Dec 2010, at 18:14, Uwe Ligges wrote:
>> 
>>> 
>>> 
>>> On 19.12.2010 13:20, David Winsemius wrote:
>>>> 
>>>> On Dec 19, 2010, at 5:11 AM, Luca Meyer wrote:
>>>> 
>>>>> Something goes wrong with the week function of the lubridate package:
>>>>> 
>>>>>> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
>>>>> + "2010-12-15 17:32:34",
>>>>> + "2010-12-15 18:48:39",
>>>>> + "2010-12-15 19:25:00",
>>>>> + "2010-12-16 08:00:00",
>>>>> + "2010-12-16 08:25:49",
>>>>> + "2010-12-16 09:00:00")))
>>>>>> require(lubridate)
>>>> 
>>>>>> weekdays(x)
>>>>> [1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì"
>>>>> "Giovedì" "Giovedì"
>>>>>> week(x)
>>>>> [1] 50 50 50 50 51 51 51
>>>> 
>>>> But 2010-12-15 is a Wednesday and 2010-12-16 is a Thursday.
>>>> 
>>> 
>>> 
>>> Together with the description of ?week this shows that lubridate's week() 
>>> function works as documented rather than as expected by Luca Meyer.
>>> 
>>> Uwe Ligges
>> 
> 
> David Winsemius, MD
> West Hartford, CT
> 



Re: [R] Alternative to extended recode sintax? Bug?

2010-12-20 Thread Luca Meyer
All right, I get it now: lubridate's week() defines weeks from Thursday till the
following Wednesday. You'd probably agree with me that what it does over the
turn of the year is a bit strange:

> y <- 
> as.POSIXct(c("2010-12-27","2010-12-28","2010-12-29","2010-12-30","2010-12-31","2011-01-01","2011-01-02","2011-01-03","2011-01-04","2011-01-05","2011-01-06","2011-01-07","2011-01-08","2011-01-09","2011-01-10","2011-01-11","2010-01-12","2010-01-13","2010-01-14"))
> week(y)
 [1] 52 52 52 53 53  1  1  1  1  1  1  2  2  2  2  2  2  2  3

Why would the first week of the year be made of 6 days, with the turn from week 1
to week 2 on the night between Thursday and Friday and not between Wednesday
and Thursday like every other week?

Cheers,
Luca



On 19 Dec 2010, at 18:14, Uwe Ligges wrote:

> 
> 
> On 19.12.2010 13:20, David Winsemius wrote:
>> 
>> On Dec 19, 2010, at 5:11 AM, Luca Meyer wrote:
>> 
>>> Something goes wrong with the week function of the lubridate package:
>>> 
>>>> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
>>> + "2010-12-15 17:32:34",
>>> + "2010-12-15 18:48:39",
>>> + "2010-12-15 19:25:00",
>>> + "2010-12-16 08:00:00",
>>> + "2010-12-16 08:25:49",
>>> + "2010-12-16 09:00:00")))
>>>> require(lubridate)
>> 
>>>> weekdays(x)
>>> [1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì"
>>> "Giovedì" "Giovedì"
>>>> week(x)
>>> [1] 50 50 50 50 51 51 51
>> 
>> But 2010-12-15 is a Wednesday and 2010-12-16 is a Thursday.
>> 
> 
> 
> Together with the description of ?week this shows that lubridate's week() 
> function works as documented rather than as expected by Luca Meyer.
> 
> Uwe Ligges



[R] tabulating 2 factors weighting by a third var

2010-12-19 Thread Luca Meyer
Hi,

This must be an easy one but so far I haven't found a way out...

I have a data frame such as:

$ v1: Factor w/ 5 levels
$ v2: Factor w/ 2 levels
$ v3: Class 'difftime'  atomic [1:]

Basically, v1 and v2 are factors, while v3 contains the duration of certain
activities (values ranging from 11 to 45000 sec, no missing values).

How can I get a table in which the v1 levels are the rows, the v2 levels are
the columns, and each cell is weighted by v3? That is, instead of the count of
occurrences in each of the 10 cells of table(v1,v2), I would like to get
sum(v3) for each cell. How can it be done?

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
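
A minimal sketch of one way to get such a table, assuming the durations can be converted to plain seconds first. The data frame d and its values are invented for illustration; only the column names v1, v2 and v3 come from the question.

```r
# Hypothetical example data; v3 is a difftime in seconds, as in the question.
d <- data.frame(
  v1 = factor(c("a", "a", "b", "b", "a")),
  v2 = factor(c("x", "y", "x", "x", "x")),
  v3 = as.difftime(c(11, 200, 45000, 30, 60), units = "secs")
)

# Convert the durations to plain numbers so they can be summed per cell.
d$secs <- as.numeric(d$v3, units = "secs")

# Sum of secs in each v1 x v2 cell; combinations with no rows show 0.
total <- xtabs(secs ~ v1 + v2, data = d)
total

# The same sums with tapply(); combinations with no rows show NA here.
tapply(d$secs, list(d$v1, d$v2), sum)
```

With a numeric left-hand side, xtabs() sums that variable within each combination of the factors instead of counting rows.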



[R] Ifelse stability problems?

2010-12-19 Thread Luca Meyer
%) 45.9
134 B (10-20%) 24.9
135  C (5-15%)  0.2
136   D (1-5%)  1.6
137 E (10-20%) 27.4
138 A (50-67%) 61.5
139 B (10-20%)  8.7
140  C (5-15%) 22.5
141   D (1-5%)  0.1
142 E (10-20%)  7.2
143 A (50-67%) 64.1
144 B (10-20%)  0.9
145  C (5-15%) 14.4 ok
146   D (1-5%) 11.2 ok
147 E (10-20%)  9.4 ok
148 A (50-67%) 67.8
149 B (10-20%) 11.7 ok
150  C (5-15%) 10.6 ok
151   D (1-5%)  1.3
152 E (10-20%)  8.6
153 A (50-67%) 65.9
154 B (10-20%)  9.9 ok
155  C (5-15%) 11.3 ok
156   D (1-5%)  1.6
157 E (10-20%) 11.3 ok
158 A (50-67%) 77.0
159 B (10-20%)  5.3
160  C (5-15%)  8.6
161   D (1-5%)  2.6
162 E (10-20%)  6.5
163 A (50-67%) 77.5
164 B (10-20%)  5.7
165  C (5-15%)  8.1
166   D (1-5%)  4.6
167 E (10-20%)  4.2
168 A (50-67%) 40.1
169 B (10-20%) 12.9 ok
170  C (5-15%) 33.2
171   D (1-5%)  0.3
172 E (10-20%) 13.6 ok
173 A (50-67%) 53.9
174 B (10-20%) 10.1 ok
175  C (5-15%)  8.4
176   D (1-5%)  4.2
177 E (10-20%) 23.4
178 A (50-67%) 94.3
179  C (5-15%)  1.7
180 E (10-20%)  4.0
181 A (50-67%) 62.1
182 B (10-20%) 12.3 ok
183  C (5-15%)  5.3
184   D (1-5%)  7.3
185 E (10-20%) 13.0 ok
186 A (50-67%) 49.2
187 B (10-20%) 14.1 ok
188  C (5-15%)  7.9
189   D (1-5%)  8.9
190 E (10-20%) 20.0 ok
191 A (50-67%) 63.6
192 B (10-20%) 10.4 ok
193  C (5-15%) 11.9 ok
194   D (1-5%)  2.4
195 E (10-20%) 11.7 ok
196 A (50-67%) 55.1
197 B (10-20%) 13.5 ok
198  C (5-15%) 11.2 ok
199   D (1-5%)  4.8
200 E (10-20%) 15.5 ok
201 A (50-67%) 68.6
202 B (10-20%)  3.1
203  C (5-15%)  8.2
204   D (1-5%)  9.2 ok
205 E (10-20%) 10.8 ok
206 A (50-67%) 45.0
207 B (10-20%)  4.8
208  C (5-15%)  7.1
209   D (1-5%)  4.9
210 E (10-20%) 38.2
211 A (50-67%) 85.2
212 B (10-20%)  3.1
213  C (5-15%)  4.4
214   D (1-5%)  0.4
215 E (10-20%)  6.9
216 A (50-67%) 60.5
217 B (10-20%) 10.1 ok
218  C (5-15%) 11.1 ok
219   D (1-5%)  1.8
220 E (10-20%) 16.5 ok
221 A (50-67%) 58.7
222 B (10-20%)  7.0
223  C (5-15%) 10.5 ok
224   D (1-5%)  5.2
225 E (10-20%) 18.7 ok
226 A (50-67%) 90.0
227  C (5-15%)  5.6
228   D (1-5%)  0.7
229 E (10-20%)  3.8
230 A (50-67%) 62.5
231 B (10-20%) 13.7 ok
232  C (5-15%)  9.7 ok
233   D (1-5%)  2.6
234 E (10-20%) 11.6 ok
235 A (50-67%) 55.6
236 B (10-20%) 17.6 ok
237  C (5-15%) 11.8 ok
238   D (1-5%)  2.6
239 E (10-20%) 12.4 ok
240 A (50-67%) 85.2
241 B (10-20%)  0.6
242  C (5-15%)  2.1
243   D (1-5%)  2.3
244 E (10-20%)  9.8 ok
245 A (50-67%) 87.4
246 B (10-20%)  0.4
247  C (5-15%)  2.9
248   D (1-5%)  2.8
249 E (10-20%)  6.4
250 A (50-67%) 73.0
251 B (10-20%)  4.0
252  C (5-15%) 15.6 ok
253   D (1-5%)  0.7
254 E (10-20%)  6.7
255 A (50-67%) 90.4
256  C (5-15%)  2.4
257   D (1-5%)  2.5
258 E (10-20%)  4.7
259 A (50-67%) 64.3
260 B (10-20%)  6.6
261  C (5-15%) 13.3 ok
262   D (1-5%)  3.5
263 E (10-20%) 12.3 ok
264 A (50-67%) 65.5
265 B (10-20%) 13.5 ok
266  C (5-15%)  4.6
267   D (1-5%)  0.9
268 E (10-20%) 15.4 ok
269 A (50-67%) 72.1
270 B (10-20%)  6.4
271  C (5-15%) 12.7 ok
272   D (1-5%)  1.1
273 E (10-20%)  7.7
274 A (50-67%) 71.4
275 B (10-20%)  0.9
276  C (5-15%) 21.9
277 E (10-20%)  5.7
278 A (50-67%) 53.0
279 B (10-20%)  3.6
280  C (5-15%) 36.4
281 E (10-20%)  7.0

Can anyone explain why this might occur?

Thanks,
Luca


Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0


Re: [R] Alternative to extended recode sintax? Bug?

2010-12-19 Thread Luca Meyer
Something goes wrong with the week function of the lubridate package:

> x= as.POSIXct(factor(c("2010-12-15 17:28:27",
+ "2010-12-15 17:32:34",
+ "2010-12-15 18:48:39",
+ "2010-12-15 19:25:00",
+ "2010-12-16 08:00:00",
+ "2010-12-16 08:25:49",
+ "2010-12-16 09:00:00")))
> require(lubridate)
> weekdays(x)
[1] "Mercoledì" "Mercoledì" "Mercoledì" "Mercoledì" "Giovedì"   "Giovedì"   
"Giovedì"  
> week(x)
[1] 50 50 50 50 51 51 51
> 

Please note that Mercoledì = Wednesday and Giovedì = Thursday. Why would the
week start on Thursday? Also note that this does not occur in previous weeks:
all weeks up to week 49 begin on Monday and end on Sunday, as required.

Thanks,
Luca
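
If weeks that run from Monday to Sunday are what is needed, base R can number them without any extra package. A small sketch, assuming the timestamps are read as UTC (the variable names week_mon and week_iso are made up here):

```r
# Two of the timestamps from the example above: a Wednesday and a Thursday.
x <- as.POSIXct(c("2010-12-15 17:28:27", "2010-12-16 08:00:00"), tz = "UTC")

weekdays(x)                   # day names in the current locale

# %W: week of the year, with Monday as the first day of the week.
week_mon <- format(x, "%W")

# %V: ISO 8601 week number; these weeks also start on Monday.
week_iso <- format(x, "%V")

data.frame(x, week_mon, week_iso)
```

Both conventions put the Wednesday and the Thursday in the same week, because the week only changes between Sunday and Monday.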


On 18 Dec 2010, at 14:39, David Winsemius wrote:

> 
> On Dec 17, 2010, at 11:08 AM, Luca Meyer wrote:
> 
> x= factor(c("2009-03-30 00:00:00", "2009-04-06 00:00:00", "2009-04-13 
> 00:00:00", "2009-04-20 00:00:00", "2009-04-27 00:00:00", "2009-05-04 
> 00:00:00" ,"2009-05-11 00:00:00", "2009-05-18 00:00:00"))
> require(lubridate)
> xd=as.POSIXct(x)
> week(xd)
> # [1] 13 14 15 16 17 18 19 20
> year(xd)
> # [1] 2009 2009 2009 2009 2009 2009 2009 2009
> paste(year(xd), " W",week(xd), sep="")
> #[1] "2009 W13" "2009 W14" "2009 W15" "2009 W16" "2009 W17" "2009 W18" "2009 
> W19" "2009 W20"
> 
> 
> 
> David Winsemius, MD
> West Hartford, CT
> 



[R] testing with if: what I am doing wrong?

2010-12-19 Thread Luca Meyer
I am running this small program:

x <- factor(c("A","B","A","C"))
y <- c(1,2,3,4)
w <-data.frame(x,y)
if (w$x=="A"){
w$z=1
}
w
And I obtain:

  x y z
1 A 1 1
2 B 2 1
3 A 3 1
4 C 4 1

And not

  x y z
1 A 1 1
2 B 2 NA
3 A 3 1
4 C 4 NA

as I expected. What am I doing wrong?

Please note that I get a warning which, roughly translated from Italian, says:

In if (w$x == "A") { : the condition has length > 1 and only the first element
will be used

Thanks,
Luca
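
The warning points at the cause: if() looks at a single TRUE or FALSE, so only the first element of w$x == "A" is tested and w$z = 1 is then assigned to every row. A minimal sketch of two element-wise alternatives (the column name z2 is added here only for comparison):

```r
x <- factor(c("A", "B", "A", "C"))
y <- c(1, 2, 3, 4)
w <- data.frame(x, y)

# ifelse() tests every element of w$x, unlike if().
w$z <- ifelse(w$x == "A", 1, NA)

# Equivalent: start with NA and assign only to the matching rows.
w$z2 <- NA
w$z2[w$x == "A"] <- 1

w
```

In R 4.2.0 and later, an if() condition of length greater than one is an error rather than a warning.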


[R] Alternative to extended recode sintax?

2010-12-18 Thread Luca Meyer
Dear R-users,

I have a factor variable within my data frame which I derive week after week 
from a POSIXct variable using the cut(var,"weeks") command I have found in the 
chron package. The levels() command gives me:

[1] "2009-03-30 00:00:00" "2009-04-06 00:00:00" "2009-04-13 00:00:00" 
"2009-04-20 00:00:00" "2009-04-27 00:00:00" "2009-05-04 00:00:00" "2009-05-11 
00:00:00" "2009-05-18 00:00:00"
 [9] "2009-05-25 00:00:00" "2009-06-01 00:00:00" "2009-06-08 00:00:00" 
"2009-06-15 00:00:00" "2009-06-22 00:00:00" "2009-06-29 00:00:00" "2009-07-06 
00:00:00" "2009-07-13 00:00:00"
[17] "2009-07-20 00:00:00" "2009-07-27 00:00:00" "2009-08-03 00:00:00" 
"2009-08-10 00:00:00" "2009-08-17 00:00:00" "2009-08-24 00:00:00" "2009-08-31 
00:00:00" "2009-09-07 00:00:00"
[25] "2009-09-14 00:00:00" "2009-09-21 00:00:00" "2009-09-28 00:00:00" 
"2009-10-05 00:00:00" "2009-10-12 00:00:00" "2009-10-19 00:00:00" "2009-10-25 
23:00:00" "2009-11-01 23:00:00"
[33] "2009-11-08 23:00:00" "2009-11-15 23:00:00" "2009-11-22 23:00:00" 
"2009-11-29 23:00:00" "2009-12-06 23:00:00" "2009-12-13 23:00:00" "2009-12-20 
23:00:00" "2009-12-27 23:00:00"
[41] "2010-01-03 23:00:00" "2010-01-10 23:00:00" "2010-01-17 23:00:00" 
"2010-01-24 23:00:00" "2010-01-31 23:00:00" "2010-02-07 23:00:00" "2010-02-14 
23:00:00" "2010-02-21 23:00:00"
[49] "2010-02-28 23:00:00" "2010-03-07 23:00:00" "2010-03-14 23:00:00" 
"2010-03-21 23:00:00" "2010-03-29 00:00:00" "2010-04-05 00:00:00" "2010-04-12 
00:00:00" "2010-04-19 00:00:00"
[57] "2010-04-26 00:00:00" "2010-05-03 00:00:00" "2010-05-10 00:00:00" 
"2010-05-17 00:00:00" "2010-05-24 00:00:00" "2010-05-31 00:00:00" "2010-06-07 
00:00:00" "2010-06-14 00:00:00"
[65] "2010-06-21 00:00:00" "2010-06-28 00:00:00" "2010-07-05 00:00:00" 
"2010-07-12 00:00:00" "2010-07-19 00:00:00" "2010-07-26 00:00:00" "2010-08-02 
00:00:00" "2010-08-09 00:00:00"
[73] "2010-08-16 00:00:00" "2010-08-23 00:00:00" "2010-08-30 00:00:00" 
"2010-09-06 00:00:00" "2010-09-13 00:00:00" "2010-09-20 00:00:00" "2010-09-27 
00:00:00" "2010-10-04 00:00:00"
[81] "2010-10-11 00:00:00" "2010-10-18 00:00:00" "2010-10-25 00:00:00" 
"2010-10-31 23:00:00" "2010-11-07 23:00:00" "2010-11-14 23:00:00" "2010-11-21 
23:00:00" "2010-11-28 23:00:00"
[89] "2010-12-05 23:00:00" "2010-12-12 23:00:00"

Now what I would like is to have more readable labels, such as 2010-W01 for the
first week of 2010, 2009-W34 for the 34th week of 2009, etc. Is there an
easier way to achieve that than writing out the whole recode syntax below?

library(car)
dataset$newvar <- recode(dataset$oldvar, "
c('2009-03-30 00:00:00')='2009-W13';
c('2009-04-06 00:00:00')='2009-W14';
# etc...
c('2010-12-05 23:00:00')='2010-W48';
c('2010-12-12 23:00:00')='2010-W49';
# etc... this part has to be extended over time unless I find an automatic procedure
")

Thanks,
Luca


Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.1 (2010-12-16)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
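
One way to avoid typing the recode by hand is to compute the labels from the level strings themselves. The sketch below rests on two assumptions: the levels stamped 23:00 are Monday midnights shifted by the daylight saving change, so one hour is added before formatting; and weeks are numbered with the %W convention of base R (Monday as first day of the week). Because the numbers come from that convention, they need not match labels typed by hand. The names oldvar and newvar follow the post; start is made up.

```r
# A few of the week-start levels from the post above.
oldvar <- factor(c("2009-03-30 00:00:00", "2009-04-06 00:00:00",
                   "2010-12-05 23:00:00", "2010-12-12 23:00:00"))

# Parse the level strings as UTC and add one hour, so that a Sunday 23:00
# stamp lands on the Monday it is assumed to stand for.
start <- as.POSIXct(levels(oldvar), tz = "UTC") + 3600

# Replace each level with "year-Wweek".
newvar <- oldvar
levels(newvar) <- format(start, "%Y-W%W")
newvar
```

New weeks are then labelled automatically whenever the factor is rebuilt, with no recode table to maintain.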



[R] how to use diff() with different variables?

2010-12-09 Thread Luca Meyer
Hi,

I should first say that I am new to R. I have searched the R archives without
success for an answer to what I am about to ask.

My dataset is like:

  x                fine
1 A 2010-12-09 07:57:33
2 B 2010-12-09 08:05:00
3 C 2010-12-08 20:42:00
...

that is:

'data.frame':   3 obs. of  2 variables:
 $ x   : Factor w/ 3 levels "A","B","C": 1 2 3
 $ fine: POSIXct, format: "2010-12-09 07:57:33" "2010-12-09 08:05:00" 
"2010-12-08 20:42:00"

What I am trying to do is to build another variable fine1 that should contain 
the lagged value for "fine", that is:

  x                fine               fine1
1 A 2010-12-09 07:57:33                  NA
2 B 2010-12-09 08:05:00 2010-12-09 07:57:33
3 C 2010-12-08 20:42:00 2010-12-09 08:05:00

How can I do that? 

Thanks,
Luca

Luca Meyer
www.lucameyer.com
IBM SPSS Statistics release 19.0.0
R version 2.12.0 (2010-10-15)
Mac OS X 10.6.5 (10H574) - kernel Darwin 10.5.0
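
A possible approach, sketched in base R: a lagged copy can be built by indexing the column with a shifted row index, after which the elapsed time between rows is a plain difference. The data frame d and the column gap are invented for this example, and the timestamps are assumed to be in UTC.

```r
d <- data.frame(
  x    = factor(c("A", "B", "C")),
  fine = as.POSIXct(c("2010-12-09 07:57:33", "2010-12-09 08:05:00",
                      "2010-12-08 20:42:00"), tz = "UTC")
)

# Shift fine down by one row: row i receives the value from row i - 1.
# Indexing with NA yields NA and keeps the POSIXct class.
d$fine1 <- d$fine[c(NA, seq_len(nrow(d) - 1))]

# Elapsed time between each row and the one before it, in minutes.
d$gap <- difftime(d$fine, d$fine1, units = "mins")

d
```

diff(d$fine) would return the differences directly, but it has one element fewer than the data frame has rows, so the lagged column is easier to keep alongside the data.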