Re: [R] FW: R Statistics

2014-12-03 Thread Chel Hee Lee

Or, you may use this approach:

> attach(achtergrond)
> spits <- ifelse(uurenminuut >= 5.30 & uurenminuut < 9.30, "morning",
+ ifelse(uurenminuut >=16.30 & uurenminuut < 19.0, "evening",
+ "between"))
> table(spits)
spits
between evening morning
   1636     142     579
>

I personally like the approach presented by Bill Dunlap (in the previous 
message).  I think his approach is smart and clean.  You will see the 
same results as shown above:


> achtergrond$spits <- cut(achtergrond$uurenminuut,
+                          c(-1.0, 5.30, 9.30, 16.30, 19.0, 24.0), right=FALSE)
> levels(achtergrond$spits) <- c("between", "morning", "between", "evening", "between")

> table(achtergrond$spits)

between morning evening
   1636     579     142
>

You can also use function 'findInterval()' instead of using 'cut()'.  I 
hope this helps.
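For completeness, a minimal sketch of the findInterval() route, using the same 
breaks as above (the sample times here are hypothetical):

```r
# findInterval() returns, for each value, the index of the interval it falls
# into; indexing a vector of labels with that result mimics cut() + levels<-().
breaks <- c(-1.0, 5.30, 9.30, 16.30, 19.0)
labs   <- c("between", "morning", "between", "evening", "between")
uurenminuut <- c(4.15, 6.00, 10.00, 17.45, 20.30)  # hypothetical times
spits <- labs[findInterval(uurenminuut, breaks)]
spits  # "between" "morning" "between" "evening" "between"
```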


Chel Hee Lee


On 12/02/2014 02:04 PM, William Dunlap wrote:

You can do this in two steps: have cut() make a factor with a different
level for each time period, then use levels<-() to merge some of the levels.
> z <- cut(.5:3.5, breaks=c(0,1,2,3,4), labels=c("0-1", "1-2", "2-3", "3-4"))
> levels(z)
[1] "0-1" "1-2" "2-3" "3-4"
> levels(z) <- c("between", "1-2", "between", "3-4") # or levels(z)[c(1,3)] <- "between"
> str(z)
 Factor w/ 3 levels "between","1-2",..: 1 2 1 3




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Dec 2, 2014 at 11:32 AM, Dries David 
wrote:




Hey

I have a question about making a new variable in R. I have put my
dataset in the attachment. I have to make a new variable "spits" where
spits = "morning" when uurenminuut (also a variable) is between 5.30 and 9.30,
and spits = "evening" when uurenminuut is between 16.30 and 19.0. But here is
my problem: for all the values not between 5.30-9.30 and 16.30-19.0, spits
must be equal to "between".


achtergrond$minuutdec=achtergrond$minuut/100
achtergrond$uurenminuut=achtergrond$uur+achtergrond$minuutdec


achtergrond$spits=cut(uurenminuut,c(-1.0,5.30,9.30,16.30,19.0,24.0),labels=c("between","morning","between","evening","between"),right=FALSE)


When I do this I get a warning message, because I use "between" more
than once as a label. "Between" has to be one label that covers all values
that are not in morning or evening.


Could you help me with this?

Kind regards

Dries David






__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.









Re: [R] R Arimax Function

2014-12-03 Thread Paul Bernal
Hello Mark,

Thank you for your timely reply. All I want to know is whether the xreg argument
must contain the same number of historical observations as the dependent
variable, and whether newxreg has to have a forecast of the exogenous variable
equal to the number of periods to be forecasted.
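For what it's worth, the lengths do have to line up exactly that way. A small
sketch with simulated data (all series and numbers here are made up) shows the
pattern with stats::arima():

```r
set.seed(42)
n <- 132                                   # 11 years of monthly history
inflation <- cumsum(rnorm(n, 0, 0.1))      # made-up exogenous regressor
sales <- 100 + 5 * inflation + arima.sim(list(ar = 0.5), n)

# xreg must have one row (or element) per historical observation ...
fit <- arima(sales, order = c(1, 0, 0), xreg = inflation)

# ... and newxreg must have one row per period to be forecasted.
future_inflation <- rep(tail(inflation, 1), 60)  # naive 60-month scenario
fc <- predict(fit, n.ahead = 60, newxreg = future_inflation)
length(fc$pred)  # 60
```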

Best regards and thanks again,

Paul

2014-12-03 22:30 GMT-06:00 Mark Leeds :

> hi paul: I don't have time to answer all of your questions, but the link
> below should help. There's a better one that actually explains how arima
> is better for forecasting than gls, but I can't find it right now. If I
> find it, I'll send it.
>
>
> http://stats.stackexchange.com/questions/6469/simple-linear-model-with-autocorrelated-errors-in-r
>
>
> On Wed, Dec 3, 2014 at 11:12 PM, Paul Bernal 
> wrote:
>
>> Hello everyone,
>>
>> I am just trying to understand how the Arimax function works, so my
>> questions are:
>>
>> 1. If I have a univariate time series of sales and sales are dependent
>> upon, say inflation rates, then my xreg would be inflation rates right?
>>
>> 2. Now if I have historical data on sales (from January 2000 to December
>> 2010), then my xreg argument would be a historical time series of inflation
>> rates over the same time frame? (from January 2000 to December 2010?)
>>
>> 3. If I want to predict, say, 5 years of monthly data (60 periods) using the
>> exogenous variable (in this particular case inflation rates), then that
>> means I would have to have a 60-period forecast of inflation rates, and
>> that would be my newxreg argument?
>>
>> Any guidance will be greatly appreciated,
>>
>> Best regards,
>>
>> Paul
>>
>
>



Re: [R] R installation

2014-12-03 Thread Chel Hee Lee
This question seems to be a problem specific to Ubuntu.  What if you 
post the message to ??  I hope you get 
answers from that mailing list.


Chel Hee Lee

On 12/02/2014 11:10 AM, VG wrote:

Hi everyone,

I was having trouble with the R I installed some time ago on my local Ubuntu
machine, so I removed R completely from my system in order to reinstall
it. I used this command to install R:

sudo apt-get install r-base r-base-dev

Then on the terminal I typed "which R"; it returns:
/usr/bin/R

When I launch R on the terminal by typing R, it gives me this:

/usr/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or directory

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: i486-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

   Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


Warning: namespace 'DESeq' is not available and has been replaced by
.GlobalEnv when processing object 'data2'

To fix "/usr/bin/R: line 236: /usr/lib/R/etc/ldpaths: No such file or
directory", I went to /usr/lib/R/etc/ and did

file ldpaths

and it gave me:

ldpaths: broken symbolic link to `/etc/R/ldpaths'

How do I fix this? I also need to fix the warning.

Regards
Varun






[R] R Arimax Function

2014-12-03 Thread Paul Bernal
Hello everyone,

I am just trying to understand how the Arimax function works, so my
questions are:

1. If I have a univariate time series of sales and sales are dependent
upon, say inflation rates, then my xreg would be inflation rates right?

2. Now if I have historical data on sales (from January 2000 to December
2010), then my xreg argument would be a historical time series of inflation
rates over the same time frame? (from January 2000 to December 2010?)

3. If I want to predict, say, 5 years of monthly data (60 periods) using the
exogenous variable (in this particular case inflation rates), then that
means I would have to have a 60-period forecast of inflation rates, and
that would be my newxreg argument?

Any guidance will be greatly appreciated,

Best regards,

Paul



Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Hello Chel and David,

Thank you very much for providing new insights into this issue.  Here is one 
more question: why does mutate() give incorrect results here?

# The following gives INCORRECT results - mutate()-ed object
na.date.cases = ifelse(!is.na(oiddate),1,0)

# The following gives CORRECT results
new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)

###  reproducible example - slightly revised/modified  ###
library(dplyr)
# data object - description 

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA 
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp), 
colClasses=c("character", "Date", "Date", "Date", "Date"),  
header=TRUE, as.is=TRUE
)


# create a new column -dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>% 
 rowwise() %>%
  mutate(oiddate=as.Date(max(mrjdate,cocdate, inhdate, haldate, 
na.rm=TRUE), origin='1970-01-01'),
 na.date.cases = ifelse(!is.na(oiddate),1,0)
 )

# create a new column - Base R solution (Acknowlegement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                           'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                            'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

new2$na.date.cases = ifelse(!is.na(new2$oiddate),1,0)


identical(new1, new2) 

table(new1$oiddate)
table(new2$oiddate)

# print records

print (new1); print(new2)

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: Chel Hee Lee [mailto:chl...@mail.usask.ca] 
Sent: Wednesday, December 03, 2014 8:48 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ); r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

The output in the object 'new1' is apparently the same as the output in the 
object 'new2'.  Are you trying to compare the entries of the two outputs 
'new1' and 'new2'?  If so, the function 'all()' would be useful:

 > all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of class, then 
the function 'identical()' is useful:

 > attributes(new1)
$names
[1] "id"  "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

 > attributes(new2)
$names
[1] "id"  "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

I hope this helps.

Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:
> Hello,
>
> Two alternative approaches - mutate() vs. sapply() - were used to get the 
> desired results (i.e., creating a new column of the most recent date  from 4 
> dates ) with help from Arun and Mark on this forum.  I now find that the two 
> data objects (created using two different approaches) are not identical 
> although results are exactly the same.
>
> identical(new1, new2)
> [1] FALSE
>
> Please see the reproducible example below.
>
> I don't understand why the code returns FALSE here.  Any hints/comments  will 
> be  appreciated.
>
> Thanks,
>
> Pradip
>
> #  reproducible example 
> 
> library(dplyr)
> # data object - description
>
> temp <- "id  mrjdate cocdate inhdate haldate
> 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2 NA NA NA NA
> 3 2009-10-24 NA 2011-10-13 NA
> 4 2007-10-10 NA NA NA
> 5 2006-09-01 2005-08-10 NA NA
> 6 2007-09-04 2011-10-05 NA NA
> 7 2005-10-25 NA NA 2011-11-04"
>
> # read the data object
>
> example.data <- read.table(textConnection(temp),
>  colClasses=c("character", "Date", "Date", "Date", 
> "Date"),
>  header=TRUE, as.is=TRUE
>  )
>
>
> # create a new column -dplyr solution (Acknowledgement: Arun)
>
> new1 <- example.data %>%
>   rowwise() %>%
>mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
> 
> na.rm=TRUE), origin='1970-01-01'))
>
> # create a new column - Base R solution (Acknowlegement: Mark Sharp)

Re: [R] coerce data to numeric

2014-12-03 Thread Chel Hee Lee

In your function 'nbars()', I see the line:

  tX = rbind(tX, as.data.frame(cbind(GId = " ",Grp = names(sG[n]),
 S = fm, T = fm)))

It seems that you wish to have a data frame that has numeric variables 
'S' and 'T'.  The reason you get character variables for 'S' and 'T' 
from your code is that you mixed a character vector into the call to 
'cbind()', which coerces every column to character.  Please see the 
following example:


> cbind(1:3, 4:6)
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> cbind(1:3, LETTERS[1:3])
     [,1] [,2]
[1,] "1"  "A"
[2,] "2"  "B"
[3,] "3"  "C"
>

What do you see?  I see that the numeric values are changed to characters. 
Hence, I guess that you will get the output that you want if you change 
your code as below:


  tX = rbind(tX, as.data.frame(cbind(GId = 0, Grp = 0,
                                     S = fm, T = fm)))

Of course, you will have to do a little more work with this change in order 
to get the final bar plots.  I hope this helps.
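A small illustration of the difference (the padding values here are 
hypothetical). An alternative worth knowing: building the padding rows with 
data.frame() instead of as.data.frame(cbind(...)) keeps each column's own type:

```r
fm <- rep(0, 2)  # hypothetical padding values

# cbind() builds a character matrix as soon as any argument is character ...
bad <- as.data.frame(cbind(GId = " ", Grp = "B", S = fm, T = fm),
                     stringsAsFactors = FALSE)
sapply(bad, class)   # S and T come back as character

# ... while data.frame() keeps S and T numeric.
good <- data.frame(GId = " ", Grp = "B", S = fm, T = fm,
                   stringsAsFactors = FALSE)
sapply(good, class)  # S and T stay numeric
```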


Chel Hee Lee

On 12/03/2014 12:29 PM, Charles R Parker wrote:

I am trying to create groups of barplots from data that have different numbers 
of records in the groups, in such a way that all of the plots will have the 
same numbers and sizes of bars represented, even when some of the groups will 
have some bars of zero height. The goal then would be to display multiple plots 
on a single page using split.screen or something similar. lattice does not seem 
suitable because of the data structure it operates on. A simple data structure 
that I operate on is given here:


dput(stplot)

structure(list(GId = structure(1:11, .Label = c("A1", "B1", "B2",
"B3", "B4", "B5", "C1", "C2", "D1", "D2", "D3"), class = "factor"),
 Grp = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L,
 4L), .Label = c("A", "B", "C", "D"), class = "factor"), S = c(12.3,
 23.8, 0, 7.6, 14.32, 1.9, 5.1, 0, 14.6, 10.1, 8.7), T = c(5L,
 12L, 2L, 1L, 4L, 1L, 1L, 9L, 5L, 6L, 3L)), .Names = c("GId",
"Grp", "S", "T"), class = "data.frame", row.names = c(NA, -11L
))


My code, which doesn't quite work is:


nbars <-

function(x){
   sG = summary(x$Grp)
   mG = max(sG)
   for(n in 1:length(sG)){
 tX = subset(x,x$Grp==names(sG[n]))
 if(nrow(tX) < mG){
   fm = as.numeric(rep(length = mG - nrow(tX), 0))
   tX = rbind(tX, as.data.frame(cbind(GId = " ",Grp = names(sG[n]),
  S = fm, T = fm)))
 }
#print(tX)
#dput(t(as.matrix(tX[,3:4])))
 barplot(t(as.matrix(tX[,3:4])),beside=TRUE, names.arg=tX$GId,
   col = c("navy","gray"))
   }
}


The function nbars first gets the list of group values with their counts 
'summary(x$Grp)'.
It then determines the maximum number of bar pairs in the largest of the groups 
'max(sG)', and uses this to determine how much each smaller group needs to be 
padded to fill out the proper number of bars in the ultimate barplots, using 
the for loop. If you uncomment the #print(tX) you can see that this 
works...sort of. The problem becomes apparent if you uncomment the #dput. This 
shows that the tX treats the S and T values as characters rather than as 
numeric values. This prevents the barplots from working. By changing the for 
loop to begin 'for(n in 2:length(sG)' the second plot will display correctly, 
but the third plot will fail.

I have tried various options to force the S and T variables to be numeric 
(as.numeric(fm), as.matrix(fm), as.vector(fm)) in the 'if(nrow(tX) < mG)' 
block, but none of these have worked.

If there is a sure-fire way to solve the problem I would be grateful.

Thanks.






Re: [R] combining unequal dataframes based on a common grouping factor

2014-12-03 Thread Chel Hee Lee

> frame1
  ID GROUP PROP_AREA
1  1 A  0.33
2  2 A  0.33
3  3 A  0.33
4  4 B  0.50
5  5 B  0.50
6  6 C  1.00
7  7 D  1.00
> frame2
  GROUP VALUE1 VALUE2
1 A 10  5
2 B 20 10
3 C 30 15
4 D 40 20
>
> obj1 <- merge(x=frame1, y=frame2, by="GROUP")
> obj1$rval1 <- obj1$PROP_AREA * obj1$VALUE1
> obj1$rval2 <- obj1$PROP_AREA * obj1$VALUE2
> obj1
  GROUP ID PROP_AREA VALUE1 VALUE2 rval1 rval2
1 A  1  0.33 10  5   3.3  1.65
2 A  2  0.33 10  5   3.3  1.65
3 A  3  0.33 10  5   3.3  1.65
4 B  4  0.50 20 10  10.0  5.00
5 B  5  0.50 20 10  10.0  5.00
6 C  6  1.00 30 15  30.0 15.00
7 D  7  1.00 40 20  40.0 20.00
>
> idx <- match(x=frame1$GROUP, table=frame2$GROUP)
> rval1 <- frame1["PROP_AREA"] * frame2[idx, "VALUE1"]
> rval2 <- frame1["PROP_AREA"] * frame2[idx, "VALUE2"]
> cbind("ID"=frame1[["ID"]], rval1, rval2)
  ID PROP_AREA PROP_AREA
1  1       3.3      1.65
2  2       3.3      1.65
3  3       3.3      1.65
4  4      10.0      5.00
5  5      10.0      5.00
6  6      30.0     15.00
7  7      40.0     20.00
>
>

Is this what you are looking for?  I hope this helps.
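Since the real frame2 has thousands of VALUE columns, here is a sketch that 
scales without naming each column; selecting the value columns by a "^VALUE" 
prefix is an assumption about the real column names:

```r
frame1 <- data.frame(ID = 1:7,
                     GROUP = c("A", "A", "A", "B", "B", "C", "D"),
                     PROP_AREA = c(0.33, 0.33, 0.33, 0.5, 0.5, 1, 1))
frame2 <- data.frame(GROUP = c("A", "B", "C", "D"),
                     VALUE1 = c(10, 20, 30, 40),
                     VALUE2 = c(5, 10, 15, 20))

m <- merge(frame1, frame2, by = "GROUP")        # one row per ID
vals <- grep("^VALUE", names(m), value = TRUE)  # every VALUE column at once
frame3 <- cbind(ID = m$ID, m[vals] * m$PROP_AREA)
frame3 <- frame3[order(frame3$ID), ]            # restore the original order
```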

Chel Hee Lee

On 12/03/2014 03:14 PM, Brock Huntsman wrote:

I apologize if this is a relatively easy problem, but I have been stuck on
this issue for a few days. I am attempting to combine values from 2
separate dataframes. Each dataframe contains a shared identifier (GROUP).
Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping
factor into unique identifiers (ID), and also contains the proportion of
the GROUP area of which the unique identifier consists (PROP_AREA).
Dataframe 2 (291 x 14976), in addition to the shared identifier, has
numerous columns consisting of values (VALUE1, VALUE2, ...). I would like
to multiply the PROP_AREA in dataframe 1 by each value in dataframe 2
(VALUE1 through VALUE14976) based on the GROUP factor, constructing a
final dataframe of size 3272 x 14976. Examples of the data frames are as
follows:


frame1:

ID   GROUP   PROP_AREA
1    A       0.33
2    A       0.33
3    A       0.33
4    B       0.50
5    B       0.50
6    C       1.00
7    D       1.00

frame2:

GROUP   VALUE1   VALUE2
A       10        5
B       20       10
C       30       15
D       40       20

Desired dataframe

frame3:

ID   VALUE1   VALUE2
1     3.3     1.65
2     3.3     1.65
3     3.3     1.65
4    10       5
5    10       5
6    30      15
7    40      20





I assume I would need to use the %in% function or if statements, but am
unsure how to write the code. I have attempted to construct a for loop with
an if statement, but have not been successful as of yet.


for(i in 1:nrow(frame1)) {

   for(j in 2:ncol(frame2)) {

 if (frame1$GROUP[i] == frame2$GROUP[i]) {

   frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]

 }

   }

}


Any advice on suggested code or packages to read up on would be much
appreciated.

Brock






Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread Chel Hee Lee
The output in the object 'new1' is apparently the same as the output in the 
object 'new2'.  Are you trying to compare the entries of the two outputs 
'new1' and 'new2'?  If so, the function 'all()' would be useful:


> all(new1 == new2, na.rm=TRUE)
[1] TRUE

If you are interested in the comparison of two objects in terms of 
class, then the function 'identical()' is useful:


> attributes(new1)
$names
[1] "id"  "mrjdate" "cocdate" "inhdate" "haldate" "oldflag"

$class
[1] "rowwise_df" "tbl_df"     "tbl"        "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

> attributes(new2)
$names
[1] "id"  "mrjdate" "cocdate" "inhdate" "haldate" "oiddate"

$row.names
[1] 1 2 3 4 5 6 7

$class
[1] "data.frame"

I hope this helps.

Chel Hee Lee

On 12/03/2014 04:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

Hello,

Two alternative approaches - mutate() vs. sapply() - were used to get the 
desired results (i.e., creating a new column of the most recent date  from 4 
dates ) with help from Arun and Mark on this forum.  I now find that the two 
data objects (created using two different approaches) are not identical 
although results are exactly the same.

identical(new1, new2)
[1] FALSE

Please see the reproducible example below.

I don't understand why the code returns FALSE here.  Any hints/comments  will 
be  appreciated.

Thanks,

Pradip

#  reproducible example 

library(dplyr)
# data object - description

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp),
 colClasses=c("character", "Date", "Date", "Date", "Date"),
 header=TRUE, as.is=TRUE
 )


# create a new column -dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
  rowwise() %>%
   mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
na.rm=TRUE), 
origin='1970-01-01'))

# create a new column - Base R solution (Acknowlegement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                           'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                            'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

identical(new1, new2)

# print records

print (new1); print(new2)

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your 
help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org]
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of 
the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() 
function is a standard approach.

max() does not want a dataframe thus the use of unlist().
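As an aside, a vectorised alternative that avoids the per-row sapply() is 
pmax(), which compares the date columns element-wise and handles NAs row by 
row (the two-column frame below is a made-up stand-in for the real data):

```r
d <- data.frame(a = as.Date(c("2004-11-04", NA)),
                b = as.Date(c("2008-07-18", NA)))
# pmax() keeps the Date class and returns NA only where every input is NA.
d$latest <- pmax(d$a, d$b, na.rm = TRUE)
d$latest  # "2008-07-18" NA
```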

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center

Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread David Winsemius

On Dec 3, 2014, at 2:10 PM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hello,
> 
> Two alternative approaches - mutate() vs. sapply() - were used to get the 
> desired results (i.e., creating a new column of the most recent date  from 4 
> dates ) with help from Arun and Mark on this forum.  I now find that the two 
> data objects (created using two different approaches) are not identical 
> although results are exactly the same.  
> 
> identical(new1, new2) 
> [1] FALSE
> 

You should have examined the output from dput() on both objects. I think you 
will find that dplyr is adding new attributes.

Notice that the "mutate()-ed" object now has this class:

class = c("rowwise_df", "tbl_df", "tbl", "data.frame")

Moral: Never rely on the print representation.
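A tiny illustration of the trap (df1/df2 are hypothetical stand-ins for 
new1/new2): two data frames can print the same values yet fail identical() 
purely because of attributes; comparing values only is what all() or 
all.equal(..., check.attributes = FALSE) does:

```r
df1 <- data.frame(x = 1:3)
df2 <- df1
class(df2) <- c("tbl_df", "tbl", "data.frame")  # extra classes, same values

identical(df1, df2)   # FALSE: the class attributes differ
all(df1 == df2)       # TRUE: every value agrees
isTRUE(all.equal(df1, df2, check.attributes = FALSE))  # TRUE
```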

-- 
David.


> Please see the reproducible example below.
> 
> I don't understand why the code returns FALSE here.  Any hints/comments  will 
> be  appreciated.
> 
> Thanks,
> 
> Pradip
> 
> #  reproducible example 
> 
> library(dplyr)
> # data object - description 
> 
> temp <- "id  mrjdate cocdate inhdate haldate
> 1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
> 2 NA NA NA NA 
> 3 2009-10-24 NA 2011-10-13 NA
> 4 2007-10-10 NA NA NA
> 5 2006-09-01 2005-08-10 NA NA
> 6 2007-09-04 2011-10-05 NA NA
> 7 2005-10-25 NA NA 2011-11-04"
> 
> # read the data object
> 
> example.data <- read.table(textConnection(temp), 
>colClasses=c("character", "Date", "Date", "Date", "Date"), 
>  
>header=TRUE, as.is=TRUE
>)
> 
> 
> # create a new column -dplyr solution (Acknowledgement: Arun)
> 
> new1 <- example.data %>% 
> rowwise() %>%
>  mutate(oldflag=as.Date(max(mrjdate,cocdate, inhdate, haldate,
>   na.rm=TRUE), 
> origin='1970-01-01'))
> 
> # create a new column - Base R solution (Acknowlegement: Mark Sharp)
> 
> new2 <- example.data
> new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
>  if (all(is.na(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 
> 'haldate')])))) {
>max_d <- NA
>  } else {
>max_d <- max(unlist(example.data[row, c('mrjdate','cocdate', 'inhdate', 
> 'haldate')]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
> 
> identical(new1, new2) 
> 
> # print records
> 
> print (new1); print(new2)
> 
> Pradip K. Muhuri
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
> 
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
> Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
> Sent: Sunday, November 09, 2014 6:11 AM
> To: 'Mark Sharp'
> Cc: r-help@r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in 
> four columns using the dplyr package (mutate verb)
> 
> Hi Mark,
> 
> Your code has also given me the results I expected.  Thank you so much for 
> your help.
> 
> Regards,
> 
> Pradip
> 
> Pradip K. Muhuri, PhD
> SAMHSA/CBHSQ
> 1 Choke Cherry Road, Room 2-1071
> Rockville, MD 20857
> Tel: 240-276-1070
> Fax: 240-276-1260
> 
> 
> -Original Message-
> From: Mark Sharp [mailto:msh...@txbiomed.org] 
> Sent: Sunday, November 09, 2014 3:01 AM
> To: Muhuri, Pradip (SAMHSA/CBHSQ)
> Cc: r-help@r-project.org
> Subject: Re: [R] Getting the most recent dates in a new column from dates in 
> four columns using the dplyr package (mutate verb)
> 
> Pradip,
> 
> mutate() works on the entire column as a vector so that you find the maximum 
> of the entire data set.
> 
> I am almost certain there is some nice way to handle this, but the sapply() 
> function is a standard approach.
> 
> max() does not want a dataframe thus the use of unlist().
> 
> Using your definition of data1:
> 
> data3 <- data1
> data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
>  if (all(is.na(unlist(data1[row, -1])))) {
>max_d <- NA
>  } else {
>max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
>  }
>  max_d}),
>  origin = "1970-01-01")
> 
> data3
>   id    mrjdate    cocdate    inhdate    haldate    oidflag
> 1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
> 2  2       <NA>       <NA>       <NA>       <NA>       <NA>
> 3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
> 4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
> 5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
> 6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
> 7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04
> 
> 
> 
> R. Mark Sharp, Ph.D.
> Director of Primate Records Database
> Southwest National Primate Research Center Texas Biomedical Research 
> Institute P.O. Box 760549 San Antonio, TX 78245-0549
> Telephone: (210)258-9476
> e-mail: msh...@txbiomed.org
> 

Re: [R] RcppArmadillo compilation errors (Scientific Linux 6.5)

2014-12-03 Thread stephen sefick
Solved; sorry for the spam.

library(devtools)
install_github("RcppCore/RcppArmadillo")

On Wed, Dec 3, 2014 at 4:51 PM, stephen sefick  wrote:

> I would appreciate any help that you may be able to give. Please let me
> know if any more information is required.
>
> I get the following error when I try to install RcppArmadillo in a
> session started with R --vanilla using the
> install.packages("RcppArmadillo") command.
>
> make: *** [RcppArmadillo.o] Error 1
> ERROR: compilation failed for package ‘RcppArmadillo’
> * removing
> ‘/home/ssefick/R/x86_64-unknown-linux-gnu-library/3.1/RcppArmadillo’
>
> The downloaded source packages are in
> ‘/tmp/RtmpdYI41j/downloaded_packages’
> Warning message:
> In install.packages("RcppArmadillo") :
>   installation of package ‘RcppArmadillo’ had non-zero exit status
>
>
> OS: Scientific Linux 6.5
>
> R version 3.1.0 Patched (2014-06-15 r65949)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=en_US.utf8LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8   LC_NAME=C
>  [9] LC_ADDRESS=C  LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
>
> --
> Stephen Sefick
> **
> Auburn University
> Biological Sciences
> 331 Funchess Hall
> Auburn, Alabama
> 36849
> **
> sas0...@auburn.edu
> http://www.auburn.edu/~sas0025
> **
>
> Let's not spend our time and resources thinking about things that are so
> little or so large that all they really do for us is puff us up and make us
> feel like gods.  We are mammals, and have not exhausted the annoying little
> problems of being mammals.
>
> -K. Mullis
>
> "A big computer, a complex algorithm and a long time does not equal
> science."
>
>   -Robert Gentleman
>
>


-- 
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods.  We are mammals, and have not exhausted the annoying little
problems of being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal
science."

  -Robert Gentleman


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with

2014-12-03 Thread David Winsemius
You should at least look at the facilities in the 'circular' package. Also the
Environmetrics Task View:
http://cran.r-project.org/web/views/Environmetrics.html which mentions another
package that up until today I had not heard of:
http://cran.r-project.org/web/packages/CircStats/index.html

-- 
David.


On Dec 3, 2014, at 3:40 AM, Dries David wrote:

> Hey
> 
> In my data set i have two variables: month (march or april) and wind 
> direction (N,NE,E,SE,S,SW,W,NW). I have to know if there is a difference in 
> wind direction between these months. What kind of test statistic should i use?
> 
> Kind regards
> 
> Dries David
> 

David Winsemius
Alameda, CA, USA



[R] RcppArmadillo compilation errors (Scientific Linux 6.5)

2014-12-03 Thread stephen sefick
I would appreciate any help that you may be able to give. Please let me
know if any more information is required.

I get the following error when I try to install RcppArmadillo in a
session started with R --vanilla using the
install.packages("RcppArmadillo") command.

make: *** [RcppArmadillo.o] Error 1
ERROR: compilation failed for package ‘RcppArmadillo’
* removing
‘/home/ssefick/R/x86_64-unknown-linux-gnu-library/3.1/RcppArmadillo’

The downloaded source packages are in
‘/tmp/RtmpdYI41j/downloaded_packages’
Warning message:
In install.packages("RcppArmadillo") :
  installation of package ‘RcppArmadillo’ had non-zero exit status


OS: Scientific Linux 6.5

R version 3.1.0 Patched (2014-06-15 r65949)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8   LC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


-- 
Stephen Sefick
**
Auburn University
Biological Sciences
331 Funchess Hall
Auburn, Alabama
36849
**
sas0...@auburn.edu
http://www.auburn.edu/~sas0025
**

Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods.  We are mammals, and have not exhausted the annoying little
problems of being mammals.

-K. Mullis

"A big computer, a complex algorithm and a long time does not equal
science."

  -Robert Gentleman



Re: [R] Getting the most recent dates in a new column from dates in four columns using the dplyr package (mutate verb)

2014-12-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)
Hello,

Two alternative approaches - mutate() vs. sapply() - were used to get the
desired results (i.e., creating a new column of the most recent date from 4
dates) with help from Arun and Mark on this forum.  I now find that the two
data objects (created using the two different approaches) are not identical,
although the results are exactly the same.
 
identical(new1, new2) 
[1] FALSE
 
Please see the reproducible example below.

I don't understand why the code returns FALSE here.  Any hints/comments  will 
be  appreciated.

Thanks,

Pradip

#  reproducible example 

library(dplyr)
# data object - description 

temp <- "id  mrjdate cocdate inhdate haldate
1 2004-11-04 2008-07-18 2005-07-07 2007-11-07
2 NA NA NA NA 
3 2009-10-24 NA 2011-10-13 NA
4 2007-10-10 NA NA NA
5 2006-09-01 2005-08-10 NA NA
6 2007-09-04 2011-10-05 NA NA
7 2005-10-25 NA NA 2011-11-04"

# read the data object

example.data <- read.table(textConnection(temp), 
colClasses=c("character", "Date", "Date", "Date", "Date"),  
header=TRUE, as.is=TRUE
)


# create a new column -dplyr solution (Acknowledgement: Arun)

new1 <- example.data %>%
  rowwise() %>%
  mutate(oldflag = as.Date(max(mrjdate, cocdate, inhdate, haldate, na.rm = TRUE),
                           origin = '1970-01-01'))

# create a new column - Base R solution (Acknowlegement: Mark Sharp)

new2 <- example.data
new2$oiddate <- as.Date(sapply(seq_along(new2$id), function(row) {
  if (all(is.na(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                           'haldate')])))) {
    max_d <- NA
  } else {
    max_d <- max(unlist(example.data[row, c('mrjdate', 'cocdate', 'inhdate',
                                            'haldate')]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

identical(new1, new2) 

# print records

print (new1); print(new2)
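One way to probe the FALSE result: identical() compares classes, attributes and column names as well as values. In the code above, new1 is a rowwise dplyr tbl whose new column is named oldflag, while new2 is a plain data.frame whose new column is named oiddate, so identical() fails even if every value agrees. A hedged sketch (assuming new1 and new2 as constructed above):

```r
# Structure differs even where values agree
class(new1)   # dplyr rowwise tbl classes
class(new2)   # "data.frame"

# Value-level comparison, ignoring class and column-name differences
all.equal(as.Date(as.data.frame(new1)$oldflag),
          new2$oiddate)
```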

Pradip K. Muhuri
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Muhuri, Pradip (SAMHSA/CBHSQ)
Sent: Sunday, November 09, 2014 6:11 AM
To: 'Mark Sharp'
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Hi Mark,

Your code has also given me the results I expected.  Thank you so much for your 
help.

Regards,

Pradip

Pradip K. Muhuri, PhD
SAMHSA/CBHSQ
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857
Tel: 240-276-1070
Fax: 240-276-1260


-Original Message-
From: Mark Sharp [mailto:msh...@txbiomed.org] 
Sent: Sunday, November 09, 2014 3:01 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: r-help@r-project.org
Subject: Re: [R] Getting the most recent dates in a new column from dates in 
four columns using the dplyr package (mutate verb)

Pradip,

mutate() works on the entire column as a vector so that you find the maximum of 
the entire data set.

I am almost certain there is some nice way to handle this, but the sapply() 
function is a standard approach.

max() does not want a dataframe thus the use of unlist().

Using your definition of data1:

data3 <- data1
data3$oidflag <- as.Date(sapply(seq_along(data3$id), function(row) {
  if (all(is.na(unlist(data1[row, -1])))) {
max_d <- NA
  } else {
max_d <- max(unlist(data1[row, -1]), na.rm = TRUE)
  }
  max_d}),
  origin = "1970-01-01")

data3
  id    mrjdate    cocdate    inhdate    haldate    oidflag
1  1 2004-11-04 2008-07-18 2005-07-07 2007-11-07 2008-07-18
2  2       <NA>       <NA>       <NA>       <NA>       <NA>
3  3 2009-10-24       <NA> 2011-10-13       <NA> 2011-10-13
4  4 2007-10-10       <NA>       <NA>       <NA> 2007-10-10
5  5 2006-09-01 2005-08-10       <NA>       <NA> 2006-09-01
6  6 2007-09-04 2011-10-05       <NA>       <NA> 2011-10-05
7  7 2005-10-25       <NA>       <NA> 2011-11-04 2011-11-04



R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center Texas Biomedical Research Institute 
P.O. Box 760549 San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msh...@txbiomed.org





NOTICE:  This E-Mail (including attachments) is confidential and may be legally 
privileged.  It is covered by the Electronic Communications Privacy Act, 18 
U.S.C.2510-2521.  If you are not the intended recipient, you are hereby 
notified that any retention, dissemination, distribution or copying of this 
communication is strictly prohibited.  Please reply to the sender that you have 
received this message in error, then delete it.


Re: [R] combining unequal dataframes based on a common grouping factor

2014-12-03 Thread Jeff Newmiller
Posting in HTML format doesn't work nearly as well as you think it does... Your 
email is pretty mixed up. Please use plain text format and use dput to make 
your data usable in R.

I expect the best answer to your problem is going to be to use the merge 
function instead of your for loops... but the actual data can affect how well 
any solution works so giving us dput output is crucial.
---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On December 3, 2014 1:14:16 PM PST, Brock Huntsman  wrote:
>I apologize if this is a relatively easy problem, but I have been stuck
>on
>this issue for a few days. I am attempting to combine values from 2
>separate dataframes. Each dataframe contains a shared identifier
>(GROUP).
>Dataframe 1 (3272 rows x 3 columns) further divides this shared
>grouping
>factor into unique identifiers (ID), as well as contains the proportion
>of
>the GROUP area of which the unique identifier consists (PROP_AREA).
>Dataframe 2 (291 x 14976) in addition to consisting of the shared
>identifier, also has numerous columns consisting of values (VALUE1,
>VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each
>value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP
>factor,
>constructing a final dataframe of size 3272 x 14976. An example of the
>data
>frames are as follows:
>
>
>frame1:
>
>ID   GROUP   PROP_AREA
>1    A       0.33
>2    A       0.33
>3    A       0.33
>4    B       0.50
>5    B       0.50
>6    C       1.00
>7    D       1.00
>
>
>frame2:
>
>GROUP   VALUE1   VALUE2
>A       10       5
>B       20       10
>C       30       15
>D       40       20
>
>
> Desired dataframe
>
>frame3:
>
>ID   VALUE1   VALUE2
>1    3.3      1.65
>2    3.3      1.65
>3    3.3      1.65
>4    10       5
>5    10       5
>6    30       15
>7    40       20
>
>
>
>
>
>I assume I would need to use the %in% function or if statements, but am
>unsure how to write the code. I have attempted to construct a for loop
>with
>an if statement, but have not been successful as of yet.
>
>
>for(i in 1:nrow(frame1)) {
>
>  for(j in 2:ncol(frame2)) {
>
>if (frame1$GROUP[i] == frame2$GROUP[i]) {
>
>  frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]
>
>}
>
>  }
>
>}
>
>
>Any advice on suggested code or packages to read up on would be much
>appreciated.
>
>Brock
>


[R] combining unequal dataframes based on a common grouping factor

2014-12-03 Thread Brock Huntsman
I apologize if this is a relatively easy problem, but I have been stuck on
this issue for a few days. I am attempting to combine values from 2
separate dataframes. Each dataframe contains a shared identifier (GROUP).
Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping
factor into unique identifiers (ID), as well as contains the proportion of
the GROUP area of which the unique identifier consists (PROP_AREA).
Dataframe 2 (291 x 14976) in addition to consisting of the shared
identifier, also has numerous columns consisting of values (VALUE1,
VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each
value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor,
constructing a final dataframe of size 3272 x 14976. An example of the data
frames are as follows:


frame1:

ID   GROUP   PROP_AREA
1    A       0.33
2    A       0.33
3    A       0.33
4    B       0.50
5    B       0.50
6    C       1.00
7    D       1.00


frame2:

GROUP   VALUE1   VALUE2
A       10       5
B       20       10
C       30       15
D       40       20


 Desired dataframe

frame3:

ID   VALUE1   VALUE2
1    3.3      1.65
2    3.3      1.65
3    3.3      1.65
4    10       5
5    10       5
6    30       15
7    40       20





I assume I would need to use the %in% function or if statements, but am
unsure how to write the code. I have attempted to construct a for loop with
an if statement, but have not been successful as of yet.


for(i in 1:nrow(frame1)) {

  for(j in 2:ncol(frame2)) {

if (frame1$GROUP[i] == frame2$GROUP[i]) {

  frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]

}

  }

}


Any advice on suggested code or packages to read up on would be much
appreciated.

Brock
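The merge()-based approach suggested in the reply above can be sketched as follows (hedged; it uses the small example frames from this post, with VALUE1/VALUE2 standing in for the real VALUE1..VALUE14976 columns):

```r
frame1 <- data.frame(ID = 1:7,
                     GROUP = c("A", "A", "A", "B", "B", "C", "D"),
                     PROP_AREA = c(0.33, 0.33, 0.33, 0.50, 0.50, 1.00, 1.00))
frame2 <- data.frame(GROUP = c("A", "B", "C", "D"),
                     VALUE1 = c(10, 20, 30, 40),
                     VALUE2 = c(5, 10, 15, 20))

m <- merge(frame1, frame2, by = "GROUP")        # one row per ID, values attached
value.cols <- setdiff(names(frame2), "GROUP")   # VALUE1, VALUE2, ...
frame3 <- data.frame(ID = m$ID, m[value.cols] * m$PROP_AREA)
frame3 <- frame3[order(frame3$ID), ]
# ID 1 gets 0.33 * 10 = 3.3 and 0.33 * 5 = 1.65, matching the desired frame3
```

No loop is needed: the data-frame-times-vector multiplication scales every VALUE column by that row's PROP_AREA in one step.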



Re: [R] Problem with

2014-12-03 Thread Clint Bowman
I'd also suggest plotting a wind rose for each month (try openair) to 
understand the statistical test results.
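A minimal sketch of that idea, assuming the openair package and a data frame in the layout openair expects (a POSIXct date column, numeric wind speed ws, and wind direction wd in degrees; the data below are made up):

```r
library(openair)

# Made-up data in openair's expected column layout
mydata <- data.frame(
  date = seq(as.POSIXct("2014-03-01"), by = "6 hours", length.out = 480),
  ws   = runif(480, 0, 8),
  wd   = runif(480, 0, 360)
)

# One wind rose per month, so March and April can be compared side by side
windRose(mydata, type = "month")
```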


Clint BowmanINTERNET:   cl...@ecy.wa.gov
Air Quality Modeler INTERNET:   cl...@math.utah.edu
Department of Ecology   VOICE:  (360) 407-6815
PO Box 47600FAX:(360) 407-7534
Olympia, WA 98504-7600

USPS:   PO Box 47600, Olympia, WA 98504-7600
Parcels:300 Desmond Drive, Lacey, WA 98503-1274

On Wed, 3 Dec 2014, Adams, Jean wrote:


This question is more about statistics than R.  I suggest that you post it
to Cross Validated instead, http://stats.stackexchange.com/.

Jean

On Wed, Dec 3, 2014 at 5:40 AM, Dries David  wrote:


 Hey

In my data set i have two variables: month (march or april) and wind
direction (N,NE,E,SE,S,SW,W,NW). I have to know if there is a difference in
wind direction between these months. What kind of test statistic should i
use?

Kind regards

Dries David



Re: [R] Problem with

2014-12-03 Thread Adams, Jean
This question is more about statistics than R.  I suggest that you post it
to Cross Validated instead, http://stats.stackexchange.com/.

Jean

On Wed, Dec 3, 2014 at 5:40 AM, Dries David  wrote:

>  Hey
>
> In my data set i have two variables: month (march or april) and wind
> direction (N,NE,E,SE,S,SW,W,NW). I have to know if there is a difference in
> wind direction between these months. What kind of test statistic should i
> use?
>
> Kind regards
>
> Dries David
>



Re: [R] if else for cumulative sum error

2014-12-03 Thread Jefferson Ferreira-Ferreira
Nice, David!!

Worked like a charm!!
Thank you very much.



Em Tue Dec 02 2014 at 19:22:48, David L Carlson 
escreveu:

> Let's try a different approach. You don't need a loop for this. First we
> need a reproducible example:
>
> > set.seed(42)
> > dadosmax <- data.frame(above=runif(150) + .5)
>
> Now compute your sums using cumsum() and diff() and then compute enchday
> using ifelse(). See the manual pages for each of these functions to
> understand how they work:
>
> > sums <- diff(c(0, cumsum(dadosmax$above)), 45)
> > dadosmax$enchday <- c(ifelse(sums >= 45, 1, 0), rep(NA, 44))
>
> > dadosmax$enchday
>   [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
> 1  1  1
>  [26]  1  1  1  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> 0  0  0
>  [51]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> 0  0  0
>  [76]  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
> 1  1  1
> [101]  1  1  1  1  1  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA
> [126] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> NA NA
>
> See the NA's? Those are what David Winsemius is talking about. For the
> 106th value, 106+44 is 150, but for the 107th value 107+44 is 151, which
> does not exist. Fortunately diff() understands that and stops at 106, but
> we have to add 44 NA's to pad the result out to the 150 rows of your data
> frame.
>
> You might find this plot informative as well:
>
> > plot(sums, typ="l")
> > abline(h=45)
>
> Another way to get there is to use sapply() which will add the NA's for us:
>
> > sums <- sapply(1:150, function(x) sum(dadosmax$above[x:(x+44)]))
> > dadosmax$enchday <- ifelse(sums >= 45, 1, 0)
>
> But it won't be as fast if you have a large data set.
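The cumsum()/diff() identity used above can be sanity-checked on a toy vector (window of 3 instead of 45):

```r
x <- c(1, 2, 3, 4, 5, 6)
rolling <- diff(c(0, cumsum(x)), 3)   # sums of x[1:3], x[2:4], x[3:5], x[4:6]
stopifnot(identical(rolling, c(6, 9, 12, 15)))
```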
>
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David
> Winsemius
> Sent: Tuesday, December 2, 2014 2:50 PM
> To: Jefferson Ferreira-Ferreira
> Cc: r-help@r-project.org
> Subject: Re: [R] if else for cumulative sum error
>
>
> On Dec 2, 2014, at 12:26 PM, Jefferson Ferreira-Ferreira wrote:
>
> > Thank you for replies.
> >
> > David,
> >
> > I tried your modified form
> >
> > for (i in 1:seq_along(rownames(dadosmax))){
>
>
> No. It is either 1:... or seq_along(...). In this case perhaps
> 1:(nrow(dadosmax)-44) would be safer.
>
> You do not seem to have understood that you cannot use an index of i+44
> when i is going to be the entire set of rows of the dataframe. There is "no
> there there" to quote Gertrude Stein's slur against Oakland. In fact there
> is not there there at i+1 when you get to the end. You either need to only
> go to row
>
> >  dadosmax$enchday[i] <- if ( (sum(dadosmax$above[i:(i+44)])) >= 45) 1
> else
> > 0
> > }
> >
> > However, I'm receiving this warning:
> > Warning message:
> > In 1:seq_along(rownames(dadosmax)) :
> >  numerical expression has 2720 elements: only the first used
> >
> > I can't figure out why only the first row was calculated...
>
> You should of course read these, but the error is not from your
> if-statement but rather your for-loop indexing.
>
> ?'if'
> ?ifelse
>
>
> > Any ideas?
> >
> >
> >
> > Em Tue Dec 02 2014 at 15:22:25, John McKown <
> john.archie.mck...@gmail.com>
> > escreveu:
> >
> >> On Tue, Dec 2, 2014 at 12:08 PM, Jefferson Ferreira-Ferreira <
> >> jeco...@gmail.com> wrote:
> >>
> >>> Hello everybody;
> >>>
> >>> I'm writing a code where part of it is as follows:
> >>>
> >>> for (i in nrow(dadosmax)){
> >>>  dadosmax$enchday[i] <- if (sum(dadosmax$above[i:(i+44)]) >= 45) 1
> else 0
> >>> }
> >>>
> >>
> >> ​Without some test data for any validation, I would try the following
> >> formula
> >>
> >> dadosmax$enchday[i] <- if
> >> (sum(dadosmax$above[i:(min(i+44,nrow(dadosmax)))]) >= 45) 1 else 0
> >>
> >>
> >>
> >>>
> >>> That is for each row of my data frame, sum an specific column (0 or 1)
> of
> >>> that row plus 44 rows. If It is >=45 than enchday is 1 else 0.
> >>>
> >>> The following error is returned:
> >>>
> >>> Error in if (sum(dadosmax$above[i:(i + 44)]) >= 45) 1 else 0 :
> >>>  missing value where TRUE/FALSE needed
> >>>
> >>> I've tested the ifelse statement assigning different values to i and it
> >>> works. So I'm wondering if this error is due the fact that at the
> final of
> >>> my data frame there aren't 45 rows to sum anymore. I tried to use "try"
> >>> but
> >>> It's simply hide the error.
> >>>
> >>> How can I deal with this? Any ideas?
> >>> Thank you very much.
> >>>

[R] coerce data to numeric

2014-12-03 Thread Charles R Parker
I am trying to create groups of barplots from data that have different number 
of records in the groups, in such a way that all of the plots will have the 
same numbers and sizes of bars represented even when some of the groups will 
have some bars of zero height. The goal then would be to display multiple plots 
on a single page using split.screen or something similar. lattice does not seem 
suitable because of the data structure it operates on. A simple data structure 
that I operate on is given here:

> dput(stplot)
structure(list(GId = structure(1:11, .Label = c("A1", "B1", "B2", 
"B3", "B4", "B5", "C1", "C2", "D1", "D2", "D3"), class = "factor"), 
Grp = structure(c(1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 
4L), .Label = c("A", "B", "C", "D"), class = "factor"), S = c(12.3, 
23.8, 0, 7.6, 14.32, 1.9, 5.1, 0, 14.6, 10.1, 8.7), T = c(5L, 
12L, 2L, 1L, 4L, 1L, 1L, 9L, 5L, 6L, 3L)), .Names = c("GId", 
"Grp", "S", "T"), class = "data.frame", row.names = c(NA, -11L
))


My code, which doesn't quite work is:

> nbars <-
function(x){
  sG = summary(x$Grp)
  mG = max(sG)
  for(n in 1:length(sG)){
tX = subset(x,x$Grp==names(sG[n]))
if(nrow(tX) < mG){
  fm = as.numeric(rep(length = mG - nrow(tX), 0))
  tX = rbind(tX, as.data.frame(cbind(GId = " ",Grp = names(sG[n]), 
 S = fm, T = fm)))
}
#print(tX)
#dput(t(as.matrix(tX[,3:4])))
barplot(t(as.matrix(tX[,3:4])),beside=TRUE, names.arg=tX$GId, 
  col = c("navy","gray"))
  }
}


The function nbars first gets the list of group values with their counts 
'summary(x$Grp)'.
It then determines the maximum number of bar pairs in the largest of the groups 
'max(sG)', and uses this to determine how much each smaller group needs to be 
padded to fill out the proper number of bars in the ultimate barplots, using 
the for loop. If you uncomment the #print(tX) you can see that this 
works...sort of. The problem becomes apparent if you uncomment the #dput. This 
shows that the tX treats the S and T values as characters rather than as 
numeric values. This prevents the barplots from working. By changing the for 
loop to begin 'for(n in 2:length(sG)' the second plot will display correctly, 
but the third plot will fail.

I have tried various options to force the S and T variables to be numeric in
the 'if(nrow(tX) < mG)' block (as.numeric(fm), as.matrix(fm), as.vector(fm)),
but none of these have worked.

If there is a sure-fire way to solve the problem I would be grateful.

Thanks.
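For what it's worth, the coercion most likely happens in the padding step: cbind() on mixed character/numeric inputs builds a character matrix before as.data.frame() ever sees it, so S and T come back as character. A hedged sketch of a fix (variable names as in nbars() above), building the padding rows with data.frame() so each column keeps its own type:

```r
if (nrow(tX) < mG) {
  tX$GId <- as.character(tX$GId)          # allow a blank label on padding rows
  fm  <- rep(0, mG - nrow(tX))            # numeric zero padding
  pad <- data.frame(GId = " ", Grp = names(sG[n]),
                    S = fm, T = fm, stringsAsFactors = FALSE)
  tX  <- rbind(tX, pad)                   # S and T stay numeric
}
```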



[R] ASA John M. Chambers Statistical Software Award - 2015

2014-12-03 Thread Munjal, Aarti

John M. Chambers Statistical Software Award - 2015
Statistical Computing Section
American Statistical Association

The Statistical Computing Section of the American Statistical Association 
announces the competition for the John M. Chambers Statistical Software Award. 
In 1998 the Association for Computing Machinery presented its Software System 
Award to John Chambers for the design and development of S. Dr. Chambers 
generously donated his award to the Statistical Computing Section to endow an 
annual prize for statistical software written by, or in collaboration with, an 
undergraduate or graduate student. The prize carries with it a cash award of 
$1000, plus a substantial allowance for travel to the annual Joint Statistical 
Meetings (JSM) where the award will be presented.

Teams of up to 3 people can participate in the competition, with the cash award 
being split among team members. The travel allowance will be given to just one 
individual in the team, who will be presented the award at JSM. To be eligible, 
the team must have designed and implemented a piece of statistical software. 
The individual within the team indicated to receive the travel allowance must 
have begun the development while a student, and must either currently be a 
student, or have completed all requirements for her/his last degree after 
January 1, 2014. To apply for the award, teams must provide the following 
materials:

Current CVs of all team members.

A letter from a faculty mentor at the academic institution of the individual 
indicated to receive the travel award. The letter should confirm that the 
individual had substantial participation in the development of the software, 
certify her/his student status when the software began to be developed (and 
either the current student status or the date of degree completion), and 
briefly discuss the importance of the software to statistical practice.

A brief, one to two page description of the software, summarizing what it does, 
how it does it, and why it is an important contribution. If the team member 
competing for the travel allowance has continued developing the software after 
finishing her/his studies, the description should indicate what was developed 
when the individual was a student and what has been added since.

An installable software package with its source code for use by the award 
committee. It should be accompanied by enough information to allow the judges 
to effectively use and evaluate the software (including its design 
considerations.) This information can be provided in a variety of ways, 
including but not limited to a user manual (paper or electronic), a paper, a 
URL, and online help to the system.

All materials must be in English. We prefer that electronic text be submitted 
in Postscript or PDF. The entries will be judged on a variety of dimensions, 
including the importance and relevance for statistical practice of the tasks 
performed by the software, ease of use, clarity of description, elegance and 
availability for use by the statistical community. Preference will be given to 
those entries that are grounded in software design rather than calculation. The 
decision of the award committee is final.

All application materials must be received by 5:00pm EST, Tuesday, February 17, 
2015 at the address below. The winner will be announced in May and the award 
will be given at the 2015 Joint Statistical Meetings.

Student Paper Competition
c/o Aarti Munjal
Colorado School of Public Health
University of Colorado Denver
aarti.mun...@ucdenver.edu


--
Aarti Munjal, PhD
Assistant Research Professor
Department of Biostatistics and Informatics
Colorado School of Public Health
University of Colorado, Denver
Phone: 303-724-6273
aarti.mun...@ucdenver.edu



[R] Problem with

2014-12-03 Thread Dries David
 Hey
 
In my data set I have two variables: month (March or April) and wind direction
(N, NE, E, SE, S, SW, W, NW). I have to know if there is a difference in wind
direction between these months. What kind of test statistic should I use?
 
Kind regards
 
Dries David
maand   dag   uur   minuut   weekdag   achtergrondgeluid   windsnelheid   windrichting   windhoeken
3   10  10  0   2   51.63114548 5.5 NE  28
3   10  10  10  2   50.66382217 5   NE  24
3   10  10  20  2   48.70236206 5.5 NE  24
3   10  8   10  2   50.96879959 6.1 NE  30
3   10  8   20  2   50.84553528 4.9 NE  30
3   10  8   30  2   49.04131699 4.7 NE  32
3   10  8   40  2   49.67300797 4.2 NE  36
3   10  8   50  2   48.51584625 4.1 NE  35
3   10  9   0   2   48.72542191 4.5 NE  35
3   10  9   10  2   47.47378159 4.6 NE  35
3   10  9   20  2   47.85709763 4.8 NE  29
3   10  9   30  2   52.63157272 4.6 NE  23
3   10  9   40  2   58.69786072 4.1 NE  23
3   10  9   50  2   54.00802231 4.6 NE  29
3   11  19  10  3   52.43230438 5.5 N   350
3   11  20  10  3   49.44572449 5.3 N   349
3   11  20  30  3   52.74159622 5.4 N   349
3   11  21  0   3   52.62139511 5.4 N   349
3   11  21  10  3   51.76690674 5.3 N   349
3   11  21  20  3   52.21736145 4.9 N   349
3   11  21  30  3   50.95881653 4.6 N   351
3   11  21  40  3   50.77206421 3.9 N   352
3   11  21  50  3   53.34156799 3.9 N   352
3   11  22  0   3   52.31845474 3.8 N   352
3   11  22  10  3   52.77373123 3.6 N   352
3   11  22  20  3   53.37749863 3.5 N   353
3   11  22  30  3   52.80809402 4   N   355
3   11  22  40  3   52.55833435 3.8 N   355
3   11  22  50  3   53.19112396 4.5 N   350
3   11  23  0   3   51.75883865 4.4 N   344
3   11  23  10  3   52.25294876 4.8 NW  336
3   11  23  20  3   51.43741226 4.7 NW  337
3   11  23  30  3   53.41985321 4.5 NW  337
3   11  23  50  4   51.39404678 4.3 NW  337
3   12  0   0   4   48.45637512 5   NW  337
3   12  0   10  4   48.88262177 5.1 NW  337
3   12  0   20  4   50.92547226 5.2 NW  336
3   12  0   30  4   49.4605484  5.1 NW  336
3   12  0   40  4   51.17975235 4.9 NW  336
3   12  0   50  4   51.35684586 4.5 NW  336
3   12  1   0   4   50.19583511 4.9 NW  336
3   12  10  0   4   53.33773804 3.2 SW  246
3   12  10  10  4   51.28180313 3.8 SW  244
3   12  10  20  4   51.6101265  3.5 SW  232
3   12  1   10  4   52.38395691 4.6 NW  334
3   12  1   20  4   51.5379715  4.5 NW  334
3   12  1   30  4   49.4364357  4.6 NW  334
3   12  1   40  4   50.96374512 4.9 NW  334
3   12  1   50  4   49.01436615 5   NW  334
3   12  16  20  4   53.15527725 4.4 SW  240
3   12  16  40  4   51.52048492 4.6 SW  242
3   12  16  50  4   52.98311996 4.7 SW  242
3   12  17  0   4   56.32850647 4.5 SW  244
3   12  17  10  4   56.34173965 4.6 SW  241
3   12  17  30  4   56.0198555  4.3 SW  240
3   12  17  40  4   54.85006332 4.3 SW  240
3   12  17  50  4   55.80421066 4.3 SW  245
3   12  18  0   4   53.33050156 4.6 SW  235
3   12  18  10  4   52.2504425  4.5 SW  231
3   12  18  20  4   53.56524658 4.1 SW  238
3   12  18  30  4   5

[R] Question on LIMMA analysis with covariates and some missing data

2014-12-03 Thread Bertrand

Hello,

I have a dataset of asthma patients for which white blood cell gene 
expression was measured with one-color Affymetrix microarrays (N~500, 
asthma is a factor with 4 levels: control, moderate, severe, severe & 
smokers).


I also have a related, extensive clinical dataset, but with many missing 
values (for example, our controls don't have asthma exacerbation counts).


Our goal is to find DEGs between asthma groups, but we suspect that some 
of those clinical variables have an influence on gene expression, so we 
want to treat those as covariates in the model.


Now the question: can LIMMA handle missing data in the covariates and 
produce accurately corrected p-values for the genes?


The model matrix is constructed like so (example with age and sex as 
covariates):


# Microarray data is in 'data' variable
asthma <- factor(asthma_groups, levels = c("Control", "Moderate", "Severe", "SevereSmokers"))  # asthma_groups: per-sample labels (name hypothetical)
design<-model.matrix(~0 + asthma + age + sex)
contrast.matrix<-makeContrasts(Control-Moderate, Control-Severe, 
Control-SevereSmokers, levels=design)

fit<-lmFit(data, design)
fit2<-contrasts.fit(fit, contrast.matrix)
fit2<-eBayes(fit2)
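A side note on the missing-covariate part of the question (an editorial sketch in base R, not limma-specific advice): model.matrix() silently drops rows whose covariates are NA, so the design matrix can end up with fewer rows than the expression matrix has columns, and lmFit() would then fail or misalign samples.

```r
# model.matrix() applies the default na.action and drops NA rows
covars <- data.frame(age = c(34, NA, 51), sex = factor(c("F", "M", "F")))
design <- model.matrix(~ age + sex, data = covars)
nrow(design)  # 2, not 3: the sample with missing age was dropped

# A common workaround: restrict both covariates and expression data
# to complete cases before building the design matrix
keep <- complete.cases(covars)
# data_kept <- data[, keep]  # hypothetical: 'data' has one column per sample
```

Whether dropping samples or imputing covariates is preferable depends on the study; the key point is keeping the design rows aligned with the array columns.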

Many thanks,

Bertrand
--
*Bertrand De Meulder
Researcher*
European Institute for Systems Biology and Medicine
Campus Charles Mérieux - Université de Lyon
CNRS - UCBL - ENS
*E-mail:*bdemeul...@eisbm.org 

*Office:* +33(0)4 37 28 74 41

*Office*
Université Claude Bernard
3^e étage plot 2
50 Avenue Tony Garnier
69366 Lyon cedex 07
France  *Laboratory*
LyonBioPôle - Centre d'Infectiologie
2^e étage Bât. Domilyon
321 Avenue Jean Jaurès
69007 Lyon
France




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extract AICc from model in glmulti object

2014-12-03 Thread Paul Tanger
> Hi,
>
> Is there an easy way to extract the AICc from a model within a glmulti
> object?  I see the AIC, but not AICc.  For example:
>
> data(mtcars)
> cardata = mtcars
> library(glmulti)
> # create models
> global = glm(mpg ~ ., data=mtcars)
> models = glmulti(global, level=1, crit="aicc", confsetsize=50, plotty=F)
> # the AICc are here
> tableofdata = weightable(models)
> # but can I get it for a specific model here?
> # Because I also want to get other data in a loop from these objects, such
> as coefficients..
> summary(models@objects[[1]])
>
> Should this post be in a SIG list? I couldn't figure out which one..
>
> Thanks!
>
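An editorial sketch, not from the thread: if pulling AICc out of the glmulti object proves awkward, it can be recomputed from any fitted model's log-likelihood. The helper below is hypothetical; it uses the standard small-sample correction AICc = AIC + 2k(k+1)/(n - k - 1).

```r
# Hypothetical helper: AICc for any model with a logLik() method
aicc <- function(fit) {
  ll <- logLik(fit)
  k <- attr(ll, "df")    # number of estimated parameters
  n <- attr(ll, "nobs")  # number of observations used in the fit
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

fit <- glm(mpg ~ wt + hp, data = mtcars)
c(AIC = AIC(fit), AICc = aicc(fit))
```

Applied to models@objects[[i]], this should reproduce the values listed by weightable(), assuming glmulti uses the same correction.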



Re: [R] using win task to run Rscript somepackages can't work in script mode

2014-12-03 Thread Fábio Magalhães
Hi,

You didn't provide enough details (like the error message), but you
could start by calling Rscript with the --vanilla option, so that it
reads neither your Rprofile nor environment files, and see what happens.

#! Fábio


On Mon, Dec 1, 2014 at 12:09 AM, PO SU  wrote:
>
> Dear expeRts,
> These days I want to run a .R file that uses a package called gmailr to
> process Gmail. It works well in the interactive GUI, that is, RStudio.
> But I want to run it via a Windows task, so I need to run the .R file in
> Rscript mode. Sadly, the two ways seem to behave differently: the gmailr
> package fails to work in script mode.
> So, what is the difference? Why can't Rscript run some code that works
> well in RStudio? Is there some option I can set to make it succeed?
>
>
>
>
>
> --
>
> PO SU
> mail: desolato...@163.com
> Majored in Statistics from SJTU
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Substitute initial guesses of parameters in a function

2014-12-03 Thread philippe massicotte
Thank you!


> Date: Wed, 3 Dec 2014 08:02:17 -0500
> From: murdoch.dun...@gmail.com
> To: pmassico...@hotmail.com; r-help@r-project.org
> Subject: Re: [R] Substitute initial guesses of parameters in a function
> 
> On 03/12/2014 7:37 AM, philippe massicotte wrote:
> > Hi everyone, I have a formula like this:
> >
> > f <- as.formula(y ~ p0a * exp(-0.5 * ((x - p1a)/p2a)^2))
> >
> > I would like to "dynamically" provide starting values for p0a, p1a, p2a. Is 
> > there a way to do it?
> 
> Just give a named vector of starting values.
> >
> > #Params estimates
> > p <- c(12, 10, 1)
> 
> Should be p <- c(p0a = 12, p1a = 10, p2a = 1)
> >
> > # This is where I have difficulties
> > mystart <- substitute(...)
> >
> > nls(formula = f, start = mystart)
> 
> Now start = p will work.  No need to mess with substitute.  (And no need 
> to use as.formula on the very first line; that's already a formula.)
> 
> Duncan Murdoch
> >
> > Regards,
> > Philippe
> > 
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
  


Re: [R] How to have NLME return when convergence not reached

2014-12-03 Thread Ramiro Barrantes
Ok.  I am trying to figure out whether it's a bug in my code or in nlme. I will 
let you know, and will either send a reproducible example to you, Martin, if the 
latter, or apologize profusely for blaming nlme :) if the former.

Thanks again for everyone's input.
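For reference, besides the R.utils::withTimeout idea mentioned below, base R can impose a wall-clock limit via setTimeLimit(); a minimal sketch (toy expressions, not an actual nlme fit):

```r
# Bound each fit's elapsed time so a hung call raises a catchable error
fit_with_limit <- function(expr, seconds = 60) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)
  tryCatch(expr, error = function(e) NULL)  # NULL signals "no usable fit"
}

fit_with_limit({ Sys.sleep(2); "done" }, seconds = 1)  # NULL: timed out
fit_with_limit(1 + 1, seconds = 5)                     # 2: finished in time
```

In the simulation loop each nlme() call would be wrapped the same way, with failed or timed-out fits recorded as NULL; note that a hang inside compiled code may not be interruptible by either mechanism.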


From: Martin Maechler [maech...@stat.math.ethz.ch]
Sent: Wednesday, December 03, 2014 3:26 AM
To: Bert Gunter
Cc: Ramiro Barrantes; r-help@r-project.org
Subject: Re: [R] How to have NLME return when convergence not reached

> Bert Gunter 
> on Tue, 2 Dec 2014 14:03:44 -0800 writes:

> Yes, Bill almost always has helpful ideas.
> Just a comment: If indeed the process is gobbling up too much memory,
> that might indicate a problem with your function or implementation. I
> defer to real experts on this, however.

> Cheers,
> Bert

> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374


Yes, thank you Bert, it could be entirely Ramiro's function, but
we don't know as you have not given any indication about what
the non-linear function is in your nlme() formula.

The little you wrote _seems_ to indicate a bug in nlme, and if
that was the case, the bug should be reported and fixed,
and we (R core - the maintainer of the recommended package nlme)
would really like to get a reproducible example.

If OTOH, it is your nonlinear function that "goes haywire",
the implicit blame on nlme would not be warranted.

Martin Maechler,
ETH Zurich and R Core team


> On Tue, Dec 2, 2014 at 1:59 PM, Ramiro Barrantes
>  wrote:
>> Thanks so much for your reply.  I am using try but nlme never returns!!  
and I think the process is getting killed by the system as it is taking over 
all the memory.  However, I do like William Dunlap's idea of using 
R.utils::withTimeout to limit the time.
>>
>> Thanks again for your help!
>> 
>> From: Bert Gunter [gunter.ber...@gene.com]
>> Sent: Tuesday, December 02, 2014 4:30 PM
>> To: Ramiro Barrantes
>> Cc: r-help@r-project.org
>> Subject: Re: [R] How to have NLME return when convergence not reached
>>
>> ?try
>> Or
>> ?tryCatch
>>
>> Bert
>>
>> Sent from my iPhone -- please excuse typos.
>>
>>> On Dec 2, 2014, at 12:57 PM, Ramiro Barrantes 
 wrote:
>>>
>>> Hello,
>>>
>>> I am trying to fit many hundreds of simulated datasets using NLME (it's 
all in a big loop in R).  Some don't seem to converge.  I am working on 
addressing the issues by perhaps adjusting my simulation, or tweaking iteration 
steps in nlme, etc.  However, when it doesn't converge, NLME just hangs, and my 
program either stalls for hours/days or takes over the computer memory and 
everything crashes eventually.  Is there a way to tell nlme to stop when it 
doesn't seem to be converging somehow?   I have been looking at the parameters 
in nlmeControl() but see nothing obvious.
>>>
>>> Thanks in advance,
>>>
>>> Ramiro
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] non-finite initial optimization function; was "help"

2014-12-03 Thread Prof J C Nash (U30A)
If you want this resolved, you are going to have to provide the full 
function in a reproducible example. Nearly a half-century with this type 
of problem suggests a probability of nearly 1 that nlogL will be poorly 
set up.
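As a quick first check along those lines (a toy nlogL, purely illustrative, not the poster's function): evaluate the objective at the starting values before calling optim(); a non-finite result there is exactly what triggers the "initial value in 'vmmin' is not finite" error.

```r
# Toy negative log-likelihood: normal model for three observations
nlogL_demo <- function(p)  # p = c(mean, sd)
  -sum(dnorm(c(1.2, 0.8, 1.5), mean = p[1], sd = p[2], log = TRUE))

# A negative sd makes the density NaN, so BFGS cannot even start
is.finite(suppressWarnings(nlogL_demo(c(0, -1))))  # FALSE
is.finite(nlogL_demo(c(0, 1)))                     # TRUE: safe start
```

Reparameterizing constrained parameters (e.g. optimizing log(sd) instead of sd) is a common way to keep the objective finite everywhere.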


JN

On 14-12-03 06:00 AM, r-help-requ...@r-project.org wrote:

Message: 14
Date: Tue, 2 Dec 2014 12:38:03 -0300
From: Alejandra Chovar Vera
To:r-help@r-project.org
Subject: [R] help
Message-ID:

Content-Type: text/plain; charset="UTF-8"

Dear R

I have a big problem in my estimation process. I try to estimate my
likelihood function with the option "optim", but R gives me this message:

Error en optim(par = valores$par, nlogL, method = "BFGS", hessian = T,  :
   valor inicial en 'vmmin' no es finito

(in English: "initial value in 'vmmin' is not finite"). I know this is
because my initial values are outside the interval, but I have tried
different initial values and the problem persists.

I don't know what I can do.


I have this code, to obtain my initial values:


  valores<-
optim(c(-1,-1,1,1,1),nlogL,method="SANN",control=list(maxit=1000))

DCp <-
optim(par=valores$par,nlogL,method="BFGS",hessian=T,control=list(maxit=1000))



I found something similar at this link: http://es.listoso.com/r-help/2012-02/msg02395.html
but in that case there is no answer.


If you need more information about my code, please tell me.


Sincerely


Alejandra




Re: [R] Substitute initial guesses of parameters in a function

2014-12-03 Thread Duncan Murdoch

On 03/12/2014 7:37 AM, philippe massicotte wrote:

Hi everyone, I have a formula like this:

f <- as.formula(y ~ p0a * exp(-0.5 * ((x - p1a)/p2a)^2))

I would like to "dynamically" provide starting values for p0a, p1a, p2a. Is 
there a way to do it?


Just give a named vector of starting values.


#Params estimates
p <- c(12, 10, 1)


Should be p <- c(p0a = 12, p1a = 10, p2a = 1)


# This is where I have difficulties
mystart <- substitute(...)

nls(formula = f, start = mystart)


Now start = p will work.  No need to mess with substitute.  (And no need 
to use as.formula on the very first line; that's already a formula.)
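A self-contained sketch of the advice above (simulated data; the true parameter values are assumptions of the example):

```r
set.seed(1)
x <- seq(0, 20, length.out = 200)
y <- 12 * exp(-0.5 * ((x - 10) / 1)^2) + rnorm(200, sd = 0.2)

f <- y ~ p0a * exp(-0.5 * ((x - p1a)/p2a)^2)
p <- c(p0a = 11, p1a = 9, p2a = 1.5)  # named starting values, as suggested

fit <- nls(f, start = p)
round(coef(fit), 2)  # close to the true c(12, 10, 1)
```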


Duncan Murdoch


Regards,
Philippe



[R] Substitute initial guesses of parameters in a function

2014-12-03 Thread philippe massicotte
Hi everyone, I have a formula like this:

f <- as.formula(y ~ p0a * exp(-0.5 * ((x - p1a)/p2a)^2))

I would like to "dynamically" provide starting values for p0a, p1a, p2a. Is 
there a way to do it?

#Params estimates
p <- c(12, 10, 1)

# This is where I have difficulties
mystart <- substitute(...)

nls(formula = f, start = mystart)

Regards,
Philippe
  


[R] Version 1.4.0 of the 'apcluster' package

2014-12-03 Thread Ulrich Bodenhofer

Dear colleagues,

This is to inform you that version 1.4.0 of the R package 'apcluster' 
was released on CRAN earlier this week. This is a major release 
that - apart from other important improvements - fulfills a long-standing 
user request: genuine support for sparse similarity matrices. For 
more details, see the package documentation and the following URLs:


http://www.bioinf.jku.at/software/apcluster/
http://cran.r-project.org/web/packages/apcluster/index.html


Best regards,
Ulrich Bodenhofer



*Dr. Ulrich Bodenhofer*
Associate Professor
Institute of Bioinformatics

*Johannes Kepler University*
Altenberger Str. 69
4040 Linz, Austria

Tel. +43 732 2468 4526
Fax +43 732 2468 4539
bodenho...@bioinf.jku.at 
http://www.bioinf.jku.at/ 



Re: [R] Add group name in barchart

2014-12-03 Thread Michael Dewey

Comments in line

On 02/12/2014 23:27, Silong Liao wrote:

Dear ALL,

I have a dataset containing 2 variables: mate (mating groups) and ratio (ratio of 
the number of mothers to fathers). mate is an identifier which consists of three 
components: year, flock (flk), and tag.

I am using the command "barchart" from the package "lattice" to generate plots of 
ratio against mate divided by each flock (see attachment). I want to put flock names onto the 
corresponding plots, but don't know how.

Cheers, Sid

load("ratiodata.Rdata")
attach(ratiodata)


in general it is a bad idea to use attach, use the data= parameter if 
available.



head(ratiodata)

        mate    ratio
1 2007.102.A 21.28571
2 2007.102.B 68.23
3 2007.102.C 59.54
4 2007.102.D 19.35
5 2007.102.E 72.36
6 2007.102.F 35.5
str(ratiodata$mate) #factor
str(ratiodata$ratio) #num
unique(ratiodata$flk)

[1]  1022 2744 2747 2749  391 4357 4880 3001 3003 3004 3855 3658 4588 4591 
4631
library("lattice")
ratiodata$flk=read.table(text=as.character(ratiodata$mate),sep=".")[,2]


So what did this give you for flk?


barchart(ratiodata$ratio~ratiodata$mate|ratiodata$flk)


This could be

barchart(ratio ~ mate | flk, data = ratiodata)

which is much easier to read.
So what did it label the panels with?
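For what it's worth, a small self-contained sketch (invented data in the poster's format): with flk stored as a factor, lattice labels each panel's strip with the flock name automatically.

```r
library(lattice)

ratiodata <- data.frame(
  mate  = c("2007.102.A", "2007.102.B", "2007.391.A", "2007.391.B"),
  ratio = c(21.3, 68.2, 59.5, 19.4)
)
# the second component of the dotted identifier is the flock
ratiodata$flk <- factor(sapply(strsplit(as.character(ratiodata$mate), ".",
                                        fixed = TRUE), `[`, 2))

# each panel strip now shows the flock name
p <- barchart(ratio ~ mate | flk, data = ratiodata)
print(p)
```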











--
Michael
http://www.dewey.myzen.co.uk



Re: [R] lmom package - Resending the email

2014-12-03 Thread Simon Zehnder
Katherine,

for a deeper understanding of the differing values, it would make sense to provide 
the list with at least an online description of the corresponding functions used in 
Minitab and SPSS…

Best 
Simon
On 03 Dec 2014, at 10:45, Katherine Gobin via R-help  
wrote:

> Dear R forum
> I sincerely apologize for my earlier mail with the captioned subject: all 
> the values got mixed up and the email was not readable. I am writing it 
> again. 
> My problem is I have a set of data and I am trying to fit some distributions 
> to it. As a part of this exercise, I need to find out the parameter values of 
> various distributions e.g. Normal distribution, Log normal distribution etc. 
> I am using the lmom package to do the same; however, the parameter values obtained 
> using the lmom package differ to a large extent from the parameter values 
> obtained using, say, MINITAB and SPSS, as given below -
> _
> 
> amounts =  
> c(38572.5599129508,11426.6705314315,21974.1571641187,118530.32782443,3735.43055996748,66309.5211176106,72039.2934132668,21934.8841708626,78564.9136114375,1703.65825161293,2116.89180930203,11003.495671332,19486.3296339113,1871.35861218795,6887.53851253407,148900.978055447,7078.56497101651,79348.1239806592,20157.6241066905,1259.99802108593,3934.45912233674,3297.69946631591,56221.1154121067,13322.0705174134,45110.2498756567,31910.3686613912,3196.71168501252,32843.0140437202,14615.1499458453,13013.9915051561,116104.176753387,7229.03056392023,9833.37962177814,2882.63239493673,165457.372543821,41114.066453219,47188.1677766245,25708.5883755617,82703.7378298092,8845.04197017415,844.28834047836,35410.8486123933,19446.3808445684,17662.2398792892,11882.8497070776,4277181.17817307,30239.0371267968,45165.7512343364,22102.8513746687,5988.69296597127,51345.0146170238,1275658.35495898,15260.4892854214,8861.76578480635,37647.1638704867,4979.53544046949,7012.48134772332,3385.20612391205,1911.03114395959,66886.5036605189,2223.47536156462,814.947809578378,234.028589468841,5397.4347625133,13346.3226579065,28809.3901352898,6387.69226236731,5639.42730553242,2011100.92675507,4150.63707173462,34098.7514446498,3437.10672573502,289710.315303182,8664.66947305203,13813.3867161134,208817.521491857,169317.624400274,9966.78447705792,37811.1721605562,2263.19211279927,80434.5581206454,19057.8093104899,24664.5067589624,25136.5042354789,3582.85741610706,6683.13898432794,65423.9991390846,134848.302304064,3018.55371579808,546249.641168158,172926.689143006,3074.15064180208,1521.70624812788,59012.4248281661,21226.928522236,17572.5682970983,226.646947337851,56232.2982652019,14641.0043361533,6997.94414914865)
> 
> library(lmom)
> lmom  =  samlmu(amounts)
> # __
> # Normal Distribution parameters
> parameters_of_NOR  <- pelnor(lmom); parameters_of_NOR
> 
>       mu     sigma
> 115148.4  175945.8
>
>          Location    Scale
> Minitab  115148.4   485173
> SPSS     115148.4   485173
> # __
> # Log Normal (3 Parameter) Distribution parameters
>        zeta        mu     sigma
> 3225.798890  9.114879  2.240841
>
>          Location    Scale     Shape
> MINITAB   9.73361  1.76298  75.51864
> SPSS      9.7336   1.763    75.519
> # __
> 
> Besides the Generalized Extreme Value distribution, all the other distributions, 
> e.g. Gamma, Exponential (2-parameter), etc., give different 
> results than MINITAB and SPSS.
> Can some one guide me?
> 
> Regards
> Katherine
> 


[R] lmom package - Resending the email

2014-12-03 Thread Katherine Gobin via R-help
Dear R forum
I sincerely apologize for my earlier mail with the captioned subject: all 
the values got mixed up and the email was not readable. I am writing it 
again. 
My problem is I have a set of data and I am trying to fit some distributions to 
it. As a part of this exercise, I need to find out the parameter values of 
various distributions, e.g. the Normal distribution, Log-normal distribution, etc. I 
am using the lmom package to do the same; however, the parameter values obtained 
using the lmom package differ to a large extent from the parameter values obtained 
using, say, MINITAB and SPSS, as given below -
_

amounts =  
c(38572.5599129508,11426.6705314315,21974.1571641187,118530.32782443,3735.43055996748,66309.5211176106,72039.2934132668,21934.8841708626,78564.9136114375,1703.65825161293,2116.89180930203,11003.495671332,19486.3296339113,1871.35861218795,6887.53851253407,148900.978055447,7078.56497101651,79348.1239806592,20157.6241066905,1259.99802108593,3934.45912233674,3297.69946631591,56221.1154121067,13322.0705174134,45110.2498756567,31910.3686613912,3196.71168501252,32843.0140437202,14615.1499458453,13013.9915051561,116104.176753387,7229.03056392023,9833.37962177814,2882.63239493673,165457.372543821,41114.066453219,47188.1677766245,25708.5883755617,82703.7378298092,8845.04197017415,844.28834047836,35410.8486123933,19446.3808445684,17662.2398792892,11882.8497070776,4277181.17817307,30239.0371267968,45165.7512343364,22102.8513746687,5988.69296597127,51345.0146170238,1275658.35495898,15260.4892854214,8861.76578480635,37647.1638704867,4979.53544046949,7012.48134772332,3385.20612391205,1911.03114395959,66886.5036605189,2223.47536156462,814.947809578378,234.028589468841,5397.4347625133,13346.3226579065,28809.3901352898,6387.69226236731,5639.42730553242,2011100.92675507,4150.63707173462,34098.7514446498,3437.10672573502,289710.315303182,8664.66947305203,13813.3867161134,208817.521491857,169317.624400274,9966.78447705792,37811.1721605562,2263.19211279927,80434.5581206454,19057.8093104899,24664.5067589624,25136.5042354789,3582.85741610706,6683.13898432794,65423.9991390846,134848.302304064,3018.55371579808,546249.641168158,172926.689143006,3074.15064180208,1521.70624812788,59012.4248281661,21226.928522236,17572.5682970983,226.646947337851,56232.2982652019,14641.0043361533,6997.94414914865)

library(lmom)
lmom  =  samlmu(amounts)
# __
# Normal Distribution parameters
parameters_of_NOR  <- pelnor(lmom); parameters_of_NOR

      mu     sigma
115148.4  175945.8

         Location    Scale
Minitab  115148.4   485173
SPSS     115148.4   485173
# __
# Log Normal (3 Parameter) Distribution parameters
       zeta        mu     sigma
3225.798890  9.114879  2.240841

         Location    Scale     Shape
MINITAB   9.73361  1.76298  75.51864
SPSS      9.7336   1.763    75.519
# __

Besides the Generalized Extreme Value distribution, all the other distributions, 
e.g. Gamma, Exponential (2-parameter), etc., give different results 
than MINITAB and SPSS.
Can some one guide me?

Regards
Katherine
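One likely explanation, sketched here as an editorial aside in base R (no lmom needed; this is an assumption about the cause, not from the thread): lmom's pel*() functions fit by the method of L-moments, whereas Minitab/SPSS report moment/ML-type estimates. For the normal scale that is the sample standard deviation, while pelnor() effectively returns sqrt(pi) times the second sample L-moment; on heavy-tailed data the two can differ hugely.

```r
# Second sample L-moment (unbiased estimator from the order statistics)
l2 <- function(x) {
  x <- sort(x); n <- length(x)
  b0 <- mean(x)
  b1 <- sum((seq_len(n) - 1) / (n - 1) * x) / n
  2 * b1 - b0
}

set.seed(42)
amounts_demo <- c(rlnorm(99, 10, 1.5), 4e6)  # skewed data + one extreme value

sigma_L  <- sqrt(pi) * l2(amounts_demo)  # L-moment estimate (as in pelnor)
sigma_sd <- sd(amounts_demo)             # what Minitab/SPSS/Excel report
c(L_moment = sigma_L, sd = sigma_sd)     # sd is far larger
```

So the difference would be one of estimation method rather than a bug; packages such as fitdistrplus provide maximum-likelihood fitting if matching Minitab/SPSS is the goal.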


Re: [R] How to have NLME return when convergence not reached

2014-12-03 Thread Viechtbauer Wolfgang (STAT)
Here is a reproducible bug in nlme (reported in 2008) that still crashes R 
today:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001425.html

Seems to be related to memory corruption (as diagnosed by Martin and William 
Dunlap at the time):

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001429.html
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001431.html

I don't know if that is related to the present case, but it sounds a bit like 
it.

Best,
Wolfgang

--   
Wolfgang Viechtbauer, Ph.D., Statistician   
Department of Psychiatry and Psychology   
School for Mental Health and Neuroscience   
Faculty of Health, Medicine, and Life Sciences   
Maastricht University, P.O. Box 616 (VIJV1)   
6200 MD Maastricht, The Netherlands   
+31 (43) 388-4170 | http://www.wvbauer.com   

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Martin
> Maechler
> Sent: Wednesday, December 03, 2014 09:26
> To: Bert Gunter
> Cc: r-help@r-project.org; Ramiro Barrantes
> Subject: Re: [R] How to have NLME return when convergence not reached
> 
> > Bert Gunter 
> > on Tue, 2 Dec 2014 14:03:44 -0800 writes:
> 
> > Yes, Bill almost always has helpful ideas.
> > Just a comment: If indeed the process is gobbling up too much
> memory,
> > that might indicate a problem with your function or implementation.
> I
> > defer to real experts on this, however.
> 
> > Cheers,
> > Bert
> 
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> > (650) 467-7374
> 
> 
> Yes, thank you Bert, it could be entirely Ramiro's function, but
> we don't know as you have not given any indication about what
> the non-linear function is in your nlme() formula.
> 
> The little you wrote _seems_ to indicate a bug in nlme, and if
> that was the case, the bug should be reported and fixed,
> and we (R core - the maintainer of the recommended package nlme)
> would really like to get a reproducible example.
> 
> If OTOH, it is your nonlinear function that "goes haywire",
> the implicit blame on nlme would not be warranted.
> 
> Martin Maechler,
> ETH Zurich and R Core team
> 
> 
> > On Tue, Dec 2, 2014 at 1:59 PM, Ramiro Barrantes
> >  wrote:
> >> Thanks so much for your reply.  I am using try but nlme never
> returns!!  and I think the process is getting killed by the system as it
> is taking over all the memory.  However, I do like William Dunlap's idea
> of using R.utils::withTimeout to limit the time.
> >>
> >> Thanks again for your help!
> >> 
> >> From: Bert Gunter [gunter.ber...@gene.com]
> >> Sent: Tuesday, December 02, 2014 4:30 PM
> >> To: Ramiro Barrantes
> >> Cc: r-help@r-project.org
> >> Subject: Re: [R] How to have NLME return when convergence not
> reached
> >>
> >> ?try
> >> Or
> >> ?tryCatch
> >>
> >> Bert
> >>
> >> Sent from my iPhone -- please excuse typos.
> >>
> >>> On Dec 2, 2014, at 12:57 PM, Ramiro Barrantes
>  wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am trying to fit many hundreds of simulated datasets using NLME
> (it's all in a big loop in R).  Some don't seem to converge.  I am
> working on addressing the issues by perhaps adjusting my simulation, or
> tweaking iteration steps in nlme, etc.  However, when it doesn't
> converge, NLME just hangs, and my program either stalls for hours/days or
> takes over the computer memory and everything crashes eventually.  Is
> there a way to tell nlme to stop when it doesn't seem to be converging
> somehow?   I have been looking at the parameters in nlmeControl() but see
> nothing obvious.
> >>>
> >>> Thanks in advance,
> >>>
> >>> Ramiro



[R] lmom package

2014-12-03 Thread Katherine Gobin via R-help
Dear R Forum
I have a set of data, say as given below, and as an exercise in fitting 
statistical distributions to this data, I am estimating parameters. 
amounts =  
c(38572.5599129508,11426.6705314315,21974.1571641187,118530.32782443,3735.43055996748,66309.5211176106,72039.2934132668,21934.8841708626,78564.9136114375,1703.65825161293,2116.89180930203,11003.495671332,19486.3296339113,1871.35861218795,6887.53851253407,148900.978055447,7078.56497101651,79348.1239806592,20157.6241066905,1259.99802108593,3934.45912233674,3297.69946631591,56221.1154121067,13322.0705174134,45110.2498756567,31910.3686613912,3196.71168501252,32843.0140437202,14615.1499458453,13013.9915051561,116104.176753387,7229.03056392023,9833.37962177814,2882.63239493673,165457.372543821,41114.066453219,47188.1677766245,25708.5883755617,82703.7378298092,8845.04197017415,844.28834047836,35410.8486123933,19446.3808445684,17662.2398792892,11882.8497070776,4277181.17817307,30239.0371267968,45165.7512343364,22102.8513746687,5988.69296597127,51345.0146170238,1275658.35495898,15260.4892854214,8861.76578480635,37647.1638704867,4979.53544046949,7012.48134772332,3385.20612391205,1911.03114395959,66886.5036605189,2223.47536156462,814.947809578378,234.028589468841,5397.4347625133,13346.3226579065,28809.3901352898,6387.69226236731,5639.42730553242,2011100.92675507,4150.63707173462,34098.7514446498,3437.10672573502,289710.315303182,8664.66947305203,13813.3867161134,208817.521491857,169317.624400274,9966.78447705792,37811.1721605562,2263.19211279927,80434.5581206454,19057.8093104899,24664.5067589624,25136.5042354789,3582.85741610706,6683.13898432794,65423.9991390846,134848.302304064,3018.55371579808,546249.641168158,172926.689143006,3074.15064180208,1521.70624812788,59012.4248281661,21226.928522236,17572.5682970983,226.646947337851,56232.2982652019,14641.0043361533,6997.94414914865)
library(lmom)
lmom <- samlmu(amounts)

# 
# Normal distribution
parameters_of_NOR  <- pelnor(lmom); parameters_of_NOR

> parameters_of_NOR <- pelnor(lmom); parameters_of_NOR
      mu     sigma
115148.4  175945.8

# Minitab and SPSS parameter values
         Location    Scale
Minitab  115148.4   485173
SPSS     115148.4   485173
# __

# Log normal 3 parameter distribution
parameters_of_LN3 <- pelln3(lmom); parameters_of_LN3

> parameters_of_LN3 <- pelln3(lmom); parameters_of_LN3
       zeta        mu     sigma
3225.798890  9.114879  2.240841

         Location    Scale     Shape
Minitab   9.73361  1.76298  75.51864
SPSS      9.7336   1.763    75.519

Similarly, apart from the Generalized Extreme Value distribution, all the parameter 
values differ significantly from the parameter values obtained using Minitab and 
SPSS. In the case of the Normal distribution, the dispersion parameter is simply the 
sample standard deviation; Excel also gives the parameter value 485172.8, which 
differs significantly from what we get from R.
Parameter values also differ for many other distributions, e.g. the Gamma 
distribution.
Is there a different algorithm or logic used in R? Can someone please guide?
Regards
Katherine




Re: [R] How to have NLME return when convergence not reached

2014-12-03 Thread Martin Maechler
> Bert Gunter 
> on Tue, 2 Dec 2014 14:03:44 -0800 writes:

> Yes, Bill almost always has helpful ideas.
> Just a comment: If indeed the process is gobbling up too much memory,
> that might indicate a problem with your function or implementation. I
> defer to real experts on this, however.

> Cheers,
> Bert

> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374


Yes, thank you Bert, it could be entirely Ramiro's function, but
we don't know as you have not given any indication about what
the non-linear function is in your nlme() formula.

The little you wrote _seems_ to indicate a bug in nlme, and if
that were the case, the bug should be reported and fixed,
and we (R Core - the maintainers of the recommended package nlme)
would really like to get a reproducible example.

If OTOH, it is your nonlinear function that "goes haywire", 
the implicit blame on nlme would not be warranted.

Martin Maechler, 
ETH Zurich and R Core team
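
[For the archive, the combination suggested in this thread, tryCatch() around
R.utils::withTimeout(), can be sketched as below.  The model, dataset, and
timeout are illustrative assumptions taken from the nlme examples, not
Ramiro's actual fit:]

```r
# Wrap each nlme() call in tryCatch() so a failed fit returns NULL instead of
# aborting the simulation loop, and in R.utils::withTimeout() so a hung fit is
# interrupted after a fixed wall-clock budget.
library(nlme)
library(R.utils)   # provides withTimeout()

safe_nlme <- function(timeout_sec = 60) {
  tryCatch(
    withTimeout(
      nlme(height ~ SSasymp(age, Asym, R0, lrc),
           data   = Loblolly,                      # built-in example data
           fixed  = Asym + R0 + lrc ~ 1,
           random = Asym ~ 1,
           start  = c(Asym = 103, R0 = -8.5, lrc = -3.3)),
      timeout = timeout_sec, onTimeout = "error"),
    error = function(e) {
      message("fit skipped: ", conditionMessage(e))
      NULL                                         # sentinel for 'no fit'
    })
}

fit <- safe_nlme()
if (!is.null(fit)) print(fixef(fit))
```

Note that withTimeout() cannot interrupt compiled code that never yields to
the R interpreter, so for truly stuck fits a hard limit on memory or CPU at
the OS level may still be needed.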


> On Tue, Dec 2, 2014 at 1:59 PM, Ramiro Barrantes
>  wrote:
>> Thanks so much for your reply.  I am using try, but nlme never returns, 
and I think the process is getting killed by the system because it is taking 
over all the memory.  However, I do like William Dunlap's idea of using 
R.utils::withTimeout to limit the time.
>> 
>> Thanks again for your help!
>> 
>> From: Bert Gunter [gunter.ber...@gene.com]
>> Sent: Tuesday, December 02, 2014 4:30 PM
>> To: Ramiro Barrantes
>> Cc: r-help@r-project.org
>> Subject: Re: [R] How to have NLME return when convergence not reached
>> 
>> ?try
>> Or
>> ?tryCatch
>> 
>> Bert
>> 
>> Sent from my iPhone -- please excuse typos.
>> 
>>> On Dec 2, 2014, at 12:57 PM, Ramiro Barrantes 
 wrote:
>>> 
>>> Hello,
>>> 
>>> I am trying to fit many hundreds of simulated datasets using NLME (it's 
all in a big loop in R).  Some don't seem to converge.  I am working on 
addressing the issues by perhaps adjusting my simulation, or tweaking iteration 
steps in nlme, etc.  However, when it doesn't converge, NLME just hangs, and my 
program either stalls for hours/days or takes over the computer memory and 
everything crashes eventually.  Is there a way to tell nlme to stop when it 
doesn't seem to be converging somehow?   I have been looking at the parameters 
in nlmeControl() but see nothing obvious.
>>> 
>>> Thanks in advance,
>>> 
>>> Ramiro

