[R] Archer-Lemeshow Goodness of Fit Test for Survey Data with Log. Regression

2016-11-16 Thread Courtney Benjamin
Hello R Experts,

I am trying to implement the Archer-Lemeshow GOF Test for survey data on a 
logistic regression model using the survey package based upon an R Help Archive 
post that I found where Dr. Thomas Lumley advised how to do it: 
http://r.789695.n4.nabble.com/Goodness-of-t-tests-for-Complex-Survey-Logistic-Regression-td4668233.html

Everything is going well until I get to the point where I have to add the 
objects 'r' and 'g' as variables to the data frame by either using the 
transform function or the update function to update the svrepdesign object.  
The log. regression model involved uses a subset of data and some of the values 
in the data frame are NA, so that is affecting my ability to add 'r' and 'g' as 
variables; I am getting an error because I only have 8397 rows for the new 
variables and 16197 in the data frame and svrepdesign object.  I am not sure 
how to overcome this error.
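A likely cause of the 8397 vs 16197 mismatch is the way the model was fitted. With `subset=` and `na.action = na.omit`, `residuals()` and `fitted()` only return values for the rows the model actually used.

Below is a minimal base-R sketch of one workaround. It assumes, as with base `glm()`, that those vectors are named by the row names of the original data. The idea is to put each value back in its own row and leave `NA` everywhere else. The toy data and the helper `pad_to_rows()` are hypothetical. `glm()` stands in for `svyglm()` so the example runs without the survey data.

```r
# Synthetic toy data: row 8 is excluded by the subset, row 10 has a missing x
toy <- data.frame(
  y    = c(0, 1, 0, 1, 1, 0, 1, 0, 0, 1),
  x    = c(0.5, 1.2, -0.3, 0.8, 2.1, -1.0, 0.4, 1.5, -0.7, NA),
  keep = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

fit <- glm(y ~ x, family = binomial, data = toy,
           subset = keep, na.action = na.omit)

r <- residuals(fit, type = "response")  # 8 values, named "1".."7" and "9"
f <- fitted(fit)                         # the same 8 rows

# Hypothetical helper: put named values back in their original rows,
# leaving NA for rows the model did not use
pad_to_rows <- function(values, full_rownames) {
  out <- rep(NA_real_, length(full_rownames))
  names(out) <- full_rownames
  out[names(values)] <- values
  out
}

r_full <- pad_to_rows(r, rownames(toy))
f_full <- pad_to_rows(f, rownames(toy))

# Pad the numeric fitted values first and cut afterwards: assigning a factor
# into a numeric vector would keep only its integer codes
g_full <- cut(f_full, c(-Inf, quantile(f_full, (1:9) / 10, na.rm = TRUE), Inf))

toy <- transform(toy, r = r_full, g = g_full)
```

For the survey model, the same idea would be `update(elsq1ch_brr, r = r_full, g = g_full)`, with the vectors padded against `rownames(elsq1ch)`. Before relying on this, it is worth checking with `head(names(r))` that the names really match those row names. Using `na.action = na.exclude` alone would not fix the mismatch. It pads the rows dropped for `NA`, but not the rows removed by `subset`.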

The following is an MRE:

##Archer Lemeshow Goodness of Fit Test for Complex Survey Data with Logistic Regression

library(RCurl)
library(survey)

data <- getURL("https://raw.githubusercontent.com/cbenjamin1821/careertech-ed/master/elsq1adj.csv")
elsq1ch <- read.csv(text = data)

#Specifying the svrepdesign object which applies the BRR weights
elsq1ch_brr<-svrepdesign(variables = elsq1ch[,1:16], repweights = 
elsq1ch[,18:217], weights = elsq1ch[,17], combined.weights = TRUE, type = "BRR")
elsq1ch_brr

##Resetting baseline levels for predictors
elsq1ch_brr <- update( elsq1ch_brr , F1HIMATH = relevel(F1HIMATH,"PreAlg or Less") )
elsq1ch_brr <- update( elsq1ch_brr , BYINCOME = relevel(BYINCOME,"0-25K") )
elsq1ch_brr <- update( elsq1ch_brr , F1RACE = relevel(F1RACE,"White") )
elsq1ch_brr <- update( elsq1ch_brr , F1SEX = relevel(F1SEX,"Male") )
elsq1ch_brr <- update( elsq1ch_brr , F1RTRCC = relevel(F1RTRCC,"Academic") )

#Log. Reg. model-all curric. concentrations including F1RTRCC as a predictor
allCC <- 
svyglm(formula=F3ATTAINB~F1PARED+BYINCOME+F1RACE+F1SEX+F1RGPP2+F1HIMATH+F1RTRCC,family="binomial",design=elsq1ch_brr,subset=BYSCTRL==1&G10COHRT==1,na.action=na.omit)
summary(allCC)

#Recommendations from Lumley (from R Help Archive) on implementing the Archer Lemeshow GOF test
r <- residuals(allCC, type="response")
f<-fitted(allCC)
g<- cut(f, c(-Inf, quantile(f,  (1:9)/10), Inf))

# now create a new design object with r and g added as variables
#This is the area where I am having problems as my model involves a subset and some values are NA as well
#I am also not sure if I am naming/specifying the new variables of r and g properly
elsq1ch <- transform(elsq1ch,r=r,g=g)
newdesign <- update(elsq1ch_brr,r=r,g=g)
#then:
decilemodel<- svyglm(r~g, design=newdesign)
regTermTest(decilemodel, ~g)
#is the F-adjusted mean residual test from the Archer Lemeshow paper
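For the grouping step, the upper `Inf` has to be one of the breaks passed to `c()`. If it is placed inside `quantile()`, it is taken as that function's `na.rm` argument. The top break then disappears, and the highest tenth of the fitted values end up `NA`.

Below is a small simulated check with base `glm()`; the data are made up. In the survey version, the residuals and groups would then go into `svyglm(r ~ g, ...)` and `regTermTest(..., ~g)`, as in the posted code. The per-decile means here are unweighted and only illustrate the grouping.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))   # simulated outcome, for illustration only
fit <- glm(y ~ x, family = binomial)

r <- residuals(fit, type = "response")   # observed minus fitted probability
f <- fitted(fit)

# Breaks: -Inf, the nine decile cut points, and +Inf
g <- cut(f, c(-Inf, quantile(f, (1:9) / 10), Inf))
table(g)            # roughly n/10 observations per group
tapply(r, g, mean)  # unweighted mean residual in each decile

# Misplaced parenthesis: Inf becomes quantile()'s na.rm argument, so the top
# break is missing and the highest fitted values fall outside every interval
g_bad <- cut(f, c(-Inf, quantile(f, (1:9) / 10, Inf)))
sum(is.na(g_bad))
```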

Thank you,
Courtney

Courtney Benjamin
Broome-Tioga BOCES
Automotive Technology II Teacher
Located at Gault Toyota
Doctoral Candidate-Educational Theory & Practice
State University of New York at Binghamton
cbenj...@btboces.org
607-763-8633


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Run a Python code from R

2016-11-16 Thread David Winsemius

> On Nov 16, 2016, at 4:53 PM, Nelly Reduan  wrote:
> 
> Thank you very much for your help!
> 
> I'm trying to use the package "rPithon" but I obtain this error message:

Are you sure you are not just misspelling rPython? If that's not the issue, then 
you need to say where you got rPithon.


> 
>> if (pithon.available())
> + {
> +   nRow <-50
> +   nCol <-50
> +   h <- 0.75
> +
> +   # this file contains the definition of function concat
> +   pithon.load("C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py")
> +   pithon.call( "mpd", nRow, nCol, h)
> +
> + } else {
> +   print("Unable to execute python")
> + }
> 
> Error in pithon.get("_r_call_return", instance.name = instname) :
>  Couldn't retrieve variable: Traceback (most recent call last):
>  File "C:/Users/Documents/R/win-library/3.3/rPithon/pythonwrapperscript.py", line 110, in 
>    reallyReallyLongAndUnnecessaryPrefix.data = json.dumps([eval(reallyReallyLongAndUnnecessaryPrefix.argData)])
>  File "C:\Users\ANACON~1\lib\json\__init__.py", line 244, in dumps
>    return _default_encoder.encode(obj)
>  File "C:\Users\ANACON~1\lib\json\encoder.py", line 207, in encode
>    chunks = self.iterencode(o, _one_shot=True)
>  File "C:\Users\ANACON~1\lib\json\encoder.py", line 270, in iterencode
>    return _iterencode(o, 0)
>  File "C:\Users\ANACON~1\lib\json\encoder.py", line 184, in default
>    raise TypeError(repr(o) + " is not JSON serializable")
> TypeError: array([[ 0.36534654,  0.31962481,  0.44229946, ...,  0.11513079,
>          0.07156331,  0.00286971],
>        [ 0.41534291,  0.41333479,  0.48118995, ...,  0.19203674,
>          0.04192771,  0.03679473],
>        [ 0.5188
> 
> Nell
> 
> From: Wensui Liu 
> Sent: Wednesday, 16 November 2016 16:00:03
> To: Nelly Reduan
> Cc: r-help@r-project.org
> Subject: Re: [R] Run a Python code from R
> 
> take a look at rpython or rPithon package
> 
> On Wed, Nov 16, 2016 at 4:53 PM, Nelly Reduan  wrote:
>> Hello,
>> 
>> 
>> How can I run this Python code from R ?
>> 
>> 
> import nlmpy
> nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
> nlmpy.exportASCIIGrid("raster.asc", nlm)
>> 
>> 
>> Nlmpy is a Python package to build neutral landscape models
>> 
>> https://pypi.python.org/pypi/nlmpy . The example comes from this website. I 
>> tried to use the function system2 but I don't know how to use it.
>> 
>> 
>> path_script_python <- "C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py"
>> 
>> test <- system2("python", args = c(path_script_python, as.character(nRow), 
>> as.character(nCol), as.character(h)))
>> 
>> Thanks a lot for your help.
>> Nell
>> 
>> 
> 

David Winsemius
Alameda, CA, USA


Re: [R] Run a Python code from R

2016-11-16 Thread Nelly Reduan
Thank you very much for your help!

I'm trying to use the package "rPithon" but I obtain this error message:

> if (pithon.available())
+ {
+   nRow <-50
+   nCol <-50
+   h <- 0.75
+
+   # this file contains the definition of function concat
+   pithon.load("C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py")
+   pithon.call( "mpd", nRow, nCol, h)
+
+ } else {
+   print("Unable to execute python")
+ }

Error in pithon.get("_r_call_return", instance.name = instname) :
  Couldn't retrieve variable: Traceback (most recent call last):
  File "C:/Users/Documents/R/win-library/3.3/rPithon/pythonwrapperscript.py", line 110, in 
    reallyReallyLongAndUnnecessaryPrefix.data = json.dumps([eval(reallyReallyLongAndUnnecessaryPrefix.argData)])
  File "C:\Users\ANACON~1\lib\json\__init__.py", line 244, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\ANACON~1\lib\json\encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\ANACON~1\lib\json\encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\ANACON~1\lib\json\encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: array([[ 0.36534654,  0.31962481,  0.44229946, ...,  0.11513079,
         0.07156331,  0.00286971],
       [ 0.41534291,  0.41333479,  0.48118995, ...,  0.19203674,
         0.04192771,  0.03679473],
       [ 0.5188

Nell



From: Wensui Liu 
Sent: Wednesday, 16 November 2016 16:00:03
To: Nelly Reduan
Cc: r-help@r-project.org
Subject: Re: [R] Run a Python code from R

take a look at rpython or rPithon package

On Wed, Nov 16, 2016 at 4:53 PM, Nelly Reduan  wrote:
> Hello,
>
>
> How can I run this Python code from R ?
>
>
 import nlmpy
 nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
 nlmpy.exportASCIIGrid("raster.asc", nlm)
>
>
> Nlmpy is a Python package to build neutral landscape models
>
> https://pypi.python.org/pypi/nlmpy . The example comes from this website. I 
> tried to use the function system2 but I don't know how to use it.
>
>
> path_script_python <- "C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py"
>
> test <- system2("python", args = c(path_script_python, as.character(nRow), 
> as.character(nCol), as.character(h)))
>
> Thanks a lot for your help.
> Nell
>
>


Re: [R] Run a Python code from R

2016-11-16 Thread Wensui Liu
take a look at rpython or rPithon package

On Wed, Nov 16, 2016 at 4:53 PM, Nelly Reduan  wrote:
> Hello,
>
>
> How can I run this Python code from R ?
>
>
 import nlmpy
 nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
 nlmpy.exportASCIIGrid("raster.asc", nlm)
>
>
> Nlmpy is a Python package to build neutral landscape models
>
> https://pypi.python.org/pypi/nlmpy . The example comes from this website. I 
> tried to use the function system2 but I don't know how to use it.
>
>
> path_script_python <- "C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py"
>
> test <- system2("python", args = c(path_script_python, as.character(nRow), 
> as.character(nCol), as.character(h)))
>
> Thanks a lot for your help.
> Nell
>
>



[R] Run a Python code from R

2016-11-16 Thread Nelly Reduan
Hello,


How can I run this Python code from R?


>>> import nlmpy
>>> nlm = nlmpy.mpd(nRow=50, nCol=50, h=0.75)
>>> nlmpy.exportASCIIGrid("raster.asc", nlm)


Nlmpy is a Python package to build neutral landscape models

https://pypi.python.org/pypi/nlmpy . The example comes from this website. I 
tried to use the function system2 but I don't know how to use it.


path_script_python <- "C:/Users/Anaconda2/Lib/site-packages/nlmpy/nlmpy.py"

test <- system2("python", args = c(path_script_python, as.character(nRow), 
as.character(nCol), as.character(h)))
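On the `system2()` attempt: running nlmpy.py as a script probably only defines nlmpy's functions and then exits. As far as I can tell, the module neither reads nRow, nCol and h from the command line nor calls `mpd()` itself.

One alternative, sketched below under stated assumptions, is to hand the three lines of the example to Python with `python -c`. The sketch makes these assumptions:
- a `python` executable is on the PATH and can `import nlmpy` (for example, the Anaconda one);
- the output file is written to `raster.asc` in the working directory;
- `nlmpy_args()` and `run_nlmpy()` are hypothetical helpers, not part of any package.

```r
# Hypothetical helper: build the argument vector for `python -c "<code>"`
nlmpy_args <- function(nRow, nCol, h, out_file = "raster.asc") {
  code <- sprintf(
    "import nlmpy; nlm = nlmpy.mpd(nRow=%d, nCol=%d, h=%s); nlmpy.exportASCIIGrid('%s', nlm)",
    as.integer(nRow), as.integer(nCol), format(h), out_file)
  c("-c", shQuote(code))  # quote the code so the shell passes it as one argument
}

# Hypothetical wrapper: run the example and stop if python reports an error
run_nlmpy <- function(nRow = 50, nCol = 50, h = 0.75, python = "python",
                      out_file = "raster.asc") {
  status <- system2(python, args = nlmpy_args(nRow, nCol, h, out_file))
  if (!identical(as.integer(status), 0L)) {
    stop("python exited with status ", status)
  }
  invisible(out_file)
}
```

`run_nlmpy(50, 50, 0.75)` would then run the example and leave `raster.asc` to be read with any ASCII-grid reader. Passing `stdout = TRUE` to `system2()` instead would capture Python's printed output as a character vector.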

Thanks a lot for your help.
Nell




Re: [R] using pmax in presence of NAs

2016-11-16 Thread Dimitri Liakhovitski
Thank you, Peter!

On Wed, Nov 16, 2016 at 4:21 PM, peter dalgaard  wrote:
>
>> On 16 Nov 2016, at 21:58 , Dimitri Liakhovitski 
>>  wrote:
>>
>> Hello!
>>
>> I need to calculate the maximum of each row of a data frame.
>> This works:
>>
>>  x <- data.frame(a = 1:5, b=11:15, c=111:115)
>>  x
>>  do.call(pmax, x)
>> [1] 111 112 113 114 115
>>
>> However, how should I modify it if my data frame has NAs?
>> I'd like it to ignore NAs and return the maximum of all non-NAs in each row:
>>
>>  x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>>  x
>> I'd like it to return:
>> [1] 111 112 113 114 15
>>
>> Thanks a lot!
>
> The first thing to notice is that pmax allows na.rm=TRUE, so it works fine to 
> do
>
>> pmax(a = c(1:5), b=11:15, c=c(111:114,NA), na.rm=TRUE)
> [1] 111 112 113 114  15
>
> I.e., it is your desire to use do.call() that is the challenge. The 2nd 
> argument should be a list containing the arguments as in the above call, so 
> this works too
>
>> do.call(pmax, list(a = c(1:5), b=11:15, c=c(111:114,NA), na.rm=TRUE))
> [1] 111 112 113 114  15
>
> The form do.call(pmax,x) uses the fact that a data.frame is a kind of list. 
> What you need to do is to tack on the element na.rm=TRUE. This works
>
>> do.call(pmax, c(x, list(na.rm=TRUE)))
> [1] 111 112 113 114  15
>
> and it even works without the list()
>
>> do.call(pmax, c(x, na.rm=TRUE))
> [1] 111 112 113 114  15
>
> -pd
>
>
>>
>> --
>> Dimitri Liakhovitski
>>
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>



-- 
Dimitri Liakhovitski



Re: [R] using pmax in presence of NAs

2016-11-16 Thread peter dalgaard

> On 16 Nov 2016, at 21:58 , Dimitri Liakhovitski 
>  wrote:
> 
> Hello!
> 
> I need to calculate the maximum of each row of a data frame.
> This works:
> 
>  x <- data.frame(a = 1:5, b=11:15, c=111:115)
>  x
>  do.call(pmax, x)
> [1] 111 112 113 114 115
> 
> However, how should I modify it if my data frame has NAs?
> I'd like it to ignore NAs and return the maximum of all non-NAs in each row:
> 
>  x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>  x
> I'd like it to return:
> [1] 111 112 113 114 15
> 
> Thanks a lot!

The first thing to notice is that pmax allows na.rm=TRUE, so it works fine to 
do 

> pmax(a = c(1:5), b=11:15, c=c(111:114,NA), na.rm=TRUE)
[1] 111 112 113 114  15

I.e., it is your desire to use do.call() that is the challenge. The 2nd 
argument should be a list containing the arguments as in the above call, so 
this works too

> do.call(pmax, list(a = c(1:5), b=11:15, c=c(111:114,NA), na.rm=TRUE))
[1] 111 112 113 114  15

The form do.call(pmax,x) uses the fact that a data.frame is a kind of list. 
What you need to do is to tack on the element na.rm=TRUE. This works

> do.call(pmax, c(x, list(na.rm=TRUE)))
[1] 111 112 113 114  15

and it even works without the list() 

> do.call(pmax, c(x, na.rm=TRUE))
[1] 111 112 113 114  15
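Below is a self-contained check of the approach above. It also shows what `pmax(..., na.rm = TRUE)` returns when a row has nothing left to compare. The second data frame `y`, with an all-`NA` second row, is hypothetical.

```r
x <- data.frame(a = c(1:5), b = 11:15, c = c(111:114, NA))

# A data frame is a list of columns, so c() with a named element gives a list;
# do.call() passes the columns as ... and na.rm as a named argument
pmax_args <- c(x, na.rm = TRUE)
str(pmax_args)

row_max <- do.call(pmax, pmax_args)
row_max
#> [1] 111 112 113 114  15

# A row that is entirely NA stays NA
y <- data.frame(a = c(1, NA), b = c(2, NA))
do.call(pmax, c(y, na.rm = TRUE))
```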

-pd


> 
> -- 
> Dimitri Liakhovitski
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] using pmax in presence of NAs

2016-11-16 Thread Dimitri Liakhovitski
Thanks a lot, Sarah.
I just had no idea where to put na.rm = T in the do.call call.
Appreciate it!
Dimitri

On Wed, Nov 16, 2016 at 4:06 PM, Sarah Goslee  wrote:
> pmax has a na.rm argument. Why not just use that?
>
> x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>
>> do.call(pmax, c(x, na.rm=TRUE))
> [1] 111 112 113 114  15
>
> On Wed, Nov 16, 2016 at 3:58 PM, Dimitri Liakhovitski
>  wrote:
>> Hello!
>>
>> I need to calculate the maximum of each row of a data frame.
>> This works:
>>
>>   x <- data.frame(a = 1:5, b=11:15, c=111:115)
>>   x
>>   do.call(pmax, x)
>> [1] 111 112 113 114 115
>>
>> However, how should I modify it if my data frame has NAs?
>> I'd like it to ignore NAs and return the maximum of all non-NAs in each row:
>>
>>   x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>>   x
>> I'd like it to return:
>> [1] 111 112 113 114 15
>>
>> Thanks a lot!
>>
>> --
>> Dimitri Liakhovitski
>>



-- 
Dimitri Liakhovitski



Re: [R] using pmax in presence of NAs

2016-11-16 Thread Sarah Goslee
pmax has a na.rm argument. Why not just use that?

x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))

> do.call(pmax, c(x, na.rm=TRUE))
[1] 111 112 113 114  15

On Wed, Nov 16, 2016 at 3:58 PM, Dimitri Liakhovitski
 wrote:
> Hello!
>
> I need to calculate the maximum of each row of a data frame.
> This works:
>
>   x <- data.frame(a = 1:5, b=11:15, c=111:115)
>   x
>   do.call(pmax, x)
> [1] 111 112 113 114 115
>
> However, how should I modify it if my data frame has NAs?
> I'd like it to ignore NAs and return the maximum of all non-NAs in each row:
>
>   x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>   x
> I'd like it to return:
> [1] 111 112 113 114 15
>
> Thanks a lot!
>
> --
> Dimitri Liakhovitski
>



Re: [R] using pmax in presence of NAs

2016-11-16 Thread Dimitri Liakhovitski
To clarify:
I know I could do:
apply(x, 1, max, na.rm = T)

But I was wondering if one can modify the pmax one...
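The `apply()` route gives the same answer on the example, with two differences worth knowing:
- `apply()` first converts the data frame to a matrix, so a single character column would turn every value into character.
- `max(..., na.rm = TRUE)` on a row that is entirely `NA` returns `-Inf` with a warning, whereas `pmax(..., na.rm = TRUE)` returns `NA`.

Here is a sketch with a hypothetical sixth, all-`NA` row:

```r
x <- data.frame(a = c(1:5, NA), b = c(11:15, NA), c = c(111:114, NA, NA))

via_pmax  <- do.call(pmax, c(x, na.rm = TRUE))
# max() warns "no non-missing arguments" for row 6; silenced here
via_apply <- suppressWarnings(apply(x, 1, max, na.rm = TRUE))

rbind(via_pmax, via_apply)
```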

On Wed, Nov 16, 2016 at 3:58 PM, Dimitri Liakhovitski
 wrote:
> Hello!
>
> I need to calculate the maximum of each row of a data frame.
> This works:
>
>   x <- data.frame(a = 1:5, b=11:15, c=111:115)
>   x
>   do.call(pmax, x)
> [1] 111 112 113 114 115
>
> However, how should I modify it if my data frame has NAs?
> I'd like it to ignore NAs and return the maximum of all non-NAs in each row:
>
>   x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
>   x
> I'd like it to return:
> [1] 111 112 113 114 15
>
> Thanks a lot!
>
> --
> Dimitri Liakhovitski



-- 
Dimitri Liakhovitski



[R] using pmax in presence of NAs

2016-11-16 Thread Dimitri Liakhovitski
Hello!

I need to calculate the maximum of each row of a data frame.
This works:

  x <- data.frame(a = 1:5, b=11:15, c=111:115)
  x
  do.call(pmax, x)
[1] 111 112 113 114 115

However, how should I modify it if my data frame has NAs?
I'd like it to ignore NAs and return the maximum of all non-NAs in each row:

  x <- data.frame(a = c(1:5), b=11:15, c=c(111:114,NA))
  x
I'd like it to return:
[1] 111 112 113 114 15

Thanks a lot!

-- 
Dimitri Liakhovitski



[R] GAMs: test of simple effects following a significant interaction

2016-11-16 Thread Marine Regis
Hello,
I am a novice in Generalized Additive Models (GAMs) and I would need some 
advice on these models. From capture data, I would like to assess the effect of 
longitudinal changes in proportion of forests on abundance of skunks. To test 
this, I built this GAM where the dependent variable is the number of unique 
skunks and the independent variables are the X coordinates of the centroids of 
trapping sites (called "x" in the GAM) and the proportion of forests within the 
trapping sites (called "prop_forest" in the GAM):
mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, 
family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", 
select = TRUE)
summary(mod)

Family: Negative Binomial(13.446)
Link function: log

Formula:
nb_unique ~ s(x, prop_forest)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.02095    0.03896  -51.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                   edf Ref.df Chi.sq  p-value
s(x,prop_forest) 3.182     29  17.76 0.000102 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.37   Deviance explained =   49%
-REML = 268.61  Scale est. = 1 n = 58

Should I include the simple (main) effects of the independent variables "x" and 
"prop_forest" in the GAM when the interaction is significant? I ask this 
question because longitude and latitude are often included as an 
interaction term in a GAM (i.e., s(X,Y)) without the simple effects (however, I 
tested for the simple effects and they were not significant in my case).
Is it correct to include the interaction between x and proportion of forests 
when my objective is to test longitudinal changes in proportion of forests?
Thanks a lot for your time.
Have a nice day.
Marine
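mgcv provides `ti()` tensor-product interaction terms that separate the two main effects from their interaction. The doc examples use the form `ti(x) + ti(z) + ti(x, z)`, where `ti(x, z)` excludes the main effects. Its test in `summary()` therefore asks whether the effect of forest cover changes along x, over and above the two main effects.

There is also a point about the original `s(x, prop_forest)` term. It is an isotropic thin-plate smooth, so it implicitly treats one unit of x and one unit of forest proportion as comparable. That is natural for two coordinates, but less so here; tensor-product terms avoid the assumption.

The sketch below uses simulated data: the column names mirror the post, but the numbers are made up. It also puts the offset in the formula, because mgcv's documentation notes that an offset passed through the `offset` argument is ignored when predicting.

```r
library(mgcv)

# Simulated stand-in for succ_capt_skunk: same column names, made-up values
set.seed(2016)
n <- 200
sim <- data.frame(x = runif(n, 0, 100), prop_forest = runif(n))
sim$log_trap_eff <- log(runif(n, 20, 60))
eta <- -2 + 1.5 * sim$prop_forest * (sim$x / 100) + sim$log_trap_eff
sim$nb_unique <- rnbinom(n, mu = exp(eta), size = 10)

# Main effects and their interaction as separate terms; the offset sits in
# the formula so that predict() also uses it
mod_ti <- gam(nb_unique ~ ti(x, k = 5) + ti(prop_forest, k = 5) +
                ti(x, prop_forest, k = 4) + offset(log_trap_eff),
              family = nb(), data = sim, method = "REML", select = TRUE)

summary(mod_ti)  # one approximate test per term, including the ti(x, prop_forest) interaction
```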




[R] Course: "Statistical Data Analysis with R"

2016-11-16 Thread PD Dr. Pablo Emilio Verde
Dear list members,

I am very glad to announce the introductory course: "Statistical Data 
Analysis with R".

I have been running this course for 10 years, and previous participants have 
had a nice time learning R at the LinuxHotel.
You are invited to bring your own data to analyze during the course.

Below you will find more information about this course. If you have any 
questions, don't hesitate to contact me.

Best regards,

Pablo

++
Course: Introduction to Statistical Data Analysis with R
Where: Linux Hotel, Essen-Horst, Germany
When: 15.12 - 17.12.2016 week 50

Instructor: PD Dr. rer. nat. Pablo Emilio Verde
++
*Target audience*
This course is for data analysts who are familiar with classical statistics
software like SPSS, SAS, etc. and who want to gain a working knowledge of R.

This is a 3-day intensive training course with 8 hours per day of lectures
and exercises. The course presentation is practical, with many worked
examples.
To attend the course you do NOT need experience with R. Lectures are given
in English. Discussions can be in English, German or Spanish.
++

Day 1
* Introduction to concepts in R
* Data structures, objects and classes
* Data management with R (indexing and other advanced techniques)
* Graphics with R (classical functions, lattice plots and ggplot2)

Day 2
* Introduction to statistical functions in R
* Bootstrap methods and Monte Carlo simulation inference in R
* Linear models and regression techniques in R
* Model checking and model diagnostics

Day 3
* Logistic regression, Lognormal models and GLM with R
* Exploratory multivariate analysis
* Exploratory regression analysis with Classification Trees and Random 
Forest
* Own projects
++
Costs:

Public sector and commercial: 3 days, 998.00 € + 19% VAT = 1,187.62 €
(three days course, full eight clock hours per day, complete set of 
literature,
WiFi, complimentary notebook, full board, drinks, pastries, homemade cakes,
sauna, social program)

Accommodation is also available on request: 63,13 € per night in a shared double
room or 138,03 € per night in a single room (both incl. VAT).

Student:
675 € three days course

Some of the courses are frequently fully booked, so please note that you may
have to try several times until you get a place.

For more information, please contact: i...@linuxhotel.de
++


Re: [R] Variable 'A' is not a factor Error message

2016-11-16 Thread S Ellison
This looks like one of those 'please talk to a statistician' questions ...

You appear to have requested a 12-run Plackett-Burman experiment, which is a 
design that accommodates up to 11 two-level factors. You then fitted (I think) 
simulated data to that design, using those factors converted to their integer 
representation, which is a completely different thing in model matrix terms. 
So your predictors are still two-level factors, and your linear model has, 
after building a model matrix from the default contrasts, probably got one 
coefficient for each upper level of your two-level factors and one for the 
intercept. (I'm assuming default contrasts here.)

You then decided to redefine your predictors as numeric variables with a very 
large number of levels given by rnorm, none of which (except by very rare 
coincidence) were in your original design. So your model cannot possibly 
predict the output: it has no coefficients for all those new levels, and R 
quite rightly told you that you can't do that.

If you want to fit a linear model with continuous variables, you need to set up 
your DOE data frame with (meaningful) numeric predictors, not factors. You will 
then get a numerical gradient for each factor instead of a single offset for 
each upper level. That isn't really what Plackett and Burman had in mind, so I 
would not normally start with a P-B design if I wanted to do that. Consider a 
response surface model instead.
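Here is a minimal base-R sketch of the numeric-predictor suggestion. It uses a hypothetical two-factor, 12-run table in place of the original 11-factor design, and a made-up response. The coded levels are converted to numbers before fitting, so `lm()` estimates one slope per predictor. `predict()` then accepts arbitrary numeric draws.

```r
set.seed(27241)
as.numeric2 <- function(x) as.numeric(as.character(x))

# Hypothetical 12-run table with two two-level factors, stored as factors
# as a DoE design table would be
design <- data.frame(A = factor(rep(c(100, 1000), each = 6)),
                     B = factor(rep(c(100, 200), times = 6)))
# Made-up response for illustration only
design$IP <- as.numeric2(design$A) * as.numeric2(design$B) / 1000 +
  rnorm(12, sd = 0.5)

# Convert the coded levels to numbers before fitting: one slope per predictor
num_design <- transform(design, A = as.numeric2(A), B = as.numeric2(B))
fit_num <- lm(IP ~ A + B, data = num_design)

# Monte Carlo draws of the predictors; predict() is vectorised, no loop needed
mc <- data.frame(A = rnorm(10000, 450, 100), B = rnorm(10000, 150, 10))
N <- predict(fit_num, newdata = mc)
summary(N)
```

`hist(N)` would then show the simulated distribution. A straight-line fit to two-level data says nothing about curvature between the levels, which is one reason to consider a response surface design.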

S Ellison


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of ahmed
> meftah
> Sent: 11 November 2016 17:52
> To: r-help@r-project.org
> Subject: [R] Variable 'A' is not a factor Error message
> 
> I am running a DOE with the following code:
> 
> library(Rcmdr)
> library(RcmdrMisc)
> library(RcmdrPlugin.DoE)
> # Define Plackett-Burman experiment
> PB.DOE <- pb(nruns= 12 ,n12.taguchi= FALSE ,nfactors= 12 -1, ncenter= 0 ,
>  replications= 1 ,repeat.only= FALSE ,randomize= TRUE ,seed= 27241 ,
>  factor.names=list( A=c(100,1000),B=c(100,200),C=c(1,3),D=c(1,1.7),
>    E=c(1000,1500),G=c(-2,2) ) )
> 
> as.numeric2 <- function(x) as.numeric(as.character(x))
> 
> # Calculate response column
> IP <- with(PB.DOE,(as.numeric2(A)*as.numeric2(B)*(5000-3000))/
>   (141.2*as.numeric2(C)*as.numeric2(D)*(log(as.numeric2(E)/0.25)-(1/2)+as.numeric2(G))))
> # Combine response column with exp design table
> final_set <- within(PB.DOE, {
>   IP <- ((as.numeric2(A)*as.numeric2(B)*(5000-3000))/
>     (141.2*as.numeric2(C)*as.numeric2(D)*(log(as.numeric2(E)/0.25)-(1/2)+as.numeric2(G))))
> })
> 
> I then ran a regression as follows:
> 
> LinearModel.1 <- lm(IP ~ A + B + C + D + E + G, data=final_set)
> summary(LinearModel.1)
> 
> Following this, I wanted to run a predict using specified values as
> predictors in a Monte Carlo:
> 
> n = 1
> # Define probability distributions of predictors
> A = rnorm(n,450,100)
> hist(A,col = "blue",breaks = 50)
> 
> B = rnorm(n, 150,10)
> hist(B,col = "blue",breaks = 50)
> 
> C = rnorm(n, 1.5, 0.5)
> hist(C,col = "blue",breaks = 50)
> 
> D = runif(n,1.2,1.7)
> hist(D,col = "blue",breaks = 50)
> 
> E = rnorm(n,1250,50)
> hist(E,col = "blue",breaks = 50)
> 
> G = rnorm(n,0,0.5)
> hist(G,col = "blue",breaks = 50)
> 
> MCtable <- data.frame(A=A,B=B,C=C,D=D,E=E,G=G)
> 
> for (n in 1:n) {
>   N=predict(LinearModel.1,MCtable)
> }
> 
> hist(N,col = "yellow",breaks = 10)
> 
> I end up getting this error:
> 
> "Warning in model.frame.default(Terms, newdata, na.action = na.action, xlev =
> object$xlevels) :
>   variable 'A' is not a factor"
> 
> Using str() to get some info on LinearModel.1, from what I understand it
> seems to indicate that since the predictors A, B, C etc. are factors with 2
> levels, I have to convert my data.frame table to factors as well. Is that
> correct? Doing this would mean I would also need to specify the number of
> levels, which would mean that since I have set my n to 1 it would mean 1
> levels for each factor. How would I go about doing this? Is there a better
> solution? Any help would be appreciated.


[R] Some basic time series questions

2016-11-16 Thread Hall, Mark
Hi,

As I sit and learn how to work with time series, I've run into a problem
that is eluding a quick easy answer (and googling for answers seems to
really slow the process...)

Question #1--
In a simple example on R 3.3.1 (sorry my employer hasn't upgraded to 3.3.2
yet):

x=rnorm(26,0,1)
x.ts<-ts(x,start=c(2014,9),frequency=12)

Typing x.ts at the prompt gives me a table with the rows denoted by year and
the columns by month, and everything lines up wonderfully.

Now my problem comes when I type at the prompt

plot(x.ts)  or
plot(x.ts, xlab="") or
plot.ts(x.ts,xlab="")

I get a plot of the values, but my x-axis labels are 2015.0, 2015.5,
2016.0, and 2016.5. January 2015 is coming out as 2015.0...

Is there a way of getting a more intelligible x-axis labeling? Even 2015.1
for January, etc. would work, or even an index (with September 2014
represented as 0 or 1, incrementing by one each month).
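The time index of a monthly ts is year + (month - 1)/12, which is why January 2015 prints as 2015.0 and July 2015 as 2015.5. One base-R sketch (an assumption, not the only approach) is to suppress the default axis and draw month labels built from that index; month.abb gives English abbreviations regardless of locale.

```r
set.seed(1)
x <- rnorm(26, 0, 1)
x.ts <- ts(x, start = c(2014, 9), frequency = 12)

# Turn the fractional time index (e.g. 2014.667) into whole month counts,
# rounding to avoid floating-point drift, then split into year and month.
m     <- round(as.numeric(time(x.ts)) * 12)
year  <- m %/% 12
month <- m %% 12 + 1
labs  <- paste(month.abb[month], year)

# Suppress the default numeric x-axis and draw the month labels instead
plot(x.ts, xaxt = "n", xlab = "")
axis(1, at = time(x.ts), labels = labs, las = 2, cex.axis = 0.7)
```

For a simple 1, 2, 3, ... index instead, seq_along(x) can be used as the labels.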

Question #2--
If I have a time series of decadal events, how should I best set the
frequency? It is historical data, in the form of, say, AD 610-619: 5 events;
AD 620-629: 7 events; etc.
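ts() accepts either frequency or deltat (the time between observations), so decadal counts can be stored with deltat = 10, which is the same as frequency = 0.1. A minimal sketch using only the two counts given in the question, assuming each value is stamped with the first year of its decade:

```r
# Events per decade: AD 610-619 -> 5, AD 620-629 -> 7
events <- c(5, 7)

# One observation every 10 years, starting in AD 610
ev.ts <- ts(events, start = 610, deltat = 10)

frequency(ev.ts)   # observations per year
time(ev.ts)        # start year of each decade
```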

Sorry for such basic questions. Any advice would be appreciated.

Thanks in advance, MEH



Mark E. Hall, PhD
Assistant Field Manager
Black Rock Field Office
Winnemucca District Office
775-623-1529.



Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread David Winsemius

> On Nov 16, 2016, at 8:43 AM, Jeff Newmiller  wrote:
> 
> I will start by admitting I don't know the answer to your question.
> 
> However, I am responding because I think this should not be an issue in real 
> life use of R. Data frames are lists of distinct vectors, each of which has 
> its own reason for being present in the data, and normally each has its own 
> storage mode. Your use of a matrix as a shortcut to create many columns 
> at once does not change this fundamental difference between data frames and 
> matrices. You should not be surprised that putting the finishing touches on 
> this transformation takes some personal attention. 
> 
> Normally you should give explicit names to each column using the argument 
> names in the data.frame function. When using a matrix as a shortcut, you 
> should either immediately follow the creation of the data frame with a 
> names(DF)<- assignment, or wrap it in a setNames function call. 
> 
> setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )
> 
> Note that using a matrix to create many columns is memory inefficient, 
> because you start by setting aside a single block of memory (the matrix) and 
> then you move that data one column at a time to separate vectors for use in the 
> data frame. If working with large data you might want to consider allocating 
> each column separately from the beginning. 
> 
> N <- 2
> nms <- c( "A", "B" )
> as.data.frame( setNames( lapply( nms, function(nm){ rep( NA, N ) } ), nms ) )
> 
> which is not as convenient, but illustrates that data frames are truly 
> different from matrices.
> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On November 16, 2016 7:20:38 AM PST, g.maub...@weinwolf.de wrote:
>> Hi All,
>> 
>> I built an empty data frame to fill with values later. I did the 
>> following:
>> 
>> -- cut --
>> > matrix(NA, 2, 2)
>>      [,1] [,2]
>> [1,]   NA   NA
>> [2,]   NA   NA
>> > data.frame(matrix(NA, 2, 2))
>>   X1 X2
>> 1 NA NA
>> 2 NA NA
>> > as.data.frame(matrix(NA, 2, 2))
>>   V1 V2
>> 1 NA NA
>> 2 NA NA
>> -- cut --
>> 
>> Why does data.frame deliver different results than as.data.frame with 
>> regard to the variable names (V instead of X)?

They are two different functions, and it's fairly easy to see where the names
come from by looking at the code:

as.data.frame.matrix uses names(value) <- paste0("V", ic) when there are no
column names, whereas data.frame calls make.names, which prepends an "X" to
names that are invalid or missing.
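A short sketch that reproduces both naming schemes from the question, shows the make.names() step behind the "X" prefix, and shows that supplying column names up front makes the two functions agree:

```r
m <- matrix(NA, 2, 2)
names(data.frame(m))      # "X1" "X2"
names(as.data.frame(m))   # "V1" "V2"

# make.names() turns non-syntactic names such as "1" into "X1"
make.names(1:2)           # "X1" "X2"

# With column names on the matrix, both functions keep them
colnames(m) <- c("ColA", "ColB")
names(data.frame(m))      # "ColA" "ColB"
names(as.data.frame(m))   # "ColA" "ColB"
```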


As to why the authors did it this way, I'm unable to comment.

>> 
>> Kind regards
>> 
>> Georg
>> 


David Winsemius
Alameda, CA, USA



Re: [R] Different results when converting a matrix to a data.frame

2016-11-16 Thread Jeff Newmiller
I will start by admitting I don't know the answer to your question.

However, I am responding because I think this should not be an issue in real 
life use of R. Data frames are lists of distinct vectors, each of which has its 
own reason for being present in the data, and normally each has its own storage 
mode. Your use of a matrix as a shortcut to create many columns at once 
does not change this fundamental difference between data frames and matrices. 
You should not be surprised that putting the finishing touches on this 
transformation takes some personal attention. 

Normally you should give explicit names to each column using the argument names 
in the data.frame function. When using a matrix as a shortcut, you should 
either immediately follow the creation of the data frame with a names(DF)<- 
assignment, or wrap it in a setNames function call. 

setNames( data.frame(matrix(NA, 2, 2)), c( "ColA", "ColB" ) )

Note that using a matrix to create many columns is memory inefficient, because 
you start by setting aside a single block of memory (the matrix) and then you 
move that data one column at a time to separate vectors for use in the data frame. 
If working with large data you might want to consider allocating each column 
separately from the beginning. 

N <- 2
nms <- c( "A", "B" )
as.data.frame( setNames( lapply( nms, function(nm){ rep( NA, N ) } ), nms ) )

which is not as convenient, but illustrates that data frames are truly 
different from matrices.
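Allocating each column separately also lets every column start with its own storage mode, which a single NA matrix cannot do. A minimal sketch with hypothetical column names and types:

```r
N <- 2

# Each column is created on its own, with its own storage mode
DF <- data.frame(ColA = rep(NA_real_, N),       # double
                 ColB = rep(NA_character_, N),  # character
                 ColC = rep(NA, N),             # logical
                 stringsAsFactors = FALSE)
str(DF)
```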
-- 
Sent from my phone. Please excuse my brevity.

On November 16, 2016 7:20:38 AM PST, g.maub...@weinwolf.de wrote:
>Hi All,
>
>I built an empty data frame to fill with values later. I did the 
>following:
>
>-- cut --
>> matrix(NA, 2, 2)
>     [,1] [,2]
>[1,]   NA   NA
>[2,]   NA   NA
>> data.frame(matrix(NA, 2, 2))
>  X1 X2
>1 NA NA
>2 NA NA
>> as.data.frame(matrix(NA, 2, 2))
>  V1 V2
>1 NA NA
>2 NA NA
>-- cut --
>
>Why does data.frame deliver different results than as.data.frame with 
>regard to the variable names (V instead of X)?
>
>Kind regards
>
>Georg
>


[R] Different results when converting a matrix to a data.frame

2016-11-16 Thread G . Maubach
Hi All,

I built an empty data frame to fill with values later. I did the 
following:

-- cut --
> matrix(NA, 2, 2)
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> data.frame(matrix(NA, 2, 2))
  X1 X2
1 NA NA
2 NA NA
> as.data.frame(matrix(NA, 2, 2))
  V1 V2
1 NA NA
2 NA NA
-- cut --

Why does data.frame deliver different results than as.data.frame with 
regard to the variable names (V instead of X)?

Kind regards

Georg



Re: [R] Issues with the way Apply handled NA's

2016-11-16 Thread S Ellison


> -----Original Message-----
> 
> You can check for an empty vector as follows:
> ...
> vals <- apply(plabor[c("colA","colB","colC")],1,function(x) 
> length(na.omit(x)))
> vals # [1] 3 0 3 2 
> plabor$colD <- ifelse(vals > 0, plabor$colD, NA)
> plabor

A slightly more compact variant, which avoids the intermediate 'vals' variable, is 
to apply an anonymous function that does the check internally:

plabor$colD <- apply(plabor[c("colA","colB","colC")], 1, 
                     function(x) if(all(is.na(x))) NA else prod(x, na.rm=TRUE))
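The check matters because prod(x, na.rm = TRUE) returns 1 (the empty product) when every value is NA. A small sketch with a hypothetical plabor whose rows have 3, 0, 3 and 2 non-missing values, matching the vals output quoted above:

```r
# Hypothetical data: row 2 is entirely NA, row 4 has one NA
plabor <- data.frame(colA = c(1, NA, 2, 3),
                     colB = c(2, NA, 3, NA),
                     colC = c(3, NA, 4, 5))
cols <- c("colA", "colB", "colC")

prod(c(NA, NA, NA), na.rm = TRUE)   # 1, not NA

# Return NA for all-NA rows, otherwise the product of the non-missing values
plabor$colD <- apply(plabor[cols], 1, function(x)
  if (all(is.na(x))) NA else prod(x, na.rm = TRUE))
plabor$colD   # 6 NA 24 15
```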

S Ellison




***
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If 
you have received this message in error, please notify the sender 
immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com 
and delete this message and any copies from your computer and network. 
LGC Limited. Registered in England 2991879. 
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK


Re: [R] Text categories based on the sentences

2016-11-16 Thread S Ellison
> I have a data set that contains one variable, "Description":
> 
> Description                            Category
> 
> 1. i want ice cream                    food
> 2. i like banana very much             fruit
> 3. tomorrow i will eat chicken         food
> 4. yesterday i went to birthday party  festival
> 5. i lost my mobile last week          mobile
> 
> Please remember that I only have the "Description" variable. How can I
> get the Category column based on the sentences in the Description column?

You could look at something like ReadMe (http://gking.harvard.edu/readme) to 
generate a classifier based on a suitable subsample of your data that then 
classifies the rest of your data set.

Alternatively you could do it the hard way: use a natural language parser to 
extract all the noun phrases (or just split out the words), list the unique 
noun phrases, manually classify all of them using your own criteria to give a 
pair list (phrase->category), and then match classes back to the rows that 
contain each noun phrase.
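A minimal sketch of the "hard way", assuming simple splitting on spaces in place of a real natural language parser, and a hypothetical hand-made keyword lookup standing in for the manual classification step. The first word that has an entry in the lookup decides the row's category.

```r
desc <- c("i want ice cream",
          "i like banana very much",
          "tomorrow i will eat chicken",
          "yesterday i went to birthday party",
          "i lost my mobile last week")

# Hypothetical manual classification: word -> category
lookup <- c(cream = "food", chicken = "food", banana = "fruit",
            birthday = "festival", party = "festival", mobile = "mobile")

# Split each sentence into words, then match words back to categories
words <- strsplit(desc, " ", fixed = TRUE)
category <- vapply(words, function(w) {
  hit <- w[w %in% names(lookup)]
  if (length(hit) > 0) lookup[[hit[1]]] else NA_character_
}, character(1))

data.frame(Description = desc, Category = category)
```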







Re: [R] extracting all different items in a matrix colum or vector

2016-11-16 Thread Michael Dewey
You say you are coming back to R after a pause. I think the key word in 
Sarah's response is re-reading. Why not start with the material you used 
before and re-read it? If you have never read the Introduction to R, 
which comes with your installation, that is worth a read too.


On 15/11/2016 20:47, bgnumis bgnum wrote:

Many thanks, Sarah.

I really am going to do it.

If you can suggest a complete and didactic one, I will be very grateful.
Anyway, many thanks for your answer.



2016-11-15 18:56 GMT+01:00 Sarah Goslee :


Your question strongly suggests that you need to reread at least one
introductory guide to R.

But the answer to your specific question is the unique() function.
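A short sketch of unique() on the values from the question. One caveat: called on a matrix, unique() removes duplicated rows rather than duplicated elements, so a matrix should be flattened with as.vector() first.

```r
x <- c("jose", "pepe", "jose", "luis", "pepe", "raul")
unique(x)                 # "jose" "pepe" "luis" "raul"

# For a matrix, flatten it first; unique(m) would compare whole rows
m <- matrix(x, nrow = 2)
unique(as.vector(m))      # "jose" "pepe" "luis" "raul"
```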

Sarah

On Tue, Nov 15, 2016 at 7:33 AM, bgnumis bgnum  wrote:

Hi all,


After a long time away I have returned to R. I have a matrix with these
values:


c("jose", "pepe", "jose", "luis", "pepe", "raul")

I want to "read" this matrix or vector and extract all the distinct values,
so the output I want is:

c("jose", "pepe", "luis", "raul").

Is there a function to do this?

Many thanks in advance.


--
Sarah Goslee
http://www.functionaldiversity.org






--
Michael
http://www.dewey.myzen.co.uk/home.html
