Re: [R] svykappa using the survey package

2016-06-20 Thread Anthony Damico
hi pradip, this should give you what you want


library(foreign)
library(survey)

tf <- tempfile()

download.file( "https://meps.ahrq.gov/mepsweb/data_files/pufs/h163ssp.zip" , tf , mode = 'wb' )

z <- unzip( tf , exdir = tempdir() )

x <- read.xport( z )

names( x ) <- tolower( names( x ) )

design <- svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f,
data=x, nest=TRUE)

# recode: 1 = "yes"; every other code, including the missing codes, = "no or missing"
design <-
update(design,
xbpchek53 = ifelse(bpchek53 ==1,'yes','no or missing'),
xcholck53 = ifelse(cholck53 ==1, 'yes','no or missing')
)

# subset out records that were missing for either variable
svykappa( ~ xbpchek53 + xcholck53 , subset(design, bpchek53 > 0 &
cholck53 > 0 ) )
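As a rough cross-check on the estimate svykappa() returns, Cohen's kappa can also be computed by hand from the unweighted 2x2 cross-table of xbpchek53 by xcholck53 in the original post. This is a sketch only: it ignores the survey weights, strata and PSUs, so it will not match the design-based estimate, and cohen_kappa() is a hypothetical helper, not a survey package function.

```r
# Unweighted counts copied from the xbpchek53 x xcholck53 cross-table in the
# original post (records with NA on either variable are already excluded there).
tab <- matrix(c(14667, 4379,
                  163, 5225),
              nrow = 2, byrow = TRUE,
              dimnames = list(xbpchek53 = c("1", "2"), xcholck53 = c("1", "2")))

# Hypothetical helper: unweighted Cohen's kappa for a square agreement table
cohen_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                       # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
  (po - pe) / (1 - pe)
}

cohen_kappa(tab)   # about 0.578
```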

On Mon, Jun 20, 2016 at 7:49 PM, Muhuri, Pradip (AHRQ/CFACT) <
pradip.muh...@ahrq.hhs.gov> wrote:


Re: [R] Data aggregation

2016-06-20 Thread Bert Gunter
?tapply

You should have encountered this already in most basic R tutorials.
Have you gone through any? If not, you should. In particular, you need
to learn about R's basic data structures (e.g. data frames).
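For Paolo's example, a minimal sketch of the tapply() route, plus aggregate() for output in long form (one row per Industry/Regime pair). It assumes the data are typed in as shown, with Industry stored as character so the leading zeros survive. Note that the listed data sum to 60, not 600, for Industry 02 and Regime 11.

```r
# Paolo's example data; Industry kept as character to preserve "01", "02"
costs <- data.frame(
  Regime   = c(10, 11, 10, 10, 11, 10),
  Industry = c("01", "01", "02", "01", "02", "02"),
  Cost     = c(370, 400, 200, 500, 60, 30),
  stringsAsFactors = FALSE
)

# tapply(): an Industry x Regime matrix of summed Cost
m <- tapply(costs$Cost,
            list(Industry = costs$Industry, Regime = costs$Regime),
            sum)
m

# aggregate(): the same sums as a data frame, sorted by Industry then Regime
agg <- aggregate(Cost ~ Industry + Regime, data = costs, FUN = sum)
res <- agg[order(agg$Industry, agg$Regime), ]
res
```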

Alternatively, the dplyr package has many elegant tools for this sort
of thing. You might do well to learn it instead of, or in addition to, the
*apply-type operations of base R.

Finally, I should ask: is this homework? This list tries to implement
a no homework policy.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Jun 20, 2016 at 2:34 PM, Paolo Letizia  wrote:

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] svykappa using the survey package

2016-06-20 Thread Muhuri, Pradip (AHRQ/CFACT)
Hello,

My goal is to calculate the weighted kappa measure of agreement between two 
factors using the R survey package. I am getting the following error message 
(the console is appended below; sorry, no data provided).

> # calculate survey Kappa
> svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms : 
  'names' attribute [15] must be the same length as the vector [8]

I have followed the following major steps:

1) Used the "haven" package to read the SAS data set into R.
2) Used dplyr's mutate() to create 2 new variables and converted them to factors 
[required for svykappa()?].
3) Created an object (named design) using the survey design variables and the 
data file.
4) Used svykappa() to compute the kappa measure of agreement. 

I would appreciate any hints on how to resolve the issue.

Thanks,

Pradip Muhuri

###  The detailed console is appended below  

> setwd ("U:/A_PSAQ")
> library(haven)
> library(dplyr)
> library(survey)
> library(srvyr)
> library(Hmisc)
> my_hc2013_data <- read_sas("pc2013.sas7bdat")
> 
> # Function to convert var names in upper cases to var names in lower cases
> lower <- function (df) {
+   names(df) <- tolower(names(df))
+   df
+ }
> my_hc2013_data <- lower(my_hc2013_data)
> 
> # Check the contents - Hmisc package (as above) required
> # contents(my_hc2013_data)
> 
> # create two new variables
> my_hc2013_data <- mutate(my_hc2013_data, 
+  xbpchek53 = ifelse(bpchek53 ==1, 1,
+ ifelse(bpchek53 %in% 2:6, 2,NA)), 
+  xcholck53 = ifelse(cholck53 ==1, 1,
+ifelse(cholck53 %in% 2:6, 2,NA)))
> 
> # convert the numeric variables to factors for the kappa measure
> my_hc2013_data$xbpchek53 <- as.factor(my_hc2013_data$xbpchek53)
> my_hc2013_data$xcholck53 <- as.factor(my_hc2013_data$xcholck53)
> 
> # check whether the variables are factors
> is.factor(my_hc2013_data$xbpchek53)
[1] TRUE
> is.factor(my_hc2013_data$xcholck53)
[1] TRUE
> 
> 
> # check the data from the cross table
> addmargins(with(my_hc2013_data, table(bpchek53,xbpchek53 )))
        xbpchek53
bpchek53     1     2   Sum
     -9      0     0     0
     -8      0     0     0
     -7      0     0     0
     -1      0     0     0
     1   19778     0 19778
     2       0  2652  2652
     3       0  1014  1014
     4       0   538   538
     5       0   737   737
     6       0   623   623
     Sum 19778  5564 25342
> addmargins(with(my_hc2013_data, table(cholck53,xcholck53 )))
        xcholck53
cholck53     1     2   Sum
     -9      0     0     0
     -8      0     0     0
     -7      0     0     0
     -1      0     0     0
     1   14850     0 14850
     2       0  3153  3153
     3       0  1170  1170
     4       0   696   696
     5       0   909   909
     6       0  3764  3764
     Sum 14850  9692 24542
> addmargins(with(my_hc2013_data, table(xbpchek53,xcholck53 )))
         xcholck53
xbpchek53     1     2   Sum
      1   14667  4379 19046
      2     163  5225  5388
      Sum 14830  9604 24434
> 
> # create an object with design variables and data
> design<-svydesign(id=~varpsu,strat=~varstr, weights=~perwt13f, 
> data=my_hc2013_data, nest=TRUE)
> 
> # calculate survey Kappa
> svykappa(~xbpchek53+xcholck53, design)
Error in names(probs) <- nms : 
  'names' attribute [15] must be the same length as the vector [8]

#

Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564




-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Muhuri, Pradip 
(AHRQ/CFACT)
Sent: Thursday, June 16, 2016 2:06 PM
To: David Winsemius
Cc: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New 
Question

Hello David,

Your revisions to the earlier code have given me the desired results.

library("gtools")
mydata[ mixedorder(mydata$prevalence_c, decreasing=TRUE), c("indicator", 
"prevalence_c")  ]

Thanks,

Pradip


Pradip K. Muhuri,  AHRQ/CFACT
 5600 Fishers Lane # 7N142A, Rockville, MD 20857
Tel: 301-427-1564





-Original Message-
From: David Winsemius [mailto:dwinsem...@comcast.net] 
Sent: Thursday, June 16, 2016 12:54 PM
To: Muhuri, Pradip (AHRQ/CFACT)
Cc: r-help@r-project.org
Subject: Re: [R] dplyr's arrange function - 3 solutions received - 1 New 
Question


> On Jun 16, 2016, at 6:12 AM, Muhuri, Pradip (AHRQ/CFACT) 
>  wrote:
> 
> Hello,
> 
> I got 3 solutions to my earlier code. Thanks to the contributors. May I 
> bring your attention to a new question below (with respect to David's 
> solution)?
> 
> 1) Thanks to Daniel Nordlund for the tips - replacing leading space with a 0 
>  in the data.
> 
> 2) Thanks to David Winsemius for his solution with the gtools::mixedorder 
> function. I have added an argument to his.
> 
> mydata[ 

[R] Data aggregation

2016-06-20 Thread Paolo Letizia
Dear All:
I have a data frame with 3 columns: "Regime", "Industry", and "Cost".
I want to sum the value of "Cost" for each combination of "Industry" and "Regime".
Example:

The data frame is:
Regime, Industry, Cost
10, 01, 370
11, 01, 400
10, 02, 200
10, 01, 500
11, 02, 60
10, 02, 30

I want the following output:
01, 10, 870
01, 11, 400
02, 10, 230
02, 11, 600

Can you please help me with this? Paolo



Re: [R] replacement has 0 rows, data has 2809

2016-06-20 Thread Jim Lemon
Hi Humberto,
It may simply be that the file is C(omma)SV format and the default
separator for read.delim is a TAB character. Try read.csv.
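A small sketch of the difference, using a made-up file with the four column names Humberto describes: read.delim() splits on TAB, so a comma-separated file comes back as a single column (which matches length(data1) being 1), while read.csv() splits on commas.

```r
# Made-up comma-separated file with the column names from Humberto's message
path <- tempfile(fileext = ".csv")
writeLines(c("Gene ID,Length,ReadCount,Normalized Coverage",
             "g1,1500,120,0.8",
             "g2,900,45,0.5"), path)

# read.delim() uses sep = "\t", so each whole line lands in one column
d_tab <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)
ncol(d_tab)

# read.csv() uses sep = ",", so all four columns are recovered
d_csv <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
names(d_csv)
```

Passing sep = "," to read.delim() would work equally well.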

Jim


On Tue, Jun 21, 2016 at 2:14 AM, Humberto Munoz Barona
 wrote:
> Hi Jim,
> Thanks for your reply. length(lens) gives me 6, which is the size of lens in 
> the previous run with a shorter file. length(data1) is 1, which means data1 is 
> not reading the data from the file DarkAerobic1.CSV, which contains four 
> columns in this order: Gene ID, Length, ReadCount, and Normalized Coverage. I 
> want the vectors lens = Length and cnts = ReadCount. How can I import the 
> data correctly?
>
>  > data1 <- read.delim("DarkAerobic1.CSV", check.names=FALSE, 
> stringsAsFactors=FALSE)
>> lenght(data1)
> Error: could not find function "lenght"
>> length(data1)
> [1] 1
>
> I need to calculate two normalizations with the vectors lens and cnts, and 
> have the two options for sorting the normalizations up or down.
>
> Thanks for any help you can give me to fix this issue.
>
> Humberto
>
>> On Jun 18, 2016, at 12:19 AM, Jim Lemon  wrote:
>>
>> Hi Humberto,
>> The "0 row" error usually arises from a calculation in which a
>> non-existent object is used. I see that you have created a vector with
>> the name "lens" and that may be where this is happening. Have a look
>> at:
>>
>> length(lens)
>>
>> or if it is not too long, just:
>>
>> lens
>>
>> If it is zero length, that is your problem. This might be due to
>> "data1" not having a column named "Length", or that column may not
>> contain numeric values (e.g. it may be a factor).
>>
>> Jim
>>
>>
>> On Sat, Jun 18, 2016 at 9:53 AM, Humberto Munoz Barona
>>  wrote:
>>> I am running the following R-code
>>>
>>> countToTpm <- function(counts, effLen)
>>> {
>>>  rate <- log(counts) - log(effLen)
>>>  denom <- log(sum(exp(rate)))
>>>  exp(rate - denom + log(1e6))
>>> }
>>>
>>> countToFpkm <- function(counts, effLen)
>>> {
>>>  N <- sum(counts)
>>>  exp( log(counts) + log(1e9) - log(effLen) - log(N) )
>>> }
>>>
>>> fpkmToTpm <- function(fpkm)
>>> {
>>>  exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
>>> }
>>>
>>> countToEffCounts <- function(counts, len, effLen)
>>> {
>>>  counts * (len / effLen)
>>> }
>>> 
>>> # An example
>>> 
>>> data1 <- read.delim("Dark Aerobic1.csv", check.names=FALSE, 
>>> stringsAsFactors=FALSE)
>>> cnts <- data1['ReadCount']
>>> lens <- data1['Length']
>>> countDf <- data.frame(count = cnts, length = lens)
>>>
>>> # assume a mean(FLD) = 170.71
>>>
>>> countDf$effLength <- countDf$length - 170.71 + 1
>>> countDf$tpm <- with(countDf, countToTpm(count, effLength))
>>> countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
>>> with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
>>> countDf$effCounts <- with(countDf, countToEffCounts(count, length, 
>>> effLength))
>>>
>>> I am receiving the errors
>>>
>>> > countDf$effLength <- countDf$length - 170.71 + 1
>>> Error in `$<-.data.frame`(`*tmp*`, "effLength", value = numeric(0)) :
>>>  replacement has 0 rows, data has 2809
>>> > countDf$tpm <- with(countDf, countToTpm(count, effLength))
>>> Error in countToTpm(count, effLength) : object 'count' not found
>>> > countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
>>> Error in countToFpkm(count, effLength) : object 'count' not found
>>> > with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))
>>> Error in all.equal(tpm, fpkmToTpm(fpkm)) : object 'tpm' not found
>>> > countDf$effCounts <- with(countDf, countToEffCounts(count, length,
>>> + effLength))
>>> Error in countToEffCounts(count, length, effLength) :
>>>  object 'count' not found
>>>
>>>
>>> Thanks for any help to fix this error
>>>
>>> Humberto Munoz
>
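Even once the file is read with read.csv(), the script above would likely still fail: data1['ReadCount'] (single brackets) returns a one-column data frame, and data.frame(count = ...) keeps that column's own name, so countDf ends up with columns ReadCount and Length rather than count and length. countDf$length is then NULL, which yields numeric(0) and the "replacement has 0 rows" error. A sketch using [[ ]] instead, with made-up numbers standing in for the real file:

```r
countToTpm <- function(counts, effLen) {
  rate <- log(counts) - log(effLen)
  denom <- log(sum(exp(rate)))
  exp(rate - denom + log(1e6))
}
countToFpkm <- function(counts, effLen) {
  N <- sum(counts)
  exp(log(counts) + log(1e9) - log(effLen) - log(N))
}
fpkmToTpm <- function(fpkm) exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
countToEffCounts <- function(counts, len, effLen) counts * (len / effLen)

# Made-up stand-in for the CSV; with the real file this would be e.g.
# data1 <- read.csv("DarkAerobic1.CSV", check.names = FALSE, stringsAsFactors = FALSE)
data1 <- data.frame(ReadCount = c(120, 45, 300), Length = c(1500, 900, 2400))

# Single brackets: the one-column data frame's own name wins
names(data.frame(count = data1["ReadCount"]))   # "ReadCount"

# Double brackets return plain vectors, so the new names stick
countDf <- data.frame(count = data1[["ReadCount"]], length = data1[["Length"]])

countDf$effLength <- countDf$length - 170.71 + 1   # assumes mean(FLD) = 170.71
countDf$tpm  <- with(countDf, countToTpm(count, effLength))
countDf$fpkm <- with(countDf, countToFpkm(count, effLength))
with(countDf, all.equal(tpm, fpkmToTpm(fpkm)))     # TRUE
countDf$effCounts <- with(countDf, countToEffCounts(count, length, effLength))
```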



Re: [R] R help contingency table

2016-06-20 Thread Jim Lemon
Hi Lucie,
You can visualize this using the sizetree function (plotrix). You
supply a data frame of the individual choice sequences.

library(plotrix)  # provides sizetree()
# form a data frame of "random" choices
coltrans<-data.frame(choice1=sample(c("High","Medium","Low"),100,TRUE),
 choice2=sample(c("High","Medium","Low"),100,TRUE))
sizetree(coltrans,main="Random color choice transitions")
# test the two way table of transitions for independence
chisq.test(table(coltrans))
# now try a data frame of "habitual" choices
coltrans2<-data.frame(choice1=rep(c("High","Medium","Low"),c(33,33,34)),
 choice2=c(sample(c("High","Medium","Low"),33,TRUE,prob=c(0.6,0.2,0.2)),
 sample(c("High","Medium","Low"),33,TRUE,prob=c(0.2,0.6,0.2)),
 sample(c("High","Medium","Low"),34,TRUE,prob=c(0.2,0.2,0.6))))
sizetree(coltrans2,main="Habitual color choice transitions")
# test the table again
chisq.test(table(coltrans2))

This may be what you want.

Jim


On Mon, Jun 20, 2016 at 12:09 PM, Lucie Dupond  wrote:

Re: [R] R help contingency table

2016-06-20 Thread Lucie Dupond
Thank you for your answer!

I'm sorry, I made a mistake in the second matrix: both matrices should have the 
same row/column labels; I just used another label vector by mistake.

My supervisor doesn't have a solution for this, and neither does anyone else I 
asked.

Thanks for your solution, but I'm afraid that I will lose the interaction 
between the variables "first color" and "second color" if I convert the matrix 
into a vector.


Thank you for your help




From: David L Carlson 
Sent: Monday, June 20, 2016 21:06
To: Lucie Dupond; r-help@r-project.org
Subject: RE: R help contingency table

You should consult with your adviser or someone at your institution who has 
more experience in statistical analysis than you do. You want to compare the 
matrices, but the row/column labels are different so you may be comparing 
completely different categories.

Technically, you need to convert the two matrices into a single matrix. You can 
do that by converting each into a vector with the c() function. BUT this will 
compare High with High, Medium with Low, and Low with Stick which seems 
inadvisable.

> rbind(c(transitions1), c(transitions2))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   51   12   37   17   21   15   27   13   60
[2,]   13    5   11    7   16    8    8   18   17
> chisq.test(rbind(c(transitions1), c(transitions2)))

Pearson's Chi-squared test

data:  rbind(c(transitions1), c(transitions2))
X-squared = 22.411, df = 8, p-value = 0.004208

Warning message:
In chisq.test(rbind(c(transitions1), c(transitions2))) :
  Chi-squared approximation may be incorrect
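One way to compare the two tables without flattening away the first-color by second-color structure (the concern Lucie raises above) is a log-linear model. This is a swapped-in technique, not the approach described in this reply. The sketch assumes both tables share the High/Medium/Low labels (per Lucie's correction) and that rows are the first color and columns the second; it fits Poisson GLMs with and without the three-way interaction, so a small p-value would suggest the transition pattern differs between the tables.

```r
labs <- c("High", "Medium", "Low")
transitions1 <- matrix(c(51, 17, 27, 12, 21, 13, 37, 15, 60), nrow = 3, byrow = TRUE,
                       dimnames = list(first = labs, second = labs))
transitions2 <- matrix(c(13, 7, 8, 5, 16, 18, 11, 8, 17), nrow = 3, byrow = TRUE,
                       dimnames = list(first = labs, second = labs))

# Long format: one row per (first, second, table) cell. c() on a matrix runs
# down the columns, matching expand.grid() varying `first` fastest.
cells <- expand.grid(first = labs, second = labs, table = c("t1", "t2"))
cells$count <- c(c(transitions1), c(transitions2))

# m0: all two-way associations allowed, but the first-by-second association
#     is forced to be the same in both tables.
# m1: saturated model, letting that association differ between tables.
m0 <- glm(count ~ (first + second + table)^2, family = poisson, data = cells)
m1 <- glm(count ~ first * second * table, family = poisson, data = cells)

# Likelihood-ratio test of the three-way interaction (4 df for two 3x3 tables)
a <- anova(m0, m1, test = "Chisq")
a
```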

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Lucie Dupond
Sent: Sunday, June 19, 2016 9:10 PM
To: r-help@r-project.org
Subject: [R] R help contingency table

Hello,
I'm sorry if my question is really basic, but I'm having some trouble with the 
statistics for my thesis, especially the chi-square test and contingency 
tables.

From what I understood, there are two "kinds" of chi-square test, which are quite 
similar:
- Homogeneity, when we have one variable and we want to compare it with a 
theoretical distribution
- Independence test, when we have 2 variables and we want to see if they are 
linked

-- -

I'm working on color transitions, with 3 possible factors: "High", "Medium" 
and "Low".
I want to know if an individual will preferably go from a "High" color to 
another "High" color, more than from a "High" color to a "Medium" color 
(for example).

I have this table :

trans1<-c(51,17,27,12,21,13,37,15,60)
transitions1<-matrix(trans1, nrow=3, ncol=3, byrow=TRUE)
rownames(transitions1) <- c("High","Medium", "Low")
colnames(transitions1) <- c("High","Medium", "Low")

The first column is showing the first color, and the second is showing the 
second color of the transition.

It looks like I'm in the case of an independence test, in order to see if the 
variable "second color" is linked to the "first color".

So I'm making the test :

chisq.test(transitions1)


(If I understood correctly, the test on the matrix is the independence test, and 
the test on the vector trans1 is the homogeneity test?)
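What the two calls do can be checked directly. A hedged note on terminology: chisq.test() on a vector performs a goodness-of-fit test against equal cell probabilities by default, while on a matrix it tests independence of rows and columns (a test of homogeneity across groups uses the same contingency-table calculation as the independence test).

```r
trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
transitions1 <- matrix(trans1, nrow = 3, byrow = TRUE)

# Matrix input: independence of rows and columns, df = (3 - 1) * (3 - 1) = 4
ind <- chisq.test(transitions1)
ind$parameter

# Vector input: goodness of fit against p = rep(1/9, 9), df = 9 - 1 = 8
gof <- chisq.test(trans1)
gof$parameter
```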

The result is significant, which means that some transitions are preferred.

My problem is that I have other transition tables like this one (with other 
individuals or other conditions).
For example, I also have this one:


trans2<-c(13,7,8,5,16,18,11,8,17)
transitions2<-matrix(trans2, nrow=3, ncol=3, byrow=T)
rownames(transitions2) <- c("High","Low", "Stick")
colnames(transitions2) <- c("High","Low", "Stick")

I want to know if the "preferred" transitions in table 1 are the same in 
table 2.
But if I try a chi-square test on those two matrices, R only takes the first one.

How can I compare those tables?
Maybe with another test?

Thanks in advance !

Kind regards

Lucie S.


Re: [R] patterns in numeric vector

2016-06-20 Thread Bert Gunter
Oops  -- neglected to cc the list. Also note the correction at the
end, changing "starts" to "begins".

-- Bert




On Mon, Jun 20, 2016 at 2:33 PM, Bert Gunter  wrote:
 Thanks for the reproducible example -- it made your meaning clear.

This is the sort of thing for which rle() is useful. If you go through
the following step by step it should be clear what's going on.

z <- c(7,223,42,55,30,25,61,5,70)
x <- 40
rl <- rle( z < x)  ## runs of TRUE and FALSE (logicals)
## Note that you may wish to change this to <=

lens <- rl$lengths  ## lengths of runs
ends <- cumsum(lens)   ##  indices where the runs end
begins <- c(1,ends[-length(ends)]+1)  ## indices where the runs begin

## now use logical indexing to pick out only the runs meeting the
## condition that z < x
vals <- rl$values
begins[vals]
ends[vals]


Note: This is the sort of query for which someone cleverer than I may
have a simpler or more efficient solution. If so, please post it so I
and others can learn from it.
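One refinement, offered as a sketch: the code above also reports the run at position 1 (the 7), because it is below x even though nothing precedes it, whereas the desired output lists only 5-6 and 8-8. Dropping runs that touch either end of the vector gives exactly that output. Here begins is computed as ends - lengths + 1, which is equivalent to the version above.

```r
z <- c(7, 223, 42, 55, 30, 25, 61, 5, 70)
x <- 40

rl     <- rle(z < x)
ends   <- cumsum(rl$lengths)
begins <- ends - rl$lengths + 1

# keep runs below x that neither start at the first element nor end at the
# last one, i.e. runs with a value >= x on both sides
keep <- rl$values & begins > 1 & ends < length(z)
loc  <- data.frame(start = begins[keep], end = ends[keep])
loc
```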

 Cheers,
 Bert



 Bert Gunter

 "The trouble with having an open mind is that people keep coming along
 and sticking things into it."
 -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

>
> On Mon, Jun 20, 2016 at 11:58 AM, C Lin  wrote:
>> Hello,
>>
>> Can someone help me with this?
>>
>> I am trying to find the start and end positions in a vector where numbers 
>> less than x are surrounded by number(s) greater than x.
>>  For example:
>>  try = c(7,223,42,55,30,25,61,5,70)
>>  x=40
>>
>>  The desired output would be:
>>
>> > loc
>>   start end
>> 1     5   6
>> 2     8   8
>>
>> So the numbers I am interested in finding are 30 and 25, with start = 5 and 
>> end = 6.
>> Also 5, with start = 8 and end = 8.
>>
>> Thank you in advance for your help.
>>


[R] ASA Conference on Statistical Practice - deadline Thursday

2016-06-20 Thread Adams, Jean
R users,

Abstracts are now being accepted for the
 ASA Conference on Statistical Practice
 February 23-25, 2017
 Jacksonville FL, USA

Past conference attendees have shown particular interest in R,
reproducibility, and data visualization.

The deadline for submission is June 23.  Presentations will be 35 minutes
long and fall into four broad themes:
 Communication, Collaboration, and Career Development
 Data Modeling and Analysis
 Big Data and Data Science
 Software, Programming, and Graphics

Abstracts may be submitted at
 http://www.amstat.org/meetings/csp/2017/submitabstract.cfm

Thank you.

Jean V. Adams
on behalf of the ASA-CSP 2017 Steering Committee



`·.,,  ><(((º>   `·.,,  ><(((º>   `·.,,  ><(((º>

Jean V. Adams
Statistician
U.S. Geological Survey
Great Lakes Science Center
223 East Steinfest Road
Antigo, WI 54409  USA
http://www.glsc.usgs.gov
http://profile.usgs.gov/jvadams


Re: [R] No reply from CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

2016-06-20 Thread Achim Zeileis

On Mon, 20 Jun 2016, Joseph Gama wrote:


Hi all,

I emailed a suggestion to Nicholas Lewin-Koh, the maintainer of the CRAN 
Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & 
Visualization. I got no reply, so I wonder, is he still maintaining that 
view? If not, then who else does or will maintain it?


To the best of my knowledge he is still maintaining it. I cc'ed Nicholas 
in this reply.



BR,

José Gama


Re: [R] Generating input population for microsimulation

2016-06-20 Thread Dielia Ba
Hi everyone, 
I really need your help! 
I am currently working on a microsimulation project and I cannot find a 
package in R that does what I want. 
Here is the picture: I have macroeconomic variables such as 
income, consumption, and household weight, and I have already calculated the 
elasticities. 
I also have two other data sets with income growth rates and population 
projections. What I want is to create a data set with an income variable for 
each year (from 2014 to 2030), and the same for consumption, based on 
the existing patterns in the input data sets. 
Do I really have to code my own R package to perform the microsimulation? 
FYI: I tried almost all R packages related to microsimulation or 
simulation (mostly spatial, demographic, and health/survival tools). 
I would really appreciate any constructive comments and remarks.
Thanks a lot, 
Dielia 
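Without knowing the input files, one hedged starting point that needs no special package is to project each household's income forward with cumulative growth factors. The household IDs, base incomes and the flat 3% annual growth rate below are hypothetical placeholders; consumption could be projected the same way once a growth rate for it is derived from the elasticities (also an assumption).

```r
# Hypothetical inputs: 2014 base income per household, and an assumed
# income growth rate for each projection year 2015-2030
base   <- data.frame(hh_id = 1:3, income_2014 = c(12000, 8500, 30000))
growth <- data.frame(year = 2015:2030, rate = rep(0.03, 16))

# Cumulative growth factor relative to 2014 for each projection year
growth$cum_factor <- cumprod(1 + growth$rate)

# One column per year: base income times that year's cumulative factor
proj <- outer(base$income_2014, growth$cum_factor)
colnames(proj) <- paste0("income_", growth$year)
projected <- cbind(base, proj)
head(projected[, 1:5])
```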

On Tuesday, December 13, 2011 18:08:21 UTC-5, Emma Thomas wrote:
>
> Hi all,
>
> I've been struggling with some code and was wondering if you all could 
> help.
>
> I am trying to generate a theoretical population of P people who are 
> housed within X different units. Each unit follows the same structure- 10 
> people per unit, 8 of whom are junior and two of whom are senior. I'd like 
> to create a unit ID and a unique identifier for each person (person ID, 
> PID) in the population so that I have a matrix that looks like:
>
>       unit_id pid senior
>  [1,]       1   1      0
>  [2,]       1   2      0
>  [3,]       1   3      0
>  [4,]       1   4      0
>  [5,]       1   5      0
>  [6,]       1   6      0
>  [7,]       1   7      0
>  [8,]       1   8      0
>  [9,]       1   9      1
> [10,]       1  10      1
> ...
>
> I came up with the following code, but am having some trouble getting it 
> to populate my matrix the way I'd like.
>
> world <- function(units, pop_size, unit_size){
> pid <- rep(0,pop_size) #person ID
> senior <- rep(0,pop_size) #senior in charge
> unit_id <- rep(0,pop_size) #unit ID
> 
> for (i in 1:pop_size){
> for (f in 1:units){  
> senior[i] = sample(c(1,1,0,0,0,0,0,0,0,0), 1, replace = FALSE)
> pid[i] = sample(c(1:10), 1, replace = FALSE)
> unit_id[i] <- f
> }}
> data <- cbind(unit_id, pid, senior)
> 
> return(data)
> }
>
> world(units = 10,pop_size = 100, unit_size = 10) #call the function
>
> The output looks like:
>  unit_id pid senior
>   [1,]  10   7  0
>   [2,]  10   4  0
>   [3,]  10  10  0
>   [4,]  10   9  1
>   [5,]  10  10  0
>   [6,]  10   1  1
> ...
>
> but what I really want is to generate is 10 different units with two 
> seniors per unit, and with each person in the population having a unique 
> identifier.
>
> I thought a nested for loop was one way to go about creating my data set 
> of people and families, but obviously I'm doing something (or many things) 
> wrong. Any suggestions on how to fix this? I had been focusing on creating 
> a person and assigning them to a unit, but perhaps I should create the 
> units and then populate the units with people?
>
> Thanks so much in advance.
>
> Emma
>
>
>
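Along the lines Emma suggests at the end (create the units first, then fill them with people), a vectorised sketch with rep() avoids the nested loop entirely. make_world() is a hypothetical name; seniors are placed in the last two positions of each unit, matching the layout in the desired matrix.

```r
make_world <- function(units, unit_size = 10, seniors_per_unit = 2) {
  # unit IDs: 1 repeated unit_size times, then 2, and so on
  unit_id <- rep(seq_len(units), each = unit_size)
  # person IDs: unique across the whole population
  pid <- seq_len(units * unit_size)
  # within each unit: juniors (0) first, then seniors (1)
  senior <- rep(c(rep(0L, unit_size - seniors_per_unit),
                  rep(1L, seniors_per_unit)), times = units)
  cbind(unit_id = unit_id, pid = pid, senior = senior)
}

w <- make_world(units = 10)
head(w, 10)
```

If seniors should sit at random positions within each unit, the senior vector could instead be built by calling sample() once per unit, for example inside lapply().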

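Emma's closing idea (create the units first, then populate them with people) can be sketched as below. This is a minimal illustration, assuming equal-sized units and that the two seniors in each unit are picked at random; `make_world`, `seniors_per_unit` and `pop` are hypothetical names, not from the thread. Here `unit_size`, which the original function accepted but never used, actually sets the number of people per unit.

```r
# Hypothetical helper: create the units first, then populate them with people.
make_world <- function(units, unit_size, seniors_per_unit = 2) {
  pop_size <- units * unit_size
  unit_id <- rep(seq_len(units), each = unit_size)  # 1,1,...,1,2,2,...
  pid <- seq_len(pop_size)                           # unique ID across the whole population
  # In each unit, mark `seniors_per_unit` randomly chosen members as seniors
  senior <- unlist(lapply(seq_len(units), function(u) {
    s <- integer(unit_size)
    s[sample.int(unit_size, seniors_per_unit)] <- 1L
    s
  }))
  cbind(unit_id, pid, senior)
}

set.seed(42)                                  # reproducible senior assignment
pop <- make_world(units = 10, unit_size = 10)
head(pop, 12)
table(unit = pop[, "unit_id"], senior = pop[, "senior"])  # 8 zeros and 2 ones per unit
```

Because `pid` runs from 1 to `units * unit_size`, every person gets a unique identifier. If IDs should instead restart at 1 within each unit, `rep(seq_len(unit_size), times = units)` would be the alternative.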
[R] No reply from CRAN Task View: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization

2016-06-20 Thread Joseph Gama
Hi all,

I emailed a suggestion to Nicholas Lewin-Koh, the maintainer of the CRAN
Task View: Graphic Displays & Dynamic Graphics & Graphic Devices &
Visualization. I got no reply, so I wonder, is he still maintaining that
view?
If not, then who else does or will maintain it?

BR,

José Gama


[R] R crashed on Mips64 on executing library for certain package

2016-06-20 Thread Shashank Tadisina
Hi All,

I was trying to cross-compile R for Mips64. The build system architecture 
I used is x86-64, while the target architecture R is cross-compiled for 
is Mips64.
Below are the details of the mips64 system where I am running R:

~ # uname -a
Linux (none) 2.6.32.27-Cavium-Octeon #3 SMP Tue Jun 14 11:06:49 PDT 2016 mips64 GNU/Linux

My requirement was to run a few ARIMA models on the mips64 system, so I decided 
to use the forecast package, which depends on several other packages. 
I cross-compiled all the required packages along with R-base. But when 
I try to load the "timeDate" package, R crashes; the output is below. As you can 
see, several other packages load successfully. I also found that the timeDate 
package took a long time to load before it eventually crashed. 
I thought a timeout could be the issue and tried changing the timeout value with 
options(timeout = 300). R still crashed, and I am pretty sure the timeout is not 
the issue, as R crashed within a minute.
I am clueless as to how to debug this issue. Any insights would really help. 
Thanks in advance.
> library("Rcpp")
> library("RcppArmadillo")
> library("fracdiff")
> library("timeDate")
Creating a generic function for 'sample' from package 'base' in package 'timeDate'
Killed

Thanks
Shashank




[R] patterns in numeric vector

2016-06-20 Thread C Lin
Hello,

Can someone help me with this?

I am trying to find the start and end positions in a vector where numbers less 
than x are surrounded by numbers greater than x.
 For example:
 try = c(7,223,42,55,30,25,61,5,70)
 x=40

 The desired output would be:

> loc
  start end
1     5   6
2     8   8

So the numbers I am interested in finding are 30 and 25, with start = 5 and 
end = 6, and also 5, with start = 8 and end = 8.
  
Thank you in advance for your help.


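One way to produce the desired `loc` table is run-length encoding with base R's `rle()`. This sketch assumes the vector has no `NA` values and treats a value equal to `x` as "not below x"; under that convention, a run of below-`x` values that touches neither end of the vector is necessarily bordered by values of at least `x`. The helper name `find_runs_below` is hypothetical, and the vector is renamed `vals` to avoid reusing the name of base R's `try()`.

```r
# Find start/end positions of runs of values below x that have
# at least one neighbouring element on each side (hypothetical helper).
find_runs_below <- function(v, x) {
  r <- rle(v < x)                    # runs of TRUE (below x) and FALSE
  ends <- cumsum(r$lengths)          # last index of each run
  starts <- ends - r$lengths + 1L    # first index of each run
  # keep "below" runs that touch neither end of the vector
  keep <- r$values & starts > 1L & ends < length(v)
  data.frame(start = starts[keep], end = ends[keep])
}

vals <- c(7, 223, 42, 55, 30, 25, 61, 5, 70)
loc <- find_runs_below(vals, 40)
loc
#   start end
# 1     5   6
# 2     8   8
```

The leading 7 is dropped because its run starts at position 1, so it has no left neighbour.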

Re: [R] R help contingency table

2016-06-20 Thread David L Carlson
You should consult with your adviser or someone at your institution who has 
more experience in statistical analysis than you do. You want to compare the 
matrices, but the row/column labels are different so you may be comparing 
completely different categories.

Technically, you need to convert the two matrices into a single matrix. You can 
do that by converting each into a vector with the c() function. BUT this will 
compare High with High, Medium with Low, and Low with Stick which seems 
inadvisable. 

> rbind(c(transitions1), c(transitions2))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]   51   12   37   17   21   15   27   13   60
[2,]   13    5   11    7   16    8    8   18   17
> chisq.test(rbind(c(transitions1), c(transitions2)))

Pearson's Chi-squared test

data:  rbind(c(transitions1), c(transitions2))
X-squared = 22.411, df = 8, p-value = 0.004208

Warning message:
In chisq.test(rbind(c(transitions1), c(transitions2))) :
  Chi-squared approximation may be incorrect

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Lucie Dupond
Sent: Sunday, June 19, 2016 9:10 PM
To: r-help@r-project.org
Subject: [R] R help contingency table

Hello,
I'm sorry if my question is really basic, but I'm having some trouble with the 
statistics for my thesis, especially the chi-square test and contingency 
tables.

From what I understood, there are two "kinds" of chi-square test, which are quite 
similar:
- Homogeneity, when we have one variable and we want to compare it with a 
theoretical distribution
- Independence test, when we have 2 variables and we want to see if they are 
linked

-- -

I'm working on color transitions, with 3 possible factors: "High", "Medium" 
and "Low".
I want to know if an individual will preferably go from a "High" color to 
another "High" color, more than from a "High" color to a "Medium" color 
(for example).

I have this table :

trans1<-c(51,17,27,12,21,13,37,15,60)
transitions1<-matrix(trans1, nrow=3, ncol=3, byrow=T)
rownames(transitions1) <- c("High","Medium", "Low")
colnames(transitions1) <- c("High","Medium", "Low")

The first column is showing the first color, and the second is showing the 
second color of the transition.

It looks like this is a case for an independence test, to see if the 
variable "second color" is linked to the "first color".

So I'm making the test :

chisq.test(transitions1)


(If I understood correctly, the test on the matrix is the independence test, and 
the test on the vector trans1 is the homogeneity test?)

The result is significant, which means that some transitions are preferred.

My problem is that I have other transition tables like this one (with other 
individuals or other conditions).
For example, I also have this one:


trans2<-c(13,7,8,5,16,18,11,8,17)
transitions2<-matrix(trans2, nrow=3, ncol=3, byrow=T)
rownames(transitions2) <- c("High","Low", "Stick")
colnames(transitions2) <- c("High","Low", "Stick")

I want to know if the "preferred" transitions in table 1 are the same in 
table 2.
But if I try a chi-square test on those two matrices, R only takes the first one.

How can I compare those tables?
Maybe with another test?

Thanks in advance !

Kind regards

Lucie S.


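On the parenthetical question about matrix versus vector input: in R, `chisq.test()` treats a matrix as a contingency table and tests independence of rows and columns, while a plain vector gets a goodness-of-fit test against equal cell probabilities (unless `p` is supplied), which corresponds to the one-variable "compare with a theoretical distribution" case described above. The sketch below uses the `transitions1` counts from the question, with the row and column names supplied through `dimnames` (a stylistic choice). It only shows which test R runs; whether either test is appropriate for comparing the two tables is the statistical question raised in the reply.

```r
# transitions1 as in the question (rows = first colour, columns = second colour)
trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
transitions1 <- matrix(trans1, nrow = 3, ncol = 3, byrow = TRUE,
                       dimnames = list(first  = c("High", "Medium", "Low"),
                                       second = c("High", "Medium", "Low")))

# Matrix input: Pearson test of independence between rows and columns
indep <- chisq.test(transitions1)
# Vector input: goodness-of-fit test against equal probabilities (default p)
gof <- chisq.test(trans1)

indep$method     # "Pearson's Chi-squared test"
indep$parameter  # df = (3 - 1) * (3 - 1) = 4
gof$method       # "Chi-squared test for given probabilities"
gof$parameter    # df = 9 - 1 = 8
```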
Re: [R] loop testing unidentified columns

2016-06-20 Thread Brittany Demmitt
Thank you!

> On Jun 20, 2016, at 12:41 PM, David L Carlson  wrote:
> 
> It does not test the first column, but a vector must have consecutive
> indices. Since you did not assign a value to the first element, R inserts a
> missing value (NA). If you don't want to see it, use
>
>> results.pc.all[, -1]
>           [,1] [,2]
> results.2    1    2
> results.3    2    3
> 
> -
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Brittany 
> Demmitt
> Sent: Monday, June 20, 2016 12:15 PM
> To: r-help@r-project.org
> Subject: [R] loop testing unidentified columns
> 
> Hello,
> 
> I want to compare all of the columns of one data frame to those of another to
> see whether any of the columns are equivalent. The first column in both of my
> data frames holds the sample IDs and does not need to be compared. Below is an
> example of the loop I am using to compare the two data frames; it counts the
> number of equivalent values between two columns. So in this example a value
> of 3 means that all three observations in the two columns being compared were
> equivalent. The loop works fine, but I do not understand why it tests the
> first column of sample IDs, giving “NA” for the sum of matches, when my loop
> specifies testing only columns 2-3.
> 
> Thank you!
> 
> 
> #create dataframe A 
> A = matrix(c("a",3,4,"b",5,7,"c",3,7),nrow=3, ncol=3,byrow = TRUE)
> A <- as.data.frame(A)
> A$V2 <- as.numeric(A$V2)
> A$V3 <- as.numeric(A$V3)
> str(A)
> 
> #create dataframe B
> B = matrix(c("a",1,1,"b",6,2,"c",2,2),nrow=3, ncol=3,byrow = TRUE)
> B <- as.data.frame(B)
> B$V2 <- as.numeric(B$V2)
> B$V3 <- as.numeric(B$V3)
> str(B)
> 
> results.2 <- numeric()
> results.3  <- numeric()
> 
> 
> #compare columns to identify those that are identical in the two dataframes 
> for(i in 2:3){
>  results.2[i] <- sum(A[,2]==B[,i])
>  results.3[i] <- sum(A[,3]==B[,i])
>  results.pc.all <- rbind(results.2,results.3)
> }
> results.pc.all
> 


Re: [R] merging df with world map

2016-06-20 Thread boB Rudis
You also don't need to do a merge if you use a base `geom_map()`
layer with the polygons and another using the fill (or points, lines,
etc).

On Fri, Jun 17, 2016 at 5:08 PM, MacQueen, Don  wrote:
> And you can check what David and Jeff suggested like this:
>
> intersect( df$COUNTRY, world_map$region )
>
> If they have any values in common, that command will show them. (Note that
> I said values in common, not countries in common.)
>
> WARNING:
> It appears that you have each country appearing more than once in both of
> the data frames. Even if the country names were spelled the same (which
> they are not in the first few rows), I would not care to predict the
> outcome of a many-to-many merge. It probably won't make sense for showing
> the data on a map.
>
> -Don
>
> --
> Don MacQueen
>
> Lawrence Livermore National Laboratory
> 7000 East Ave., L-627
> Livermore, CA 94550
> 925-423-1062
>
>
>
>
>
> On 6/17/16, 1:06 PM, "R-help on behalf of ch.elahe via R-help"
>  wrote:
>
>>Hi all,
>>I want to use a world map in ggplot2 and show my data on it. My df
>>is:
>>
>>$ COUNTRY   : chr  "DE" "DE" "FR" "FR" ..
>>$ ContrastColor : int  9 9 9 9 13 9 9 9 9 ..
>>$ quant : Factor w/ 4 levels "FAST","SLOW",..
>>
>>I need to merge my df with the world_map data, which looks like this:
>>
>>world_map=map_data("world")
>>'data.frame':   99338 obs. of  6 variables:
>>$ long : num  -69.9 -69.9 -69.9 -70 -70.1 ...
>>$ lat  : num  12.5 12.4 12.4 12.5 12.5 ...
>>$ group: num  1 1 1 1 1 1 1 1 1 1 ...
>>$ order: int  1 2 3 4 5 6 7 8 9 10 ...
>>$ region   : chr  "Aruba" "Aruba" "Aruba" "Aruba" ...
>>$ subregion: chr  NA NA NA NA ...
>>but when I merge my df with the world map data I get a data frame with zero
>>observations in it. I use this command for merging:
>>
>>
>>world_map=merge(world_map,df,by.x="region",by.y="COUNTRY")
>>str(world_map)
>>
>>'data.frame':   0 obs. of  133 variables:
>>$ region: chr
>>$ long  : num
>>$ lat   : num
>>$ group : num
>>$ order : int
>>$ subregion : chr
>>Does anyone know what the problem is with the merge I am
>>using?
>>thanks for any help!
>>Elahe
>>
>


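Don's two warnings (the key columns share no values, and the merge is many-to-many) can be addressed together. Below is a minimal sketch with small toy data frames standing in for the real `df` and `map_data("world")`; the coordinates, the `lookup` table from two-letter codes to map region names, and the use of `mean()` to reduce `ContrastColor` to one value per country are all assumptions for illustration. With the real data, the code-to-name table would have to be built or obtained separately. As boB notes, a separate `geom_map()` layer can avoid the merge altogether; this covers only the merge route.

```r
# Toy stand-ins (hypothetical) for the poster's df and for map_data("world")
world_map <- data.frame(
  long   = c(10, 11, 12, 2, 3, 4, -70),
  lat    = c(50, 51, 50, 46, 47, 46, 12.5),
  group  = c(1, 1, 1, 2, 2, 2, 3),
  order  = 1:7,
  region = c(rep("Germany", 3), rep("France", 3), "Aruba"),
  stringsAsFactors = FALSE
)
df <- data.frame(COUNTRY = c("DE", "DE", "FR", "FR"),
                 ContrastColor = c(9, 9, 9, 13),
                 stringsAsFactors = FALSE)

# The keys share no values, so the merge is empty
intersect(df$COUNTRY, world_map$region)                          # character(0)
nrow(merge(world_map, df, by.x = "region", by.y = "COUNTRY"))    # 0

# Hypothetical lookup from two-letter codes to the map's region names
lookup <- data.frame(COUNTRY = c("DE", "FR"), region = c("Germany", "France"),
                     stringsAsFactors = FALSE)
df_named <- merge(df, lookup, by = "COUNTRY")

# Collapse to one row per country so the map merge is many-to-one
per_country <- aggregate(ContrastColor ~ region, data = df_named, FUN = mean)

# Keep every polygon point (all.x = TRUE); unmatched regions get NA
map_df <- merge(world_map, per_country, by = "region", all.x = TRUE)
# merge() sorts rows by the key, so restore the polygon drawing order
map_df <- map_df[order(map_df$group, map_df$order), ]
map_df
```

Re-sorting by `group` and `order` matters because polygon layers connect points in row order, and the key-sorted output of `merge()` would otherwise scramble the outlines.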

Re: [R] loop testing unidentified columns

2016-06-20 Thread David L Carlson
It does not test the first column, but a vector must have consecutive indices. 
Since you did not assign a value to the first element, R inserts a missing value 
(NA). If you don't want to see it, use

> results.pc.all[, -1]
          [,1] [,2]
results.2    1    2
results.3    2    3

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Brittany Demmitt
Sent: Monday, June 20, 2016 12:15 PM
To: r-help@r-project.org
Subject: [R] loop testing unidentified columns

Hello,

I want to compare all of the columns of one data frame to those of another to see 
whether any of the columns are equivalent. The first column in both of my data 
frames holds the sample IDs and does not need to be compared. Below is an 
example of the loop I am using to compare the two data frames; it counts the 
number of equivalent values between two columns. So in this example a value of 
3 means that all three observations in the two columns being compared were 
equivalent. The loop works fine, but I do not understand why it tests the first 
column of sample IDs, giving “NA” for the sum of matches, when my loop specifies 
testing only columns 2-3.

Thank you!


#create dataframe A 
A = matrix(c("a",3,4,"b",5,7,"c",3,7),nrow=3, ncol=3,byrow = TRUE)
A <- as.data.frame(A)
A$V2 <- as.numeric(A$V2)
A$V3 <- as.numeric(A$V3)
str(A)

#create dataframe B
B = matrix(c("a",1,1,"b",6,2,"c",2,2),nrow=3, ncol=3,byrow = TRUE)
B <- as.data.frame(B)
B$V2 <- as.numeric(B$V2)
B$V3 <- as.numeric(B$V3)
str(B)

results.2 <- numeric()
results.3  <- numeric()


#compare columns to identify those that are identical in the two dataframes 
for(i in 2:3){
  results.2[i] <- sum(A[,2]==B[,i])
  results.3[i] <- sum(A[,3]==B[,i])
  results.pc.all <- rbind(results.2,results.3)
}
results.pc.all


Re: [R] Error in setwd() : argument "dir" is missing, with no default

2016-06-20 Thread David L Carlson
You cannot use setwd() without an argument:
> setwd()
Error in setwd() : argument "dir" is missing, with no default

If you want to choose a directory interactively, use choose.dir() (Windows only). 
But if you are using RStudio, you can use the Files tab in the window on the lower 
right. Navigate to the folder/directory you want, then click the More tab and 
select "Set As Working Directory."


David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352



-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of PIKAL Petr
Sent: Monday, June 20, 2016 7:59 AM
To: Shivi Bhatia; r-help@r-project.org
Subject: Re: [R] Error in setwd() : argument "dir" is missing, with no default

Hi

Maybe it is a feature of RStudio, so you should probably ask there. I usually 
start each project in a separate folder, and I always start R by double-clicking 
the .RData icon.

So for each project I have a different .RData.

Beware that Windows keeps you safe and usually hides files whose names start 
with a dot, so you need to allow such files to be displayed.

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Shivi
> Bhatia
> Sent: Sunday, June 19, 2016 10:23 PM
> To: r-help@r-project.org
> Subject: [R] Error in setwd() : argument "dir" is missing, with no default
>
> Dear Team,
>
> I have searched for this error on various forums but was unable to find a
> relevant solution.
>
> When I installed RStudio, the WD was saved at a particular location; now
> when I try to change it, it gives me this error:
>
> Error in setwd() : argument "dir" is missing, with no default. I have tried
> setting the WD using Shift+Ctrl+H or using the setwd() command. This
> changes the WD for that particular session, but when I restart the session it
> is reset to the old location again, so every time I have to reset it,
> as all my data and other files are saved at another location.
> While searching some of the forums like Stack Exchange, it was advised to
> use setwd("../") as it moves your WD one step back; however, this also does
> not fix the issue.
>
> Kindly advise at the earliest.
>
> Thanks, Shivi
>


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; The sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or 

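As David says, `setwd()` must always be given a directory. The sketch below shows the pattern with `tempdir()` standing in for a real data folder: `setwd()` invisibly returns the previous working directory, which makes it easy to switch back. The commented startup line is an assumption about one way to make the choice persist in plain R, via a user `.Rprofile` file that R runs at startup; the path in it is hypothetical, and how it interacts with RStudio's own settings is not covered here.

```r
# setwd() needs a directory argument; it invisibly returns the old directory.
target <- tempdir()   # stand-in for your real data folder
old <- setwd(target)  # switch, remembering where we were
getwd()               # now the target folder
setwd(old)            # restore the original working directory

# To start every session in a chosen folder, one option is a line like this
# in the file ~/.Rprofile, which R runs at startup (hypothetical path):
# setwd("D:/my_project_data")
```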
[R-es] Help with exporting data

2016-06-20 Thread Sebastián Rangel
Greetings, everyone.

I have a problem exporting a time series in which I imputed some missing
values. I used the command
write.table(datos1, "imputacion.csv",dec=".", sep=";",eol = "\r" )

The data look fine in R

> head(datos1)

 brent  wti Fechas  Fecha
1 15.65000 16.01000 24/06/1988 24/06/1988
2 15.37683 15.86190 25/06/1988 25/06/1988
3 15.20872 15.77077 26/06/1988 26/06/1988
4 15.1 15.86000 27/06/1988 27/06/1988
5 15.27000 15.78000 28/06/1988 28/06/1988
6 14.97000 15.43000 29/06/1988 29/06/1988

But when I export them, I get

[image: embedded image 1]
The values that come out wrong are the ones that were imputed in R with this
function
imputaion1=na.ma(Precio.Brent, k = 6, weighting = "exponential")

Thank you for your help,

Henry Sebastián Rangel Quiñonez
M.Sc. Statistics student, UNAL.
___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es

[R] loop testing unidentified columns

2016-06-20 Thread Brittany Demmitt
Hello,

I want to compare all of the columns of one data frame to those of another to see 
whether any of the columns are equivalent. The first column in both of my data 
frames holds the sample IDs and does not need to be compared. Below is an 
example of the loop I am using to compare the two data frames; it counts the 
number of equivalent values between two columns. So in this example a value of 
3 means that all three observations in the two columns being compared were 
equivalent. The loop works fine, but I do not understand why it tests the first 
column of sample IDs, giving “NA” for the sum of matches, when my loop specifies 
testing only columns 2-3.

Thank you!


#create dataframe A 
A = matrix(c("a",3,4,"b",5,7,"c",3,7),nrow=3, ncol=3,byrow = TRUE)
A <- as.data.frame(A)
A$V2 <- as.numeric(A$V2)
A$V3 <- as.numeric(A$V3)
str(A)

#create dataframe B
B = matrix(c("a",1,1,"b",6,2,"c",2,2),nrow=3, ncol=3,byrow = TRUE)
B <- as.data.frame(B)
B$V2 <- as.numeric(B$V2)
B$V3 <- as.numeric(B$V3)
str(B)

results.2 <- numeric()
results.3  <- numeric()


#compare columns to identify those that are identical in the two dataframes 
for(i in 2:3){
  results.2[i] <- sum(A[,2]==B[,i])
  results.3[i] <- sum(A[,3]==B[,i])
  results.pc.all <- rbind(results.2,results.3)
}
results.pc.all


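A loop-free alternative indexes only the columns being compared, so no `NA` placeholder for column 1 is ever created. One further point is easy to miss: before R 4.0.0, `as.data.frame()` turned a character matrix into factor columns, and `as.numeric()` on a factor returns the level codes rather than the numbers themselves. The counts reported in this thread (1 and 2 for `results.2`, 2 and 3 for `results.3`) are consistent with comparing those codes; with the literal values, no pair of columns has any matching entries. The data below are typed in directly as numbers to sidestep that step, and `cols`, `matches` and `f` are illustrative names.

```r
# Example data from the question, built directly as numbers (no factor step)
A <- data.frame(id = c("a", "b", "c"), V2 = c(3, 5, 3), V3 = c(4, 7, 7))
B <- data.frame(id = c("a", "b", "c"), V2 = c(1, 6, 2), V3 = c(1, 2, 2))

cols <- 2:3  # columns to compare; column 1 holds the sample IDs
# matches[i, j] = number of rows where column i of A equals column j of B
matches <- sapply(cols, function(j) sapply(cols, function(i) sum(A[[i]] == B[[j]])))
dimnames(matches) <- list(A = names(A)[cols], B = names(B)[cols])
matches  # every count is 0 for these literal values

# Pitfall: as.numeric() on a factor returns the level codes, not the values
f <- factor(c("3", "5", "3"))
as.numeric(f)                # 1 2 1
as.numeric(as.character(f))  # 3 5 3
```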
Re: [R] How to visualize this df

2016-06-20 Thread PIKAL Petr
Hi

you still should post a snippet of your data to help others better understand.

It should be something like
library(ggplot2)
p <- ggplot(dat, aes(x=Protocol, y=NRuns, fill=Speed))
p + geom_bar(stat="identity")

But as you have 132 levels of Protocol, unless you have a really big monitor you 
will have problems displaying all protocols properly.

You could try to use points, but it probably does not help much.

You could try a similar approach to the one here
http://www.phaget4.org/R/image_matrix.html

or you could try to tweak your table to fit the image function.

Cheers
Petr


> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rainer M
> Krug
> Sent: Monday, June 20, 2016 2:48 PM
> To: ch.elahe via R-help 
> Subject: Re: [R] How to visualize this df
>
>  writes:
>
> >  Hi Rainer,
>
> Please keep this on the mailing list for info.
>
> > Thanks for your reply. I want to show NRuns for each Protocol in my df
> > and color it by Speed. I think it's possible by a bar chart but I am
> > confused how to subset my df for using Bar chart in ggplot
>
> Sorry - haven't used ggplot in ages.
>
> Can't help you with that.
>
> Rainer
>
> >
> >
> > On Monday, June 20, 2016 1:29 PM, Rainer M Krug wrote:
> > "ch.elahe via R-help"  writes:
> >
> >> Hi all,
> >> I have a question about how to visualize my df! Here is the df I need to
> >> visualize:
> >>
> >> 'data.frame':   455 obs. of 128 variables:
> >> $Protocol  :Factor w/132 levels "_unknown","PD FS SAG","T1 SAG FS","T2 FS OR",...
> >> $NRuns : int   45 45 156 75 89 69 ..
> >> $Speed :Factor w/4 levels "Slow","Fast","VeryFast","VerySlow"
> >> NRuns is actually the number of times that the customer used the protocol,
> >> and Speed is how the customer ran the Protocol. Each Protocol can
> >> have a different NRuns. Do you know what's the best way to visualize
> >> this df?
> >
> > That depends what you want to show. And that determines the best
> > visualization.
> >
> > Also: what do you want to use it for: an interactive presentation may
> > call for different visualizations than a printed report.
> >
> > Cheers,
> >
> > Rainer
> >
> >
> >> Thanks for any help!
> >> Elahe
> >>
>
> --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
> Biology, UCT), Dipl. Phys. (Germany)
>
> Centre of Excellence for Invasion Biology Stellenbosch University South Africa
>
> Tel :   +33 - (0)9 53 10 27 44
> Cell:   +33 - (0)6 85 62 59 98
> Fax :   +33 - (0)9 58 10 27 44
>
> Fax (D):+49 - (0)3 21 21 25 22 44
>
> email:  rai...@krugs.de
>
> Skype:  RMkrug
>
> PGP: 0x0F52F982



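Following Petr's point about 132 protocol levels, one option is to plot only the protocols with the most runs, as horizontal stacked bars coloured by Speed. This sketch uses base R graphics rather than ggplot2 so that it needs no extra packages. The toy data are hypothetical: the Protocol names and NRuns values echo the str() output in the thread, but the Speed assignments are invented, and `n_top` and the choice of summing NRuns per Protocol and Speed are assumptions.

```r
# Toy data (hypothetical): Speed values are invented for illustration
dat <- data.frame(
  Protocol = c("PD FS SAG", "T1 SAG FS", "T2 FS OR", "PD FS SAG", "T1 SAG FS", "_unknown"),
  NRuns    = c(45, 45, 156, 75, 89, 69),
  Speed    = c("Slow", "Fast", "VeryFast", "Fast", "Slow", "VerySlow"),
  stringsAsFactors = TRUE
)

# Total runs per Protocol x Speed combination
tab <- xtabs(NRuns ~ Protocol + Speed, data = dat)

# With ~132 protocols, keep only the busiest few so the labels stay readable
n_top <- 3
idx <- order(rowSums(tab), decreasing = TRUE)[seq_len(min(n_top, nrow(tab)))]
top <- tab[idx, , drop = FALSE]

# One horizontal bar per Protocol, stacked segments coloured by Speed
barplot(t(top), horiz = TRUE, las = 1, legend.text = TRUE, xlab = "NRuns")
```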
Re: [R] R help contingency table

2016-06-20 Thread S Ellison
> The first column is showing the first color, and the second is showing the
> second color of the transition
Are you sure?
transitions1 is a 3x3 matrix; it has three columns, not two. 

Could it be that the columns are colour 2 following initial condition given by 
row, or vice versa?

[not that that will help _me_ answer your question, but it may help someone 
else].

S Ellison



***
This email and any attachments are confidential. Any use, copying or
disclosure other than by the intended recipient is unauthorised. If 
you have received this message in error, please notify the sender 
immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com 
and delete this message and any copies from your computer and network. 
LGC Limited. Registered in England 2991879. 
Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK


Re: [R] Error in setwd() : argument "dir" is missing, with no default

2016-06-20 Thread PIKAL Petr
Hi

Maybe it is a feature of RStudio, so you should probably ask there. I usually 
start each project in a separate folder, and I always start R by double-clicking 
the .RData icon.

So for each project I have a different .RData.

Beware that Windows keeps you safe and usually hides files whose names start 
with a dot, so you need to allow such files to be displayed.

Cheers
Petr

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Shivi
> Bhatia
> Sent: Sunday, June 19, 2016 10:23 PM
> To: r-help@r-project.org
> Subject: [R] Error in setwd() : argument "dir" is missing, with no default
>
> Dear Team,
>
> I have searched for this error on various forums but was unable to find a
> relevant solution.
>
> When I installed RStudio, the WD was saved at a particular location; now
> when I try to change it, it gives me this error:
>
> Error in setwd() : argument "dir" is missing, with no default. I have tried
> setting the WD using Shift+Ctrl+H or using the setwd() command. This
> changes the WD for that particular session, but when I restart the session it
> is reset to the old location again, so every time I have to reset it,
> as all my data and other files are saved at another location.
> While searching some of the forums like Stack Exchange, it was advised to
> use setwd("../") as it moves your WD one step back; however, this also does
> not fix the issue.
>
> Kindly advise at the earliest.
>
> Thanks, Shivi
>


This e-mail and any documents attached to it may be confidential and are 
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender. 
Delete the contents of this e-mail with all attachments and its copies from 
your system.
If you are not the intended recipient of this e-mail, you are not authorized to 
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by 
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a 
contract at any time, for any reason, and without stating any reason.
- if the e-mail contains an offer, the recipient is entitled to immediately 
accept such offer; the sender of this e-mail (offer) excludes any acceptance of 
the offer on the part of the recipient containing any amendment or variation.
- the sender insists that the respective contract is concluded only upon an 
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into 
any contracts on behalf of the company except for cases in which he/she is 
expressly authorized to do so in writing, and such authorization or power of 
attorney is submitted to the recipient or the person represented by the 
recipient, or the existence of such authorization is known to the recipient or 
the person represented by the recipient.

Re: [R] How to visualize this df

2016-06-20 Thread Rainer M Krug
 writes:

>  Hi Rainer,

Please keep this on the mailing list for info.

> Thanks for your reply. I want to show NRuns for each Protocol in my df
> and color it by Speed. I think it's possible with a bar chart, but I am
> confused about how to subset my df to use a bar chart in ggplot.

Sorry - haven't used ggplot in ages.

Can't help you with that.

Rainer

>  
>
> On Monday, June 20, 2016 1:29 PM, Rainer M Krug  wrote:
> "ch.elahe via R-help"  writes:
>
>> Hi all,
>> I have a question about how to visualize my df! here is my df I need to 
>> visualize:
>>
>> 'data.frame':   455 obs. of 128 variables:
>> $Protocol  :Factor w/132 levels "_unknown","PD FS SAG","T1 SAG FS","T2 FS OR",...
>> $NRuns : int   45 45 156 75 89 69 ..
>> $Speed :Factor w/4 levels "Slow","Fast","VeryFast","VerySlow"
>> NRuns is the number of times the customer used the protocol,
>> and Speed is how the customer ran the Protocol. Each Protocol can
>> have a different NRuns. Do you know the best way to visualize
>> this df?
>
> That depends what you want to show. And that determines the best
> visualization.
>
> Also: what do you want to use it for: an interactive
> presentation may call for different visualizations than a printed
> report.
>
> Cheers,
>
> Rainer
>
>
>> Thanks for any help!
>> Elahe
>>

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, 
UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Stellenbosch University
South Africa

Tel :   +33 - (0)9 53 10 27 44
Cell:   +33 - (0)6 85 62 59 98
Fax :   +33 - (0)9 58 10 27 44

Fax (D):+49 - (0)3 21 21 25 22 44

email:  rai...@krugs.de

Skype:  RMkrug

PGP: 0x0F52F982



[R] gbresolve function from the geiger package

2016-06-20 Thread Nomi Hadar
Hello,

I am having trouble with the gbresolve function from the geiger package,
which works with the NCBI taxonomy.
When I use it, some genera are not found, although they do appear
in the NCBI taxonomy browser.


For example, when I run:

library("ape")
library("geiger")

genus = "Christia"
gbresolve(genus, rank= "genus", within = "Fabaceae")

("Christia" is a genus within a plants group called Fabaceae)

I get:

Error in tmp[[idx]] : subscript out of bounds
In addition: Warning messages:
1: In FUN(X[[i]], ...) : Attempt one of the following:
Bacterium purifaciens Christiansen 1917
...
...
2: In gbresolve.default(genus, rank = "genus", within = "Fabaceae") :
  The following taxa were not encountered in the NCBI taxonomy:
Christia


The same happens for other genera, such as "Pycnospora", "Solori", "Thailentadopsis"
and more.
"Christia" appears in the NCBI taxonomy browser, so I expect
to get "Christia vespertilionis" as a result.

Why is that?

Thank you very much!
Nomi


-- 
Nomi Hadar



[R] R help contingency table

2016-06-20 Thread Lucie Dupond
Hello,
I'm sorry if my question is really basic, but I'm having some trouble with the 
statistics for my thesis, especially the chi-square test and contingency 
tables.

As far as I understand, there are two "kinds" of chi-square test, which are quite 
similar:
- Homogeneity, when we have one variable and want to compare it with a 
theoretical distribution
- Independence test, when we have two variables and want to see if they are 
linked

-- -

I'm working on color transitions, with 3 possible levels: "High", "Medium" 
and "Low".
I want to know if an individual will preferably go from a "High" color to 
another "High" color, more than from a "High" color to a "Medium" color 
(for example).

I have this table:

trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
transitions1 <- matrix(trans1, nrow = 3, ncol = 3, byrow = TRUE)
rownames(transitions1) <- c("High", "Medium", "Low")
colnames(transitions1) <- c("High", "Medium", "Low")

The rows give the first color and the columns give the second color of 
the transition.

It looks like this calls for an independence test, to see whether the 
variable "second color" is linked to the "first color".

So I run the test:

chisq.test(transitions1)


(If I understood correctly, the test on the matrix is the independence test, and 
the test on the vector trans1 is the homogeneity test?)

The result is significant, which means that some transitions are preferred.
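To make the parenthetical question above concrete, here is a short sketch using the table from this message. Passing the 3 x 3 matrix to chisq.test() gives Pearson's test of independence (4 degrees of freedom). Passing the raw vector trans1 gives a test of the 9 counts against equal cell probabilities, which is what chisq.test() does by default when p is not supplied. That one-vector test is usually called a goodness-of-fit test rather than a homogeneity test.

```r
trans1 <- c(51, 17, 27, 12, 21, 13, 37, 15, 60)
transitions1 <- matrix(trans1, nrow = 3, byrow = TRUE,
                       dimnames = list(first  = c("High", "Medium", "Low"),
                                       second = c("High", "Medium", "Low")))

# Matrix input: test of independence between first and second color
indep <- chisq.test(transitions1)

# Vector input: goodness of fit of the 9 counts to equal probabilities
gof <- chisq.test(trans1)

# Same test with the default probabilities written out explicitly
gof_explicit <- chisq.test(trans1, p = rep(1 / 9, 9))

indep$parameter   # df = (3 - 1) * (3 - 1)
gof$parameter     # df = 9 - 1
```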

My problem is that I have other transition tables like this one (for other 
individuals or other conditions).
For example, I also have this one:


trans2 <- c(13, 7, 8, 5, 16, 18, 11, 8, 17)
transitions2 <- matrix(trans2, nrow = 3, ncol = 3, byrow = TRUE)
rownames(transitions2) <- c("High", "Low", "Stick")
colnames(transitions2) <- c("High", "Low", "Stick")

I want to know whether the "preferred" transitions in table 1 are the same in 
table 2.
But if I run a chi-square test on those two matrices, R only uses the first one.

How can I compare those tables?
Maybe with another test?
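One possible approach is sketched below, under one assumption: the two tables must share the same categories in the same order for a cell-by-cell comparison to mean anything. The second table above is labelled High/Low/Stick, so the shared labels in the sketch are hypothetical. When x is a matrix, chisq.test() ignores its y argument, which is why only the first table is used. Instead, the two tables can be stacked into one larger table and tested for homogeneity. The sketch shows two such tests. The first is an overall test on the 9 transition cells. The second is a per-starting-color test, which asks whether the distribution of the second color, given the first color, differs between the tables.

```r
# Assumption: both tables use the same three categories in the same order
lev <- c("High", "Medium", "Low")   # hypothetical shared labels
transitions1 <- matrix(c(51, 17, 27, 12, 21, 13, 37, 15, 60), nrow = 3, byrow = TRUE,
                       dimnames = list(first = lev, second = lev))
transitions2 <- matrix(c(13, 7, 8, 5, 16, 18, 11, 8, 17), nrow = 3, byrow = TRUE,
                       dimnames = list(first = lev, second = lev))

# 1) Overall homogeneity: same distribution over all 9 transitions?
#    Each table becomes one row of a 2 x 9 table.
overall <- chisq.test(rbind(table1 = c(transitions1), table2 = c(transitions2)))

# 2) Per starting color: given first color i, is the distribution of the
#    second color the same in both tables? (one 2 x 3 test per row)
per_row <- sapply(lev, function(i) {
  chisq.test(rbind(transitions1[i, ], transitions2[i, ]))$p.value
})

overall
per_row
```

The overall 2 x 9 test may warn that the chi-squared approximation may be incorrect, because one expected count falls just below 5. Passing simulate.p.value = TRUE to chisq.test() or fisher.test() is an alternative. With three per-row tests, some correction for multiple comparisons (for example via p.adjust()) is worth considering.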

Thanks in advance!

Kind regards

Lucie S.


Re: [R] Fwd: Matrix Constraints in R Optim

2016-06-20 Thread Priyank Dwivedi
All,
Here are the dput files of the input data to the code.

Thanks for any advice.

I am adding the entire code below too just in case.


library(XLConnect)  # provides readWorksheetFromFile()
file <- file.path("Learning R","CRM_R_Ver4.xlsx")
file
my.data <- readWorksheetFromFile(file,sheet=1,startRow=1)
str(my.data)  # DATA FRAME
my.data.matrix.inj <- as.matrix(my.data)  #convert DATA FRAME to MATRIX
my.data.matrix.inj

dput(my.data.matrix.inj,"my.data.matrix.inj.txt")


my.data.2 <- readWorksheetFromFile(file,sheet=2,startRow=1)
str(my.data.2)  # DATA FRAME
my.data.matrix.time <- as.matrix(my.data.2)  #convert DATA FRAME to MATRIX
my.data.matrix.time

dput(my.data.matrix.time,"my.data.matrix.time.txt")

my.data <- readWorksheetFromFile(file,sheet=3,startRow=1)
str(my.data)  # DATA FRAME
my.data.matrix.prod <- as.matrix(my.data)  #convert DATA FRAME to MATRIX
my.data.matrix.prod

dput(my.data.matrix.prod,"my.data.matrix.prod.txt")

 # my.data.var <- vector("numeric",length = 24)
 # my.data.var

my.data.var <- c(10,0.25,0.25,0.25,0.25,0.25,
 10,0.25,0.25,0.25,0.25,0.25,
 10,0.25,0.25,0.25,0.25,0.25,
 10,0.25,0.25,0.25,0.25,0.25)
my.data.var

dput(my.data.var,"my.data.var.txt")


my.data.qo <- c(5990,150,199,996)   #Pre-Waterflood Production
my.data.timet0 <- 0 # starting condition for time

#FUNCTION
Qjk.Cal.func <- function(my.data.timet0, my.data.qo, my.data.matrix.time,
                         my.data.matrix.inj, my.data.matrix.prod,
                         my.data.var, my.data.var.mat)
{
  qjk.cal.matrix <- matrix(NA_real_, nrow = nrow(my.data.matrix.prod),
                           ncol = ncol(my.data.matrix.prod))

  number <- 1
  for (colnum in 1:ncol(my.data.matrix.prod))   # loop through all PROD wells columns
  {
    for (row in 1:nrow(my.data.matrix.prod))     # loop through all the rows
    {
      # loop through all the injector columns to get the PRODUCT SUM
      inj.sum <- 0
      for (column in 1:ncol(my.data.matrix.inj))
      {
        inj.sum <- inj.sum +
          my.data.matrix.inj[row, column] * my.data.var.mat[colnum, number + column]
      }

      # time step: the first row is measured from t0 = 0, later rows from the previous row
      if (row == 1) {
        deltaT <- my.data.matrix.time[row]
      } else {
        deltaT <- my.data.matrix.time[row] - my.data.matrix.time[row - 1]
      }

      expo <- exp(-deltaT / my.data.var.mat[colnum, 1])   # change here too

      if (row == 1) {
        qjk.cal.matrix[row, colnum] <- my.data.qo[colnum] * expo + (1 - expo) * inj.sum
      } else {
        qjk.cal.matrix[row, colnum] <- qjk.cal.matrix[row - 1, colnum] * expo +
          (1 - expo) * inj.sum
      }
    }
  }

  qjk.cal.matrix  # RETURN CALCULATED MATRIX TO THE ERROR FUNCTION
}


# ERROR FUNCTION - FINDS DIFFERENCE BETWEEN CAL. MATRIX AND ORIGINAL MATRIX.
# Minimize the Error by changing my.data.var

Error.func <- function(my.data.var)
{
  # First convert vector (my.data.var) to MATRIX and send it to calculate new MATRIX
  my.data.var.mat <- matrix(my.data.var, nrow = ncol(my.data.matrix.prod),
                            ncol = ncol(my.data.matrix.inj) + 1, byrow = TRUE)

  Calc.Qjk.Value <- Qjk.Cal.func(my.data.timet0, my.data.qo, my.data.matrix.time,
                                 my.data.matrix.inj, my.data.matrix.prod,
                                 my.data.var, my.data.var.mat)

  # FIND DIFFERENCE BETWEEN CAL. MATRIX AND ORIGINAL MATRIX
  diff.values <- my.data.matrix.prod - Calc.Qjk.Value

  # root-mean-square error for each PROD well (column)
  Error <- (colSums(diff.values^2, na.rm = FALSE) / nrow(my.data.matrix.inj))^0.5
  print(paste(Error))

  Error_total <- sum(Error, na.rm = FALSE) / ncol(my.data.matrix.prod)   # total avg error

  Error_total
}

# OPTIMIZE

sols <- optim(my.data.var, Error.func, method = "L-BFGS-B",
              upper = rep(c(Inf, 1, 1, 1, 1, 1), 4),
              lower = rep(0, 24))

sols
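The column-sum constraint in the quoted message below cannot be expressed through optim()'s lower/upper arguments, because L-BFGS-B only supports box constraints. One option in base R is stats::constrOptim(). It handles linear inequality constraints of the form ui %*% theta - ci >= 0, and it needs a starting point strictly inside the feasible region. The original start of 0.25 for each gain makes every column sum exactly 1, which is on the boundary. The sketch assumes the 4 x 6 layout used in Error.func, filled byrow, with column 1 holding the time constants and columns 2 to 6 the gains. It also assumes the constraint applies to the gain columns. Because the data are not available here, a toy objective stands in for Error.func.

```r
# Sketch: column sums <= 1 via stats::constrOptim() (base R)
# Assumed layout: 24 parameters filled byrow into a 4 x 6 matrix,
# column 1 = time constants, columns 2-6 = gains.
n.prod <- 4
n.col  <- 6
n.par  <- n.prod * n.col

# Position of element [r, c] in the parameter vector (byrow filling)
idx <- function(r, c) (r - 1) * n.col + c

# Constraints are written as ui %*% theta - ci >= 0
# (a) every parameter >= 0
ui.pos <- diag(n.par)
ci.pos <- rep(0, n.par)
# (b) for each gain column c: -(sum of that column) >= -1, i.e. sum <= 1
ui.sum <- t(sapply(2:n.col, function(c) {
  r.vec <- rep(0, n.par)
  r.vec[idx(1:n.prod, c)] <- -1
  r.vec
}))
ci.sum <- rep(-1, n.col - 1)

ui <- rbind(ui.pos, ui.sum)
ci <- c(ci.pos, ci.sum)

# Toy objective standing in for Error.func (hypothetical target values;
# gains of 0.4 would make every column sum 1.6, so the constraint binds)
target <- rep(c(8, 0.4, 0.4, 0.4, 0.4, 0.4), n.prod)
toy.error <- function(theta) sum((theta - target)^2)

# Starting point strictly inside the feasible region (column sums 0.8)
start <- rep(c(10, 0.2, 0.2, 0.2, 0.2, 0.2), n.prod)
stopifnot(all(ui %*% start - ci > 0))

sols <- constrOptim(start, toy.error, grad = NULL, ui = ui, ci = ci)
gain.mat <- matrix(sols$par, nrow = n.prod, byrow = TRUE)
colSums(gain.mat[, -1])
```

With the real data, toy.error would be replaced by Error.func. With grad = NULL, constrOptim() uses Nelder-Mead, which can be slow in 24 dimensions.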

On 17 June 2016 at 16:55, Jeff Newmiller  wrote:
> Your code is corrupt because you failed to send your email in plain text
> format.
>
> You also don't appear to have all data needed to reproduce the problem. Use
> the dput function to generate R code form of a sample of your data.
> --
> Sent from my phone. Please excuse my brevity.
>
> On June 17, 2016 1:07:21 PM PDT, Priyank Dwivedi 
> wrote:
>>
>> By mistake, I sent it earlier to the wrong address.
>>
>> -- Forwarded message --
>> From: Priyank Dwivedi 
>> Date: 17 June 2016 at 14:50
>> Subject: Matrix Constraints in R Optim
>> To: r-help-ow...@r-project.org
>>
>>
>> Hi,
>>
>> Below is the code snippet I wrote in R:
>>
>> The basic idea is to minimize error by optimizing a set of values (in this
>> scenario 12) in the form of a matrix. I defined the matrix elements as
>> vector "my.data.var" and then stacked it into a matrix called
>> "my.data.var.mat" in the error function.
>>
>> The only part that I can't figure out is "what if the column sum in
>> my.data.var.mat needs to be <=1"; that's the constraint(s). Where do I introduce it in the
>> OPTIM solver or 

[R] Error in setwd() : argument "dir" is missing, with no default

2016-06-20 Thread Shivi Bhatia
Dear Team,

I have searched for this error on various forums but was unable to find a
relevant solution.

When I installed RStudio, the WD was saved at a particular location; now
when I try to change it, I get this error:

Error in setwd() : argument "dir" is missing, with no default. I have tried
setting the WD using Shift+Ctrl+H or using the setwd() command. This
changes the WD for that particular session, but when I restart the session
it is reset to the old location, so every time I have
to reset it, as all my data and other files are saved at another location.
While searching some forums like Stack Exchange, I saw the advice to
use setwd("../"), as it moves your WD one level up; however, even with this
I can't fix the issue.
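The sketch below shows what the error means and one way to make the change stick. The message only appears when setwd() is called without its dir argument, so a path has to be passed. A working directory set with setwd() lasts only for the current session. A persistent default therefore needs something that runs at start-up. Options include a setwd() line in the user's .Rprofile (see ?Startup), RStudio's default working directory option under Tools > Global Options > General, or an RStudio project. The folder used below is a temporary directory standing in for the real data folder. The .Rprofile path and the "C:/path/to/your/data" string are placeholders and assumptions about a given machine.

```r
# 1) Reproduce the message: setwd() called with no 'dir' argument
msg <- tryCatch(setwd(), error = function(e) conditionMessage(e))
print(msg)

# 2) setwd() with a path works, but only for the current session
old <- getwd()
target <- tempdir()   # stand-in for the real data folder (hypothetical)
setwd(target)
moved <- identical(normalizePath(getwd()), normalizePath(target))
setwd(old)            # go back to where we started

# 3) To apply a directory in every new session, one option is a line in the
#    user start-up file. This only builds the path and the line; it writes nothing.
rprofile <- path.expand("~/.Rprofile")
startup_line <- 'setwd("C:/path/to/your/data")'   # placeholder path
cat("Add this line to", rprofile, ":\n", startup_line, "\n")
# e.g. cat(startup_line, "\n", file = rprofile, append = TRUE)
```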

Kindly advise at the earliest.

Thanks, Shivi



[R] frailtypack - Model did not converge

2016-06-20 Thread Andreu Ferrero
Hey,


I am using "frailtyPenal" to fit a general joint model:

I got a "Model did not converge" message, and I guess it is because I
misspecified something in the command:

"Call:
frailtyPenal(formula = Surv(Time_final_mes_cor, BD_RE2.Re_IC1_cor) ~
cluster(BD_RE2.e_b_id) + (BD_RE2.X.a_probnp_bnpR) +
(BD_RE2.ae_presencia_Cizquierda) +
(BD_RE2.CKD_EPI60) + terminal(BD_RE2.death1_cor),
formula.terminalEvent = ~(BD_RE2.X.a_probnp_bnpR) +
(BD_RE2.ae_presencia_Cizquierda) + (BD_RE2.CKD_EPI60), data = BD_AV,
recurrentAG = FALSE, jointGeneral = TRUE, n.knots = 20, kappa =
c(1,
1), maxit = 700, LIMlogl = 0.0142)


  General Joint gamma frailty model for recurrent and a terminal event
processes
  using a Penalized Likelihood on the hazard function

   Convergence criteria:
   parameters = 6.81e-09 likelihood = 0.0115 gradient = 1

   n= 2473
   n recurrent events= 406
   n terminal events= 195"


Any ideas? The computation takes hours to fit (or fail to fit) this model.


Thanks,




Andreu Ferrero Gregori



Re: [R] How to visualize this df

2016-06-20 Thread Rainer M Krug
"ch.elahe via R-help"  writes:

> Hi all,
> I have a question about how to visualize my df! here is my df I need to 
> visualize:
>
> 'data.frame':   455 obs. of 128 variables:
> $Protocol  :Factor w/132 levels "_unknown","PD FS SAG","T1 SAG FS","T2 FS OR",...
> $NRuns : int   45 45 156 75 89 69 ..
> $Speed :Factor w/4 levels "Slow","Fast","VeryFast","VerySlow"
> NRuns is the number of times the customer used the protocol,
> and Speed is how the customer ran the Protocol. Each Protocol can
> have a different NRuns. Do you know the best way to visualize
> this df?

That depends what you want to show. And that determines the best
visualization.

Also: what do you want to use it for: an interactive
presentation may call for different visualizations than a printed
report.

Cheers,

Rainer

> Thanks for any help!
> Elahe
>

-- 
Rainer M. Krug
email: Rainerkrugsde
PGP: 0x0F52F982



[R] How to visualize this df

2016-06-20 Thread ch.elahe via R-help
Hi all,
I have a question about how to visualize my df! here is my df I need to 
visualize:

'data.frame':   455 obs. of 128 variables:
$Protocol  :Factor w/132 levels "_unknown","PD FS SAG","T1 SAG FS","T2 FS OR",...
$NRuns : int   45 45 156 75 89 69 ..
$Speed :Factor w/4 levels "Slow","Fast","VeryFast","VerySlow"
NRuns is the number of times the customer used the protocol, and Speed 
is how the customer ran the Protocol. Each Protocol can have a different 
NRuns. Do you know the best way to visualize this df? 
Thanks for any help!
Elahe
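This sketch follows up on the follow-up request in this thread: a bar chart of NRuns for each Protocol, colored by Speed. No manual subsetting should be needed, because xtabs() can sum NRuns for every Speed and Protocol combination in one step. The data frame below is a small hypothetical stand-in with the same three column names. The real df has many more levels and columns.

```r
# Hypothetical toy data standing in for the poster's df
df <- data.frame(
  Protocol = factor(c("PD FS SAG", "PD FS SAG", "T1 SAG FS", "T2 FS OR", "T2 FS OR")),
  NRuns    = c(45L, 45L, 156L, 75L, 89L),
  Speed    = factor(c("Slow", "Fast", "Fast", "VeryFast", "Slow"),
                    levels = c("VerySlow", "Slow", "Fast", "VeryFast"))
)

# Sum NRuns for every Speed x Protocol combination (rows = Speed, cols = Protocol)
runs_tab <- xtabs(NRuns ~ Speed + Protocol, data = df)

# Stacked bars: one bar per Protocol, segments colored by Speed
barplot(runs_tab, legend.text = TRUE, col = hcl.colors(nlevels(df$Speed)),
        xlab = "Protocol", ylab = "NRuns", las = 2)
```

In ggplot2, ggplot(df, aes(x = Protocol, y = NRuns, fill = Speed)) + geom_bar(stat = "identity") should give the equivalent stacked chart, since the bars are stacked per Protocol. With 132 Protocol levels, a horizontal layout (coord_flip()) or showing only the most-used protocols may be easier to read.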



[R] L1 penalized regression fails to predict from model

2016-06-20 Thread Fredrik Karlsson
Dear list,

Sorry for this cross-post from StackOverflow, but SO may have been
the wrong forum for this question. Too package specific and

OK, what I am trying to do is predict from an L1-penalized regression.
This fails because of a data set dimension problem that I cannot figure out.

The procedure I'm using is the following:

require(penalized)
library(dplyr)  # for %>% and sample_frac()
# neg contains negative data
# pos contains positive data

Now, the procedure below aims to construct comparable (balanced in terms of
positive and negative cases) training and validation data sets.

# 50% negative training set
negSamp <- neg %>% sample_frac(0.5) %>% as.data.frame()
# Negative validation set
negCompl <- neg[setdiff(row.names(neg), row.names(negSamp)), ]
# 50% positive training set
posSamp <- pos %>% sample_frac(0.5) %>% as.data.frame()
# Positive validation set
posCompl <- pos[setdiff(row.names(pos), row.names(posSamp)), ]
# Combine sets
validat <- rbind(negSamp, posSamp)
training <- rbind(negCompl, posCompl)

Ok, so here we now have two comparable sets.

[1] FALSE  TRUE
> dim(training)
[1] 1061  381
> dim(validat)
[1] 1060  381
> identical(names(training),names(validat))
[1] TRUE

I fit the model to the training set without a problem (and I've tried using
a range of lambda1 values here). But predicting from the fitted model on the
validation data set fails, with an odd error message.

> fit <- penalized(VoiceTremor,training[-1],data=training,lambda1=40,standardize=TRUE)
# nonzero coefficients: 13
> fit2 <- predict(fit, penalized=validat[-1], data=validat)
Error in .local(object, ...) :
  row counts of "penalized", "unpenalized" and/or "data" do not match

Just to make sure that this is not due to some NA's in the data set:

> identical(validat,na.omit(validat))
[1] TRUE

Oddly enough, if I generate some new data comparable to the real
data set:

> data.frame(VoiceTremor="NVT",matrix(rnorm(38),nrow=1000,ncol=380) ) -> neg
> data.frame(VoiceTremor="VT",matrix(rnorm(38),nrow=1000,ncol=380) ) -> pos
> dim(pos)
[1] 1000  381
> dim(neg)
[1] 1000  381

and run the procedure above, the prediction step works!

How come?

What could be wrong with my second (not training) data set?
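One way to rule out the data split is to build the balanced halves by row position instead of matching row.names() across steps. That way, each half is the exact complement of the other by construction. The base R sketch below uses split_half(), a hypothetical helper, and small toy stand-ins for neg and pos. It only checks the split; it does not diagnose the penalized() error itself.

```r
# Randomly split a data frame into two halves by row position
split_half <- function(d) {
  in_first <- seq_len(nrow(d)) %in% sample(nrow(d), size = floor(nrow(d) / 2))
  list(first = d[in_first, , drop = FALSE], second = d[!in_first, , drop = FALSE])
}

set.seed(42)
# Toy stand-ins for the real negative / positive sets
neg <- data.frame(VoiceTremor = "NVT", matrix(rnorm(10 * 3), nrow = 10))
pos <- data.frame(VoiceTremor = "VT",  matrix(rnorm(11 * 3), nrow = 11))

neg_parts <- split_half(neg)
pos_parts <- split_half(pos)
validat  <- rbind(neg_parts$first,  pos_parts$first)
training <- rbind(neg_parts$second, pos_parts$second)

# Sanity checks worth running before fitting and predicting
stopifnot(nrow(validat) + nrow(training) == nrow(neg) + nrow(pos))
stopifnot(identical(names(validat), names(training)))
stopifnot(!anyNA(validat))
```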

Fredrik

-- 
"Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it."



[R] [Off-Topic] Introducing a new R Blog

2016-06-20 Thread G . Maubach
Hi All,

Today I would like to announce a new R blog. It contains a few entries 
about my findings during my studies and my daily work:

https://github.com/gmaubach/R-Know-How/wiki/R-Blog

I hope you'll find my hints useful.

In addition, you could have a look at a small collection of R functions I 
found useful when working with my data:

https://github.com/gmaubach/R-Project-Utilities

Kind regards

Georg Maubach
