Re: [R] metafor - code for analysing geometric means

2014-12-07 Thread Bert Gunter
While you may get a helpful reply, I think this is really not the
forum for such relatively basic math/stat questions. As you seem to be
more or less at sea here, I really really suggest that you seek help
from a local statistical resource.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, Dec 7, 2014 at 10:25 PM, Purssell, Ed  wrote:
> Dear All
>
> I have tried very hard to work out what to do with putting logged data into 
> metafor; the paper says..
> 'geometric mean antibody concentrations (GMCs) or opsonophagocytic activity 
> titres (geometric mean titres [GMT]) were calculated with 95% CIs by taking 
> the antilog of the mean of the log concentration or titre transformations.'
>
> Does this look right if I take the reported mean, upper and lower bound of 
> the CI, and the number?
>
> m<-log(mean)
> ub<-log(upper bound)
> lb<-log(lower bound)
> diff<-ub-lb
> SE<-diff/3.92
> SD<-SE*(sqrt(n))
>
> Then put m, SD and n for each group into metafor as normal.  Or is there a 
> better way?  I am afraid I didn't understand how to do it on a log scale.
>
> Thank you
>
> Edward
> 
> Edward Purssell PhD
> Senior Lecturer
>
> Florence Nightingale Faculty of Nursing and Midwifery
> King's College London
> James Clerk Maxwell Building
> 57 Waterloo Road
> London SE1 8WA
> Telephone 020 7848 3021
> Mobile 07782 374217
> email edward.purss...@kcl.ac.uk
> https://www.researchgate.net/profile/Edward_Purssell
>
> 
> From: Viechtbauer Wolfgang (STAT) 
> 
> Sent: 14 November 2014 10:40
> To: Michael Dewey; Purssell, Ed; r-help@r-project.org
> Subject: RE: [R] metafor - code for analysing geometric means
>
> With "geometric mean 1 CI /3.92", I assume you mean "(upper bound - lower 
> bound) / 3.92". Two things:
>
> 1) That will give you the SE of the mean, not the SD of the observations 
> (which is what you need as input).
>
> 2) Probably the CI for the geometric mean was calculated on the log-scale (as 
> Michael hinted at). Check if log(upper bound) and log(lower bound) is (within 
> rounding error) symmetric around log(geometric mean). Then (log(upper bound) 
> - log(lower bound)) / 3.96 * sqrt(n) will give you the SD of the log of the 
> values used to compute the geometric mean. Then you could use log(geometric 
> mean) and that SD as input. But this would give you the difference of the 
> log-transformed geometric means. Not sure if this is what you want to analyze.
>
> Two more articles that may be helpful here:
>
> Friedrich, J. O., Adhikari, N. K., & Beyene, J. (2012). Ratio of geometric 
> means to analyze continuous outcomes in meta-analysis: Comparison to mean 
> differences and ratio of arithmetic means using empiric data and simulation. 
> Statistics in Medicine, 31(17), 1857-1886.
>
> Souverein, O. W., Dullemeijer, C., van 't Veer, P., & van der Voet, H. 
> (2012). Transformations of summary statistics as input in meta-analysis for 
> linear dose-response models on a logarithmic scale: A methodology developed 
> within EURRECA. BMC Medical Research Methodology, 12(57).
>
> Best,
> Wolfgang
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Michael Dewey
>> Sent: Thursday, November 13, 2014 12:36
>> To: Purssell, Ed; r-help@r-project.org
>> Subject: Re: [R] metafor - code for analysing geometric means
>>
>> On 13/11/2014 11:00, Purssell, Ed wrote:
>> > ?Dear All
>> >
>> > I have some data expressed in geometric means and 95% confidence
>> intervals.  Can I code them in metafor as:
>> >
>> > rma(m1i=geometric mean 1, m2i=geometric mean 2, sd1i=geometric mean 1
>> CI /3.92, sd2i=geometric mean 2 CI/3.92...etc, measure="MD")
>>
>> Would it not be better to work on the log scale?
>>
>> > All of the studies use geometric means.
>> >
>> > Thanks!
>> >
>> > Edward
>>
>> --
>> Michael
>> http://www.dewey.myzen.co.uk
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] metafor - code for analysing geometric means

2014-12-07 Thread Purssell, Ed
Dear All

I have tried very hard to work out what to do with putting logged data into 
metafor; the paper says..
'geometric mean antibody concentrations (GMCs) or opsonophagocytic activity 
titres (geometric mean titres [GMT]) were calculated with 95% CIs by taking the 
antilog of the mean of the log concentration or titre transformations.'

Does this look right if I take the reported mean, upper and lower bound of the 
CI, and the number?

m<-log(mean) 
ub<-log(upper bound)
lb<-log(lower bound)
diff<-ub-lb
SE<-diff/3.92
SD<-SE*(sqrt(n))

Then put m, SD and n for each group into metafor as normal.  Or is there a 
better way?  I am afraid I didn't understand how to do it on a log scale.

Thank you

Edward

Edward Purssell PhD
Senior Lecturer

Florence Nightingale Faculty of Nursing and Midwifery
King's College London
James Clerk Maxwell Building
57 Waterloo Road
London SE1 8WA
Telephone 020 7848 3021
Mobile 07782 374217
email edward.purss...@kcl.ac.uk
https://www.researchgate.net/profile/Edward_Purssell


From: Viechtbauer Wolfgang (STAT) 
Sent: 14 November 2014 10:40
To: Michael Dewey; Purssell, Ed; r-help@r-project.org
Subject: RE: [R] metafor - code for analysing geometric means

With "geometric mean 1 CI /3.92", I assume you mean "(upper bound - lower 
bound) / 3.92". Two things:

1) That will give you the SE of the mean, not the SD of the observations (which 
is what you need as input).

2) Probably the CI for the geometric mean was calculated on the log-scale (as 
Michael hinted at). Check if log(upper bound) and log(lower bound) is (within 
rounding error) symmetric around log(geometric mean). Then (log(upper bound) - 
log(lower bound)) / 3.96 * sqrt(n) will give you the SD of the log of the 
values used to compute the geometric mean. Then you could use log(geometric 
mean) and that SD as input. But this would give you the difference of the 
log-transformed geometric means. Not sure if this is what you want to analyze.

Two more articles that may be helpful here:

Friedrich, J. O., Adhikari, N. K., & Beyene, J. (2012). Ratio of geometric 
means to analyze continuous outcomes in meta-analysis: Comparison to mean 
differences and ratio of arithmetic means using empiric data and simulation. 
Statistics in Medicine, 31(17), 1857-1886.

Souverein, O. W., Dullemeijer, C., van 't Veer, P., & van der Voet, H. (2012). 
Transformations of summary statistics as input in meta-analysis for linear 
dose-response models on a logarithmic scale: A methodology developed within 
EURRECA. BMC Medical Research Methodology, 12(57).

Best,
Wolfgang

> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Michael Dewey
> Sent: Thursday, November 13, 2014 12:36
> To: Purssell, Ed; r-help@r-project.org
> Subject: Re: [R] metafor - code for analysing geometric means
>
> On 13/11/2014 11:00, Purssell, Ed wrote:
> > ?Dear All
> >
> > I have some data expressed in geometric means and 95% confidence
> intervals.  Can I code them in metafor as:
> >
> > rma(m1i=geometric mean 1, m2i=geometric mean 2, sd1i=geometric mean 1
> CI /3.92, sd2i=geometric mean 2 CI/3.92...etc, measure="MD")
>
> Would it not be better to work on the log scale?
>
> > All of the studies use geometric means.
> >
> > Thanks!
> >
> > Edward
>
> --
> Michael
> http://www.dewey.myzen.co.uk

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How to plot bootstrap distribution in rms package?

2014-12-07 Thread Eddie Smith
Hi guys,

I am trying to familiar myself with bootstrapping using rms package and
stuck on how to plot the bootstrap distribution. Would appreciate if
somebody could help.

Thanks in advance.

From
http://stats.stackexchange.com/questions/64788/interpreting-a-logistic-regression-model-with-multiple-predictors

library(rms)
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv";)
mylogit <- lrm(admit ~ gre, x=TRUE, y=TRUE, data = mydata)
mylogit
my.valid <- validate(mylogit, method="boot", B=1000)
my.valid
my.calib <- calibrate(mylogit, method="boot", B=1000)
my.calib

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] date time problem

2014-12-07 Thread Alemu Tadesse
Thank you very much Jeff. Below is the data I used:
> Corrected_data
  SA_LST SA_GHI_mean
61759 3/11/2007 1:00 0.0
67517 3/11/2007 2:00 0.0
70017 3/11/2007 3:00 0.0
70524 3/11/2007 4:00 0.0
71061 3/11/2007 5:00 0.0
71638 3/11/2007 6:00 0.0
72113 3/11/2007 7:0020.20873
72629 3/11/2007 8:00   123.14231
73274 3/11/2007 9:00   306.97343

>test1<-as.POSIXct(Corrected_data$SA_LST,format="%m/%d/%Y %H:%M",
tz="EST5EDT")
>test1
[1] "2007-03-11 01:00:00 EST" NA"2007-03-11
03:00:00 EDT" "2007-03-11 04:00:00 EDT"
[5] "2007-03-11 05:00:00 EDT" "2007-03-11 06:00:00 EDT" "2007-03-11
07:00:00 EDT" "2007-03-11 08:00:00 EDT"
[9] "2007-03-11 09:00:00 EDT"

> format(test1, tz="Etc/GMT", usetz=TRUE )
[1] "2007-03-11 06:00:00 GMT" NA"2007-03-11
07:00:00 GMT" "2007-03-11 08:00:00 GMT"
[5] "2007-03-11 09:00:00 GMT" "2007-03-11 10:00:00 GMT" "2007-03-11
11:00:00 GMT" "2007-03-11 12:00:00 GMT"
[9] "2007-03-11 13:00:00 GMT"
>

As you can see the "2007-03-11 02:00:00 EST" is not recognized and returned
as NA. This may be due to DST, which I do not know how to handle in R.

Best,

Alemu


On Sun, Dec 7, 2014 at 1:59 PM, Jeff Newmiller 
wrote:

> You have not provided a reproducible example, so anything I say could be
> wrong just due to misinterpretation. Please read [1] for suggestions on
> making your examples reproducible, particularly regarding the use of dput
> to provide example data. You have also posted in HTML format, which can
> cause additional scrambling of communications on this list.
>
> From the notation you are using, I would guess that "Corrected_SA_data" is
> a data frame containing columns "date_time" and "TZ" at a minimum. The
> "date_time" column could be a vector of class POSIXlt or POSIXct... except
> that I cannot reproduce such an object that doesn't print some kind of
> timezone indicator in its output. What version of R are you using? Note
> that such an object contains a tz attribute already, so unless you have
> already made sure that the date_time column knows about that timezone, they
> are probably unrelated to each other.
>
> Note that any POSIXct object is internally represented as a specific
> instant of time. The tz attribute is only used to control how that time
> will be displayed. For example:
>
> testtime3 <- as.POSIXct( rawtime1, tz="America/New_York" )
> format( testtime3, tz="Etc/GMT", usetz=TRUE )
>
> If you want the output to omit the timezone information you can omit the
> usetz argument:
>
> format( testtime3, tz="Etc/GMT" )
>
> If you have not yet, you should read [2]. Note that three-letter timezone
> indicators (even though they are given in OUTPUT!) are at best unreliable
> for use in specifying timezones in R... read ?timezones.
>
> [1] http://stackoverflow.com/questions/5963269/how-to-make-
> a-great-r-reproducible-example
> [2] http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf
>
>
> On Sun, 7 Dec 2014, Alemu Tadesse wrote:
>
>  Dear R users
>>
>> I am puzzled by the following result from R script. I am trying to convert
>> local time to UTC time. Time zone is -5, therefore I used the following
>> approach.
>>
>> Below is the script.
>>
>>> Corrected_SA_data$date_time[k-1]
>>>
>> [1] "2007-03-11 01:00:00"
>>
>>> Corrected_SA_data$TZ[k-1]
>>>
>> [1] -5
>>
>>> Corrected_SA_data$date_time[k-1]-Corrected_SA_data$TZ[k-1]*3600
>>>
>> [1] "2007-03-11 07:00:00 MDT"
>>
>> I was expecting this last value to be something like "2007-03-11 06:00:00
>> UTC"
>>
>> Please correct me if I ma wrong.
>>
>> On the other hand I have
>>
>>> Corrected_SA_data$date_time[k]
>>>
>> [1] "2007-03-11 02:00:00"
>>
>>> Corrected_SA_data$TZ[k]
>>>
>> [1] -5
>>
>>> Corrected_SA_data$date_time[k]-Corrected_SA_data$TZ[k]*3600
>>>
>> [1] NA
>>
>> I am not sure why I am getting NA.
>>
>> Thank you for your help.
>>
>> Alemu
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> 
> ---
> Jeff NewmillerThe .   .  Go Live...
> DCN:Basics: ##.#.   ##.#.  Live
> Go...
>   Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
> /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
> 
> ---
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-he

Re: [R] neighborhood competition index in r package

2014-12-07 Thread Bert Gunter
... and even more directly, googling on "R package Hegyi" brought up
the siplab package as the first hit. As it deals with forestry and
vegetation, it would appear to be what the OP wanted -- and could have
found him/her self with a minimum of effort.

Sigh... I don't understand why people don't make even a minimal search
effort on their own and feel it necessary to ask for help here. It
ain't rocket science!

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sun, Dec 7, 2014 at 6:05 PM, Ben Bolker  wrote:
> catalin roibu  gmail.com> writes:
>
>>
>> Dear all!
>>
>> Is there a R package to compute neighborhood competition index (Shutz,
>> Hegyi, and many index).
>>
>> Thank you very much!
>>
>
>   library("sos")
>   findFn("Hegyi")
>
> leads to
>
>   http://finzi.psych.upenn.edu/R/library/siplab/html/pairwise.html
>
> Does that help?
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] neighborhood competition index in r package

2014-12-07 Thread Ben Bolker
catalin roibu  gmail.com> writes:

> 
> Dear all!
> 
> Is there a R package to compute neighborhood competition index (Shutz,
> Hegyi, and many index).
> 
> Thank you very much!
> 

  library("sos")
  findFn("Hegyi")

leads to 

  http://finzi.psych.upenn.edu/R/library/siplab/html/pairwise.html

Does that help?

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Condensing data.frame

2014-12-07 Thread John Posner
Looking over Jeff's "dplyr" solution, I see that I forgot this part of the 
original spec:

  where the first 6 have the highest "countPercentage"

So here's a corrected sum_cid() function for my "plyr" solution:

summ_cid = function(frm) {
  # sort by countPercentage
  sorted_frm = arrange(frm, desc(countPercentage))
  # grab at most 6 rows
  toprows = head(sorted_frm, 6)
  # add a 7th row
  toprows[nrow(toprows)+1,] = c(toprows[1,1], NA, NA, NA, 
sum(toprows$countPercentage))
  # all done
  return(toprows)
}

-John

> -Original Message-
> From: Jeff Newmiller [mailto:jdnew...@dcn.davis.ca.us]
> Sent: Sunday, December 07, 2014 3:14 PM
> To: John Posner
> Cc: 'Chel Hee Lee'; Morway, Eric; R mailing list
> Subject: Re: [R] Condensing data.frame
> 
> dplyr version (good for large datasets):
> 
> library(dplyr)
> 
> # if original example dat data.frame is used # using read.csv with as.is=TRUE
> or stringsAsFactors=FALSE is better dat$tax_name <- as.character(
> dat$tax_name )
> 
> # dplyr pipe chain
> (   dat
> %>% arrange( site, desc( countPercentage )) %>% group_by( site ) %>% do(
> rbind_list( .[ 1:6, c( "site", "tax_name", "countPercentage" ) ]
>, data.frame( site=.[ 1, "site" ]
>, tax_name="Other"
>, countPercentage=sum( .[ -(1:6)
>, "countPercentage" ] )
>)
>)
>)
> %>% as.data.frame
> )
> 
> #note that this will break if there are fewer than 7 rows for any site
> 
> On Sun, 7 Dec 2014, John Posner wrote:
> 
> > Here's a solution using the plyr library:
> >
> > 
> > library(plyr)
> >
> > dat <- read.table(header=TRUE, sep=",", as.is=TRUE,   ## <
> as.is=TRUE
> > text="site,tax_name,count,countTotal,countPercentage
> > CID_1,Cyanobacteria,46295,123509,37.483098398
> > CID_1,Proteobacteria,36120,123509,29.244832360
> > CID_1,Bacteroidetes,19546,123509,15.825567368
> > CID_1,Verrucomicrobia,7886,123509,6.384959801
> > CID_1,Firmicutes,2843,123509,2.301856545
> >   ... 
> > CID_9,Armatimonadetes,27,77120,0.035010373
> > CID_9,Fusobacteria,25,77120,0.032417012
> > CID_9,Aquificae,13,77120,0.016856846
> > CID_9,Synergistetes,12,77120,0.015560166
> > CID_9,Deferribacteres,8,77120,0.010373444
> > CID_9,Thermotogae,7,77120,0.009076763
> > CID_9,Chrysiogenetes,3,77120,0.003890041.
> > ")
> >
> > summ_cid = function(frm) {
> >  # grab at most 6 rows from data frame  toprows = head(frm, 6)  # add
> > a 7th row  toprows[nrow(toprows)+1,] = c(toprows[1,1], "", NA, NA,
> > sum(toprows$countPercentage))  # all done
> >  return(toprows)
> > }
> >
> > result = ddply(dat, "site", summ_cid)
> > 
> >
> > Notes:
> >
> > 1. I needed to add the as.is=TRUE option to read.table() 2. The
> > invocation of ddply() does *not* use summarize() 3. Because there is
> > no use of summarize(), I have not figured out how to use the dplyr package
> in this context.
> >
> > -John
> >
> >> -Original Message-
> >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel
> >> Hee Lee
> >> Sent: Sunday, December 07, 2014 11:43 AM
> >> To: Morway, Eric; R mailing list
> >> Subject: Re: [R] Condensing data.frame
> >>
> >> > datBySite <- split(dat, dat$site)
> >> > output <- lapply(datBySite, function(x){
> >> + x$idx <- seq_len(nrow(x))
> >> + x$grp <- ifelse(x$idx < 7, x$idx, 7) rval <-
> >> + tapply(x$countPercentage, x$grp, sum) x$grp <- x$count <-
> >> + x$countTotal <- NULL x <- x[seq_len(7), ] x$tax_name <-
> >> + as.character(x$tax_name) x$tax_name[7] <- "Others"
> >> + x$new <- rval
> >> + return(x)
> >> + })
> >> >
> >> > head(do.call(rbind, output), 14)
> >>  sitetax_name countPercentage idx   new
> >> CID_1.1CID_1   Cyanobacteria   37.483098   1 37.483098
> >> CID_1.2CID_1  Proteobacteria   29.244832   2 29.244832
> >> CID_1.3CID_1   Bacteroidetes   15.825567   3 15.825567
> >> CID_1.4CID_1 Verrucomicrobia6.384960   4  6.384960
> >> CID_1.5CID_1  Firmicutes2.301857   5  2.301857
> >> CID_1.6CID_1   Acidobacteria2.075152   6  2.075152
> >> CID_1.7CID_1  Others1.675182   7  6.684533
> >> CID_10.27 CID_10  Proteobacteria   35.366606   1 35.366606
> >> CID_10.28 CID_10   Bacteroidetes   25.188484   2 25.188484
> >> CID_10.29 CID_10   Cyanobacteria   23.294828   3 23.294828
> >> CID_10.30 CID_10 Verrucomicrobia6.970592   4  6.970592
> >> CID_10.31 CID_10   Acidobacteria1.988448   5  1.988448
> >> CID_10.32 CID_10  Actinobacteria1.644548   6  1.644548
> >> CID_10.33 CID_10  Others1.582823   7  5.546493
> >> >
> >>
> >> I hope this helps.
> >>
> >> Chel Hee Lee
> >>
> >> On 12/07/2014 08:21 AM, Morway, Eric wrote:
> >>> Using the dataset "dat" (found below), I'm seeking a way to condense
> >>> down the data.fram

Re: [R] bad STATA dataset import, how to change value labels

2014-12-07 Thread Edoardo Prestianni
>> Excuse the inaccuracy, the warning is "value label missing". the same
variable is considered as factor (w/ values ranging from a to b) in one
dataset, as int in another. I want it to be a factor in both.

> So, you are importing two different Stata formatted files an in only one
of them is the warning being emitted?

Yes. But the variable is the very same variable, it's an
administrative/postal code given to each province of the country I'm
studying.
Factor have mode integer, but does that that "int" imply also "nominal" as
well? I thought I needed a "factor" for that.


One dataset emits a warning and shows the variable as "int". The other
seems fine, factor w/ values ranging from a to b,  but when I use the the
str(data$var) function, the value shown are weird, like 1011000, 1012
... instead of being four-digit numbers.


>> Post the results of dput(head( dfrm[ , "varname"]))
>My intent was for you to substitute the name of your dataframe for the
token `dfrm`.

I apologize. The output is:

__

[1] 1 1 1 1 1 1

__


Thanks for your time







2014-12-07 19:15 GMT+01:00 David Winsemius :

>
> On Dec 6, 2014, at 6:37 PM, Edoardo Prestianni wrote:
>
> > Excuse the inaccuracy, the warning is "value label missing". the same
> variable is considered as factor (w/ values ranging from a to b) in one
> dataset, as int in another. I want it to be a factor in both.
>
> So, you are importing two different Stata formatted files an in only one
> of them is the warning being emitted?
>
> >
> > I think I am missing a package, the output is.
> >
> > Error in head(dfrm[, "variable"]) : object 'dfrm' not found
>
> My intent was for you to substitute the name of your dataframe for the
> token `dfrm`.
>
> head(yourDataObject[, "yourVariableNameInQuotes"])
>
> --
> David.
>
> >
> >
> > 2014-12-07 3:14 GMT+01:00 David Winsemius :
> >
> > On Dec 6, 2014, at 3:54 PM, Edoardo Prestianni wrote:
> >
> > > hello,
> > >
> > > I have imported a couple of .dta datasets, but a variable, instead of
> being
> > > labeled as factor (w/ values ranging from a to b) is labeled as
> integer.
> > >
> > > How can I fix this? I am sorry if it is a rookie question but I don't
> find
> > > the command googling.
> >
> > What "command"?
> >
> > The word "labeled" is not an R term unless on is talking about the
> labels of factor variables in which case there is no problem. Factors have
> mode integer.
> >
> > Post the results of dput(head( dfrm[ , "varname"]))
> >
> > --
> > David.
> >
> >
> > > Thanks everyone for their help,
> > >
> > > --
> > > Edoardo Prestianni
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> >
> >
> >
> > --
> > Edoardo Prestianni
>
> David Winsemius
> Alameda, CA, USA
>
>


-- 
Edoardo Prestianni

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] neighborhood competition index in r package

2014-12-07 Thread catalin roibu
Dear all!

Is there a R package to compute neighborhood competition index (Shutz,
Hegyi, and many index).

Thank you very much!

-- 
---
Catalin-Constantin ROIBU
Lecturer PhD, Forestry engineer
Forestry Faculty of Suceava
Str. Universitatii no. 13, Suceava, 720229, Romania
office phone +4 0230 52 29 78, ext. 531
mobile phone   +4 0745 53 18 01
   +4 0766 71 76 58
FAX:+4 0230 52 16 64
silvic.usv.ro

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] date time problem

2014-12-07 Thread Jeff Newmiller
You have not provided a reproducible example, so anything I say could be 
wrong just due to misinterpretation. Please read [1] for suggestions on 
making your examples reproducible, particularly regarding the use of dput 
to provide example data. You have also posted in HTML format, 
which can cause additional scrambling of communications on this list.


From the notation you are using, I would guess that "Corrected_SA_data" is 
a data frame containing columns "date_time" and "TZ" at a minimum. 
The "date_time" column could be a vector of class POSIXlt or POSIXct... 
except that I cannot reproduce such an object that doesn't print some 
kind of timezone indicator in its output. What version of R are you 
using? Note that such an object contains a tz attribute already, so unless 
you have already made sure that the date_time column knows about that 
timezone, they are probably unrelated to each other.


Note that any POSIXct object is internally represented as a specific 
instant of time. The tz attribute is only used to control how that time 
will be displayed. For example:


testtime3 <- as.POSIXct( rawtime1, tz="America/New_York" )
format( testtime3, tz="Etc/GMT", usetz=TRUE )

If you want the output to omit the timezone information you can omit the 
usetz argument:


format( testtime3, tz="Etc/GMT" )

If you have not yet, you should read [2]. Note that three-letter timezone 
indicators (even though they are given in OUTPUT!) are at best unreliable 
for use in specifying timezones in R... read ?timezones.


[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf

On Sun, 7 Dec 2014, Alemu Tadesse wrote:


Dear R users

I am puzzled by the following result from R script. I am trying to convert
local time to UTC time. Time zone is -5, therefore I used the following
approach.

Below is the script.

Corrected_SA_data$date_time[k-1]

[1] "2007-03-11 01:00:00"

Corrected_SA_data$TZ[k-1]

[1] -5

Corrected_SA_data$date_time[k-1]-Corrected_SA_data$TZ[k-1]*3600

[1] "2007-03-11 07:00:00 MDT"

I was expecting this last value to be something like "2007-03-11 06:00:00
UTC"

Please correct me if I ma wrong.

On the other hand I have

Corrected_SA_data$date_time[k]

[1] "2007-03-11 02:00:00"

Corrected_SA_data$TZ[k]

[1] -5

Corrected_SA_data$date_time[k]-Corrected_SA_data$TZ[k]*3600

[1] NA

I am not sure why I am getting NA.

Thank you for your help.

Alemu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



---
Jeff NewmillerThe .   .  Go Live...
DCN:Basics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] date time problem

2014-12-07 Thread jim holtman
I would use the 'lubridate' package for this:

> z <- Sys.time()
> z
[1] "2014-12-07 15:43:50 EST"
> require(lubridate)
> with_tz(z, "UTC")
[1] "2014-12-07 20:43:50 UTC"




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Dec 7, 2014 at 2:25 PM, Alemu Tadesse 
wrote:

> Dear R users
>
> I am puzzled by the following result from R script. I am trying to convert
> local time to UTC time. Time zone is -5, therefore I used the following
> approach.
>
> Below is the script.
> > Corrected_SA_data$date_time[k-1]
> [1] "2007-03-11 01:00:00"
> > Corrected_SA_data$TZ[k-1]
> [1] -5
> > Corrected_SA_data$date_time[k-1]-Corrected_SA_data$TZ[k-1]*3600
> [1] "2007-03-11 07:00:00 MDT"
>
> I was expecting this last value to be something like "2007-03-11 06:00:00
> UTC"
>
> Please correct me if I ma wrong.
>
> On the other hand I have
> > Corrected_SA_data$date_time[k]
> [1] "2007-03-11 02:00:00"
> > Corrected_SA_data$TZ[k]
> [1] -5
> > Corrected_SA_data$date_time[k]-Corrected_SA_data$TZ[k]*3600
> [1] NA
>
> I am not sure why I am getting NA.
>
> Thank you for your help.
>
> Alemu
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Condensing data.frame

2014-12-07 Thread Jeff Newmiller

dplyr version (good for large datasets):

library(dplyr)

# if original example dat data.frame is used
# using read.csv with as.is=TRUE or stringsAsFactors=FALSE is better
dat$tax_name <- as.character( dat$tax_name )

# dplyr pipe chain
(   dat
%>% arrange( site, desc( countPercentage ))
%>% group_by( site )
%>% do( rbind_list( .[ 1:6, c( "site", "tax_name", "countPercentage" ) ]
  , data.frame( site=.[ 1, "site" ]
  , tax_name="Other"
  , countPercentage=sum( .[ -(1:6)
  , "countPercentage" ] )
  )
  )
  )
%>% as.data.frame
)

#note that this will break if there are fewer than 7 rows for any site

On Sun, 7 Dec 2014, John Posner wrote:


Here's a solution using the plyr library:


library(plyr)

dat <- read.table(header=TRUE, sep=",", as.is=TRUE,   ## < as.is=TRUE
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
  ... 
CID_9,Armatimonadetes,27,77120,0.035010373
CID_9,Fusobacteria,25,77120,0.032417012
CID_9,Aquificae,13,77120,0.016856846
CID_9,Synergistetes,12,77120,0.015560166
CID_9,Deferribacteres,8,77120,0.010373444
CID_9,Thermotogae,7,77120,0.009076763
CID_9,Chrysiogenetes,3,77120,0.003890041.
")

summ_cid = function(frm) {
 # grab at most 6 rows from data frame
 toprows = head(frm, 6)
 # add a 7th row
 toprows[nrow(toprows)+1,] = c(toprows[1,1], "", NA, NA, 
sum(toprows$countPercentage))
 # all done
 return(toprows)
}

result = ddply(dat, "site", summ_cid)


Notes:

1. I needed to add the as.is=TRUE option to read.table()
2. The invocation of ddply() does *not* use summarize()
3. Because there is no use of summarize(), I have not figured out how to use 
the dplyr package in this context.

-John


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee
Lee
Sent: Sunday, December 07, 2014 11:43 AM
To: Morway, Eric; R mailing list
Subject: Re: [R] Condensing data.frame

> datBySite <- split(dat, dat$site)
> output <- lapply(datBySite, function(x){
+ x$idx <- seq_len(nrow(x))
+ x$grp <- ifelse(x$idx < 7, x$idx, 7)
+ rval <- tapply(x$countPercentage, x$grp, sum) x$grp <- x$count <-
+ x$countTotal <- NULL x <- x[seq_len(7), ] x$tax_name <-
+ as.character(x$tax_name) x$tax_name[7] <- "Others"
+ x$new <- rval
+ return(x)
+ })
>
> head(do.call(rbind, output), 14)
 sitetax_name countPercentage idx   new
CID_1.1CID_1   Cyanobacteria   37.483098   1 37.483098
CID_1.2CID_1  Proteobacteria   29.244832   2 29.244832
CID_1.3CID_1   Bacteroidetes   15.825567   3 15.825567
CID_1.4CID_1 Verrucomicrobia6.384960   4  6.384960
CID_1.5CID_1  Firmicutes2.301857   5  2.301857
CID_1.6CID_1   Acidobacteria2.075152   6  2.075152
CID_1.7CID_1  Others1.675182   7  6.684533
CID_10.27 CID_10  Proteobacteria   35.366606   1 35.366606
CID_10.28 CID_10   Bacteroidetes   25.188484   2 25.188484
CID_10.29 CID_10   Cyanobacteria   23.294828   3 23.294828
CID_10.30 CID_10 Verrucomicrobia6.970592   4  6.970592
CID_10.31 CID_10   Acidobacteria1.988448   5  1.988448
CID_10.32 CID_10  Actinobacteria1.644548   6  1.644548
CID_10.33 CID_10  Others1.582823   7  5.546493
>

I hope this helps.

Chel Hee Lee

On 12/07/2014 08:21 AM, Morway, Eric wrote:

Using the dataset "dat" (found below), I'm seeking a way to condense
down the data.frame such that each "site" (i.e., "CID_1"..."CID_13")
has a maximum of 7 rows of post-processed data, where the first 6 have
the highest "countPercentage" and the 7th row is the sum of

"countPercentage"

from all other rows within that "site", and it is assigned the name
"Other".  So, for the first two sites in the provided data.frame,
CID_1 & CID_10, they would reduce to:

CID_1 Cyanobacteria 37.48
CID_1 Proteobacteria 29.24
CID_1 Bacteroidetes  15.83
CID_1 Verrucomicrobia  6.38
CID_1 Firmicutes  2.30
CID_1 Acidobacteria  2.08
CID_1 Other6.68
CID_10 Proteobacteria 35.37
CID_10 Bacteroidetes  25.19
CID_10 Cyanobacteria 23.29
CID_10 Verrucomicrobia  6.97
CID_10 Acidobacteria  1.99
CID_10 Actinobacteria 1.64
CID_10 Other 5.55


dat <- read.table(header=TRUE, sep=",",
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
CID_1,Acidobacteria,2563,123509,2.075152418
CID_1,Actinobacter

[R] date time problem

2014-12-07 Thread Alemu Tadesse
Dear R users

I am puzzled by the following result from R script. I am trying to convert
local time to UTC time. Time zone is -5, therefore I used the following
approach.

Below is the script.
> Corrected_SA_data$date_time[k-1]
[1] "2007-03-11 01:00:00"
> Corrected_SA_data$TZ[k-1]
[1] -5
> Corrected_SA_data$date_time[k-1]-Corrected_SA_data$TZ[k-1]*3600
[1] "2007-03-11 07:00:00 MDT"

I was expecting this last value to be something like "2007-03-11 06:00:00
UTC"

Please correct me if I ma wrong.

On the other hand I have
> Corrected_SA_data$date_time[k]
[1] "2007-03-11 02:00:00"
> Corrected_SA_data$TZ[k]
[1] -5
> Corrected_SA_data$date_time[k]-Corrected_SA_data$TZ[k]*3600
[1] NA

I am not sure why I am getting NA.

Thank you for your help.

Alemu

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference in cummulative variance depending on print command

2014-12-07 Thread William Revelle
Dear Rena,

As Peter points out, it is better to ask the maintainer of the program for 
detailed questions.  

  As Peter correctly surmised, print.psych (which is used to print the output 
from the fa function), knows that you have an oblique solution and is reporting 
the amount of variance associated with the oblique factors (taking into account 
that they are correlated).  The default print method assumes orthogonal factors.

If you compare the total amount of variance accounted for (cumulative Var) for 
all of the factors (.59) , this will match that found using orthogonal 
rotations, while the default print method of the loadings does not.

Bill

> On Dec 6, 2014, at 10:48 AM, peter dalgaard  wrote:
> 
> Firstly, there is no fa() function in base R. There is one in package 
> psych(), which has a maintainer, etc.
> 
> I guess that it is because fa() does a non-orthogonal factor rotation and its 
> print method knows about it, whereas the default print method for loadings 
> assumes that rotations are orthogonal.
> 
> - Peter D.
> 
>> On 05 Dec 2014, at 13:28 , Rena Büsch  wrote:
>> 
>> Hello,
>> I am trying a factor analysis via R.
>> When running the pricipal axis analysis I do get different tables depending
>> on the print command.
>> This is my factor analysis:
>> fa.pa_cor_3_2<- fa(ItemsCor_4, nfactors=3, fm="pa",rotate="oblimin")
>> 
>> To get the h2 I did the following print command:
>> print (fa.pa_cor_3_2, digits=2, cut=.3, sort=T)
>> To just get the loadings I did the following print command:
>> print (fa.pa_cor_3_2$loadings, digits=2, cutoff=.3, sort=T)
>> 
>> The result of the first print is the following Eigenvalue-cumulative
>> variance table:
>>   PA1   PA2  PA3
>> SS loadings20.59 18.16 5.03
>> Proportion Var  0.28  0.25 0.07
>> Cumulative Var  0.28  0.52 0.59
>> 
>> With the second print command I get a different table:
>>   PA1   PA2  PA3
>> SS loadings17.63 15.12 3.14
>> Proportion Var  0.24  0.20 0.04
>> Cumulative Var  0.24  0.44 0.49
>> 
>> The loadings are the same for both commands. There is just this slight
>> difference in the cumulative Var.
>> 
>> Does anyone have an idea of a cause for the difference? What can I report?
>> Did I post enough information to fully understand my problem?
>> Thanks in Advance
>> Rena
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd@cbs.dk  Priv: pda...@gmail.com
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

William Revellehttp://personality-project.org/revelle.html
Professor  http://personality-project.org
Department of Psychology   http://www.wcas.northwestern.edu/psych/
Northwestern Universityhttp://www.northwestern.edu/
Use R for psychology http://personality-project.org/r
It is 5 minutes to midnighthttp://www.thebulletin.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Which is the final model for a Boosted Regression Trees (GBM)?

2014-12-07 Thread Samuel Reuther
Hi Kristi,

 

One year later I've been with the same question and found a solution with
the help (see plot.gbm: Marginal plots of fitted gbm objects.)

 

If your GBM-model is gbm1 <-   gbm(y ~ x1+x2, .) one can get the
coefficients for each x with:

 

print(plot(gbm1, i.var=1, n.trees=1000, return.grid=TRUE)) print(plot(gbm1,
2, return.grid=TRUE))

print(plot(gbm1, i.var=2, n.trees=1000, return.grid=TRUE)) print(plot(gbm1,
2, return.grid=TRUE))

 

This includes all trees from 1 to 1000. Without "return.grid=TRUE" the
graphic is shown. Additionally one can see the coefficients for up to 3-way
interactions:

 

print(plot(gbm1, i.var=c(1,2), n.trees=1000, return.grid=TRUE))
print(plot(gbm1, 2, return.grid=TRUE))

 

Hope that helps for others searching for the same thing!

Samuel

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] boot strapping poisson getting warnings and negative values

2014-12-07 Thread Aravindhan, K
Team,
I am giving the exact code that produces the error.  Please see below.  Can 
anyone please help ?

Thanks
Aravindhan

PROGRAM
---
rm(list = ls())
x<-c(1,14,49,26,4,10,25,36,79,15)
y<-c(1,5,32,8,4,23,399,79,341,2)
db = data.frame(x,y)
library(car)
db = data.frame(x,y)
print("-==")
summary(m1 <- glm(y~0+x, family = poisson()))
mod.db.hub<-glm(y~0+x,family="poisson",data=db)
fit<-fitted(mod.db.hub)
e<-residuals(mod.db.hub)
X<-model.matrix(mod.db.hub)
boot.huber.fixed<-function(data,indices,maxit=20) {
Y<-fit+e[indices]
mod<-glm(Y~X+0,family="poisson",maxit=maxit)
coefficients(mod)
}
library(boot)
db.fix.boot<-boot(db,boot.huber.fixed,2000,maxit=20)
db.fix.boot
boot.ci(db.fix.boot,index=1,type=c("bca","perc","poisson"))


The error I get is 
--

> db.fix.boot<-boot(db,boot.huber.fixed,2000,maxit=20)
Error in eval(expr, envir, enclos) : 
  negative values not allowed for the 'Poisson' family
In addition: Warning messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 1.002114
2: In dpois(y, mu, log = TRUE) : non-integer x = 4.050746
3: In dpois(y, mu, log = TRUE) : non-integer x = 44.219309
4: In dpois(y, mu, log = TRUE) : non-integer x = 7.786340
5: In dpois(y, mu, log = TRUE) : non-integer x = 3.189963
6: In dpois(y, mu, log = TRUE) : non-integer x = 10.348190
7: In dpois(y, mu, log = TRUE) : non-integer x = 56.408862
8: In dpois(y, mu, log = TRUE) : non-integer x = 27.751783
9: In dpois(y, mu, log = TRUE) : non-integer x = 480.383098
10: In dpois(y, mu, log = TRUE) : non-integer x = 2.497504
> db.fix.boot
Error: object 'db.fix.boot' not found
> boot.ci(db.fix.boot,index=1,type=c("bca","perc","poisson"))
Error in boot.ci(db.fix.boot, index = 1, type = c("bca", "perc", "poisson")) : 
  object 'db.fix.boot' not found
--


-Original Message-
From: Aravindhan, K 
Sent: Saturday, November 15, 2014 7:59 AM
To: r-help@r-project.org
Subject: RE: [R] boot strapping poisson getting warnings and negative values

Team,
Has anyone looked at this question from me ?  it will help me immensely if 
someone can provide an answer to this.

Thanks
Aravindhan

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of K Aravindhan
Sent: Friday, August 08, 2014 7:42 PM
To: r-help@r-project.org
Subject: [R] boot strapping poisson getting warnings and negative values

Dear Team,
I am getting this error while running the boot-strapping functions. 

==
mod.db.hub<-glm(TOTAL~1+IPD,family="poisson",data=db)
fit<-fitted(mod.db.hub)
e<-residuals(mod.db.hub)
X<-model.matrix(mod.db.hub)
boot.huber.fixed<-function(data,indices,maxit=20) { Y<-fit+e[indices]
mod<-glm(Y~X-1,family="poisson",maxit=maxit)
coefficients(mod)
}
library(boot)
db.fix.boot<-boot(db,boot.huber.fixed,2000,maxit=20)
db.fix.boot
boot.ci(db.fix.boot,index=1,type=c("bca","perc","poisson"))
boot.ci(db.fix.boot,index=2,type=c("bca","perc","poisson"))
==

Error in eval(expr, envir, enclos) : 
negative values not allowed for the 'Poisson' family In addition: Warning 
messages:
1: In dpois(y, mu, log = TRUE) : non-integer x = 25.006412
2: In dpois(y, mu, log = TRUE) : non-integer x = 26.969411
3: In dpois(y, mu, log = TRUE) : non-integer x = 66.352323
4: In dpois(y, mu, log = TRUE) : non-integer x = 61.083519
5: In dpois(y, mu, log = TRUE) : non-integer x = 20.596770
6: In dpois(y, mu, log = TRUE) : non-integer x = 43.428258
7: In dpois(y, mu, log = TRUE) : non-integer x = 1108.263554
8: In dpois(y, mu, log = TRUE) : non-integer x = 61.937982
9: In dpois(y, mu, log = TRUE) : non-integer x = 419.991213
10: In dpois(y, mu, log = TRUE) : non-integer x = 47.369133

Can you explain to me how to get rid of these ?

Thanks
Aravindhan

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] bad STATA dataset import, how to change value labels

2014-12-07 Thread David Winsemius

On Dec 6, 2014, at 6:37 PM, Edoardo Prestianni wrote:

> Excuse the inaccuracy, the warning is "value label missing". the same 
> variable is considered as factor (w/ values ranging from a to b) in one 
> dataset, as int in another. I want it to be a factor in both.

So, you are importing two different Stata formatted files an in only one of 
them is the warning being emitted?

> 
> I think I am missing a package, the output is. 
> 
> Error in head(dfrm[, "variable"]) : object 'dfrm' not found

My intent was for you to substitute the name of your dataframe for the token 
`dfrm`.

head(yourDataObject[, "yourVariableNameInQuotes"])

-- 
David.

> 
> 
> 2014-12-07 3:14 GMT+01:00 David Winsemius :
> 
> On Dec 6, 2014, at 3:54 PM, Edoardo Prestianni wrote:
> 
> > hello,
> >
> > I have imported a couple of .dta datasets, but a variable, instead of being
> > labeled as factor (w/ values ranging from a to b) is labeled as integer.
> >
> > How can I fix this? I am sorry if it is a rookie question but I don't find
> > the command googling.
> 
> What "command"?
> 
> The word "labeled" is not an R term unless on is talking about the labels of 
> factor variables in which case there is no problem. Factors have mode integer.
> 
> Post the results of dput(head( dfrm[ , "varname"]))
> 
> --
> David.
> 
> 
> > Thanks everyone for their help,
> >
> > --
> > Edoardo Prestianni
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius
> Alameda, CA, USA
> 
> 
> 
> 
> -- 
> Edoardo Prestianni

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Condensing data.frame

2014-12-07 Thread John Posner
Here's a solution using the plyr library:


library(plyr)

dat <- read.table(header=TRUE, sep=",", as.is=TRUE,   ## < as.is=TRUE
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
   ... 
CID_9,Armatimonadetes,27,77120,0.035010373
CID_9,Fusobacteria,25,77120,0.032417012
CID_9,Aquificae,13,77120,0.016856846
CID_9,Synergistetes,12,77120,0.015560166
CID_9,Deferribacteres,8,77120,0.010373444
CID_9,Thermotogae,7,77120,0.009076763
CID_9,Chrysiogenetes,3,77120,0.003890041.
")

summ_cid = function(frm) {
  # grab at most 6 rows from data frame
  toprows = head(frm, 6)  
  # add a 7th row
  toprows[nrow(toprows)+1,] = c(toprows[1,1], "", NA, NA, 
sum(toprows$countPercentage))
  # all done
  return(toprows)
}

result = ddply(dat, "site", summ_cid)


Notes: 

 1. I needed to add the as.is=TRUE option to read.table()
 2. The invocation of ddply() does *not* use summarize()
 3. Because there is no use of summarize(), I have not figured out how to use 
the dplyr package in this context.

-John

> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Chel Hee
> Lee
> Sent: Sunday, December 07, 2014 11:43 AM
> To: Morway, Eric; R mailing list
> Subject: Re: [R] Condensing data.frame
> 
>  > datBySite <- split(dat, dat$site)
>  > output <- lapply(datBySite, function(x){
> + x$idx <- seq_len(nrow(x))
> + x$grp <- ifelse(x$idx < 7, x$idx, 7)
> + rval <- tapply(x$countPercentage, x$grp, sum) x$grp <- x$count <-
> + x$countTotal <- NULL x <- x[seq_len(7), ] x$tax_name <-
> + as.character(x$tax_name) x$tax_name[7] <- "Others"
> + x$new <- rval
> + return(x)
> + })
>  >
>  > head(do.call(rbind, output), 14)
>  sitetax_name countPercentage idx   new
> CID_1.1CID_1   Cyanobacteria   37.483098   1 37.483098
> CID_1.2CID_1  Proteobacteria   29.244832   2 29.244832
> CID_1.3CID_1   Bacteroidetes   15.825567   3 15.825567
> CID_1.4CID_1 Verrucomicrobia6.384960   4  6.384960
> CID_1.5CID_1  Firmicutes2.301857   5  2.301857
> CID_1.6CID_1   Acidobacteria2.075152   6  2.075152
> CID_1.7CID_1  Others1.675182   7  6.684533
> CID_10.27 CID_10  Proteobacteria   35.366606   1 35.366606
> CID_10.28 CID_10   Bacteroidetes   25.188484   2 25.188484
> CID_10.29 CID_10   Cyanobacteria   23.294828   3 23.294828
> CID_10.30 CID_10 Verrucomicrobia6.970592   4  6.970592
> CID_10.31 CID_10   Acidobacteria1.988448   5  1.988448
> CID_10.32 CID_10  Actinobacteria1.644548   6  1.644548
> CID_10.33 CID_10  Others1.582823   7  5.546493
>  >
> 
> I hope this helps.
> 
> Chel Hee Lee
> 
> On 12/07/2014 08:21 AM, Morway, Eric wrote:
> > Using the dataset "dat" (found below), I'm seeking a way to condense
> > down the data.frame such that each "site" (i.e., "CID_1"..."CID_13")
> > has a maximum of 7 rows of post-processed data, where the first 6 have
> > the highest "countPercentage" and the 7th row is the sum of
> "countPercentage"
> > from all other rows within that "site", and it is assigned the name
> > "Other".  So, for the first two sites in the provided data.frame,
> > CID_1 & CID_10, they would reduce to:
> >
> > CID_1 Cyanobacteria 37.48
> > CID_1 Proteobacteria 29.24
> > CID_1 Bacteroidetes  15.83
> > CID_1 Verrucomicrobia  6.38
> > CID_1 Firmicutes  2.30
> > CID_1 Acidobacteria  2.08
> > CID_1 Other6.68
> > CID_10 Proteobacteria 35.37
> > CID_10 Bacteroidetes  25.19
> > CID_10 Cyanobacteria 23.29
> > CID_10 Verrucomicrobia  6.97
> > CID_10 Acidobacteria  1.99
> > CID_10 Actinobacteria 1.64
> > CID_10 Other 5.55
> >
> >
> > dat <- read.table(header=TRUE, sep=",",
> > text="site,tax_name,count,countTotal,countPercentage
> > CID_1,Cyanobacteria,46295,123509,37.483098398
> > CID_1,Proteobacteria,36120,123509,29.244832360
> > CID_1,Bacteroidetes,19546,123509,15.825567368
> > CID_1,Verrucomicrobia,7886,123509,6.384959801
> > CID_1,Firmicutes,2843,123509,2.301856545
> > CID_1,Acidobacteria,2563,123509,2.075152418
> > CID_1,Actinobacteria,2069,123509,1.675181566
> > CID_1,Planctomycetes,1481,123509,1.199102899
> > CID_1,Chloroflexi,1181,123509,0.956205621
> > CID_1,Gemmatimonadetes,956,123509,0.774032662
> > CID_1,Spirochaetes,688,123509,0.557044426
> > CID_1,Lentisphaerae,526,123509,0.425879895
> > CID_1,Ignavibacteriae,324,123509,0.262329061
> > CID_1,Chlorobi,238,123509,0.192698508
> > CID_1,Nitrospirae,230,123509,0.186221247
> > CID_1,Nitrospinae,169,123509,0.136832134
> > CID_1,Elusimicrobia,131,123509,0.106065145
> > CID_1,Tenericutes,114,123509,0.092300966
> > CID_1,Fibrobacteres,72,123509,0.058295347
> > CID_1

Re: [R] Condensing data.frame

2014-12-07 Thread Chel Hee Lee

> datBySite <- split(dat, dat$site)
> output <- lapply(datBySite, function(x){
+ x$idx <- seq_len(nrow(x))
+ x$grp <- ifelse(x$idx < 7, x$idx, 7)
+ rval <- tapply(x$countPercentage, x$grp, sum)
+ x$grp <- x$count <- x$countTotal <- NULL
+ x <- x[seq_len(7), ]
+ x$tax_name <- as.character(x$tax_name)
+ x$tax_name[7] <- "Others"
+ x$new <- rval
+ return(x)
+ })
>
> head(do.call(rbind, output), 14)
sitetax_name countPercentage idx   new
CID_1.1CID_1   Cyanobacteria   37.483098   1 37.483098
CID_1.2CID_1  Proteobacteria   29.244832   2 29.244832
CID_1.3CID_1   Bacteroidetes   15.825567   3 15.825567
CID_1.4CID_1 Verrucomicrobia6.384960   4  6.384960
CID_1.5CID_1  Firmicutes2.301857   5  2.301857
CID_1.6CID_1   Acidobacteria2.075152   6  2.075152
CID_1.7CID_1  Others1.675182   7  6.684533
CID_10.27 CID_10  Proteobacteria   35.366606   1 35.366606
CID_10.28 CID_10   Bacteroidetes   25.188484   2 25.188484
CID_10.29 CID_10   Cyanobacteria   23.294828   3 23.294828
CID_10.30 CID_10 Verrucomicrobia6.970592   4  6.970592
CID_10.31 CID_10   Acidobacteria1.988448   5  1.988448
CID_10.32 CID_10  Actinobacteria1.644548   6  1.644548
CID_10.33 CID_10  Others1.582823   7  5.546493
>

I hope this helps.

Chel Hee Lee

On 12/07/2014 08:21 AM, Morway, Eric wrote:

Using the dataset "dat" (found below), I'm seeking a way to condense down
the data.frame such that each "site" (i.e., "CID_1"..."CID_13") has a
maximum of 7 rows of post-processed data, where the first 6 have the
highest "countPercentage" and the 7th row is the sum of "countPercentage"
from all other rows within that "site", and it is assigned the name
"Other".  So, for the first two sites in the provided data.frame, CID_1 &
CID_10, they would reduce to:

CID_1 Cyanobacteria 37.48
CID_1 Proteobacteria 29.24
CID_1 Bacteroidetes  15.83
CID_1 Verrucomicrobia  6.38
CID_1 Firmicutes  2.30
CID_1 Acidobacteria  2.08
CID_1 Other6.68
CID_10 Proteobacteria 35.37
CID_10 Bacteroidetes  25.19
CID_10 Cyanobacteria 23.29
CID_10 Verrucomicrobia  6.97
CID_10 Acidobacteria  1.99
CID_10 Actinobacteria 1.64
CID_10 Other 5.55


dat <- read.table(header=TRUE, sep=",",
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
CID_1,Acidobacteria,2563,123509,2.075152418
CID_1,Actinobacteria,2069,123509,1.675181566
CID_1,Planctomycetes,1481,123509,1.199102899
CID_1,Chloroflexi,1181,123509,0.956205621
CID_1,Gemmatimonadetes,956,123509,0.774032662
CID_1,Spirochaetes,688,123509,0.557044426
CID_1,Lentisphaerae,526,123509,0.425879895
CID_1,Ignavibacteriae,324,123509,0.262329061
CID_1,Chlorobi,238,123509,0.192698508
CID_1,Nitrospirae,230,123509,0.186221247
CID_1,Nitrospinae,169,123509,0.136832134
CID_1,Elusimicrobia,131,123509,0.106065145
CID_1,Tenericutes,114,123509,0.092300966
CID_1,Fibrobacteres,72,123509,0.058295347
CID_1,Thermotogae,21,123509,0.017002810
CID_1,Fusobacteria,21,123509,0.017002810
CID_1,Armatimonadetes,15,123509,0.012144864
CID_1,Synergistetes,10,123509,0.008096576
CID_1,Deinococcus-Thermus,6,123509,0.004857946
CID_1,Deferribacteres,2,123509,0.001619315
CID_1,Caldiserica,2,123509,0.001619315
CID_10,Proteobacteria,16043,45362,35.366606411
CID_10,Bacteroidetes,11426,45362,25.188483753
CID_10,Cyanobacteria,10567,45362,23.294828270
CID_10,Verrucomicrobia,3162,45362,6.970592126
CID_10,Acidobacteria,902,45362,1.988448481
CID_10,Actinobacteria,746,45362,1.644548300
CID_10,Firmicutes,718,45362,1.582822627
CID_10,Gemmatimonadetes,358,45362,0.789206825
CID_10,Planctomycetes,357,45362,0.787002337
CID_10,Chloroflexi,265,45362,0.584189410
CID_10,Spirochaetes,235,45362,0.518054759
CID_10,Ignavibacteriae,177,45362,0.390194436
CID_10,Lentisphaerae,108,45362,0.238084741
CID_10,Nitrospinae,75,45362,0.165336625
CID_10,Nitrospirae,58,45362,0.127860324
CID_10,Chlorobi,44,45362,0.096997487
CID_10,Elusimicrobia,28,45362,0.061725673
CID_10,Fibrobacteres,26,45362,0.057316697
CID_10,Armatimonadetes,15,45362,0.033067325
CID_10,Deinococcus-Thermus,13,45362,0.028658348
CID_10,Tenericutes,10,45362,0.022044883
CID_10,Synergistetes,9,45362,0.019840395
CID_10,Fusobacteria,9,45362,0.019840395
CID_10,Deferribacteres,6,45362,0.013226930
CID_10,Thermotogae,3,45362,0.006613465
CID_10,Caldiserica,2,45362,0.004408977
CID_11,Proteobacteria,10019,31387,31.920858954
CID_11,Cyanobacteria,8811,31387,28.072131774
CID_11,Bacteroidetes,7930,31387,25.265237200
CID_11,Verrucomicrobia,1750,31387,5.575556759
CID_11,Firmicutes,806,31387,2.567942142
CID_11,Acidobacteria,548,31387,1.745945774
CID_11,Actinobacteria,434,31387,1.382738076
CID_11,Chloroflexi,203,31387,0.646764584
CID_1

[R] Condensing data.frame

2014-12-07 Thread Morway, Eric
Using the dataset "dat" (found below), I'm seeking a way to condense down
the data.frame such that each "site" (i.e., "CID_1"..."CID_13") has a
maximum of 7 rows of post-processed data, where the first 6 have the
highest "countPercentage" and the 7th row is the sum of "countPercentage"
from all other rows within that "site", and it is assigned the name
"Other".  So, for the first two sites in the provided data.frame, CID_1 &
CID_10, they would reduce to:

CID_1 Cyanobacteria 37.48
CID_1 Proteobacteria 29.24
CID_1 Bacteroidetes  15.83
CID_1 Verrucomicrobia  6.38
CID_1 Firmicutes  2.30
CID_1 Acidobacteria  2.08
CID_1 Other6.68
CID_10 Proteobacteria 35.37
CID_10 Bacteroidetes  25.19
CID_10 Cyanobacteria 23.29
CID_10 Verrucomicrobia  6.97
CID_10 Acidobacteria  1.99
CID_10 Actinobacteria 1.64
CID_10 Other 5.55


dat <- read.table(header=TRUE, sep=",",
text="site,tax_name,count,countTotal,countPercentage
CID_1,Cyanobacteria,46295,123509,37.483098398
CID_1,Proteobacteria,36120,123509,29.244832360
CID_1,Bacteroidetes,19546,123509,15.825567368
CID_1,Verrucomicrobia,7886,123509,6.384959801
CID_1,Firmicutes,2843,123509,2.301856545
CID_1,Acidobacteria,2563,123509,2.075152418
CID_1,Actinobacteria,2069,123509,1.675181566
CID_1,Planctomycetes,1481,123509,1.199102899
CID_1,Chloroflexi,1181,123509,0.956205621
CID_1,Gemmatimonadetes,956,123509,0.774032662
CID_1,Spirochaetes,688,123509,0.557044426
CID_1,Lentisphaerae,526,123509,0.425879895
CID_1,Ignavibacteriae,324,123509,0.262329061
CID_1,Chlorobi,238,123509,0.192698508
CID_1,Nitrospirae,230,123509,0.186221247
CID_1,Nitrospinae,169,123509,0.136832134
CID_1,Elusimicrobia,131,123509,0.106065145
CID_1,Tenericutes,114,123509,0.092300966
CID_1,Fibrobacteres,72,123509,0.058295347
CID_1,Thermotogae,21,123509,0.017002810
CID_1,Fusobacteria,21,123509,0.017002810
CID_1,Armatimonadetes,15,123509,0.012144864
CID_1,Synergistetes,10,123509,0.008096576
CID_1,Deinococcus-Thermus,6,123509,0.004857946
CID_1,Deferribacteres,2,123509,0.001619315
CID_1,Caldiserica,2,123509,0.001619315
CID_10,Proteobacteria,16043,45362,35.366606411
CID_10,Bacteroidetes,11426,45362,25.188483753
CID_10,Cyanobacteria,10567,45362,23.294828270
CID_10,Verrucomicrobia,3162,45362,6.970592126
CID_10,Acidobacteria,902,45362,1.988448481
CID_10,Actinobacteria,746,45362,1.644548300
CID_10,Firmicutes,718,45362,1.582822627
CID_10,Gemmatimonadetes,358,45362,0.789206825
CID_10,Planctomycetes,357,45362,0.787002337
CID_10,Chloroflexi,265,45362,0.584189410
CID_10,Spirochaetes,235,45362,0.518054759
CID_10,Ignavibacteriae,177,45362,0.390194436
CID_10,Lentisphaerae,108,45362,0.238084741
CID_10,Nitrospinae,75,45362,0.165336625
CID_10,Nitrospirae,58,45362,0.127860324
CID_10,Chlorobi,44,45362,0.096997487
CID_10,Elusimicrobia,28,45362,0.061725673
CID_10,Fibrobacteres,26,45362,0.057316697
CID_10,Armatimonadetes,15,45362,0.033067325
CID_10,Deinococcus-Thermus,13,45362,0.028658348
CID_10,Tenericutes,10,45362,0.022044883
CID_10,Synergistetes,9,45362,0.019840395
CID_10,Fusobacteria,9,45362,0.019840395
CID_10,Deferribacteres,6,45362,0.013226930
CID_10,Thermotogae,3,45362,0.006613465
CID_10,Caldiserica,2,45362,0.004408977
CID_11,Proteobacteria,10019,31387,31.920858954
CID_11,Cyanobacteria,8811,31387,28.072131774
CID_11,Bacteroidetes,7930,31387,25.265237200
CID_11,Verrucomicrobia,1750,31387,5.575556759
CID_11,Firmicutes,806,31387,2.567942142
CID_11,Acidobacteria,548,31387,1.745945774
CID_11,Actinobacteria,434,31387,1.382738076
CID_11,Chloroflexi,203,31387,0.646764584
CID_11,Planctomycetes,197,31387,0.627648389
CID_11,Gemmatimonadetes,192,31387,0.611718227
CID_11,Ignavibacteriae,87,31387,0.277184822
CID_11,Spirochaetes,80,31387,0.254882595
CID_11,Tenericutes,71,31387,0.226208303
CID_11,Fusobacteria,67,31387,0.213464173
CID_11,Lentisphaerae,54,31387,0.172045751
CID_11,Chlorobi,40,31387,0.127441297
CID_11,Nitrospinae,33,31387,0.105139070
CID_11,Armatimonadetes,22,31387,0.070092714
CID_11,Fibrobacteres,15,31387,0.047790487
CID_11,Nitrospirae,13,31387,0.041418422
CID_11,Elusimicrobia,13,31387,0.041418422
CID_11,Deinococcus-Thermus,2,31387,0.006372065
CID_12,Cyanobacteria,241,644,37.422360248
CID_12,Bacteroidetes,210,644,32.608695652
CID_12,Proteobacteria,118,644,18.322981366
CID_12,Verrucomicrobia,38,644,5.900621118
CID_12,Acidobacteria,11,644,1.708074534
CID_12,Ignavibacteriae,6,644,0.931677019
CID_12,Lentisphaerae,5,644,0.776397516
CID_12,Firmicutes,5,644,0.776397516
CID_12,Planctomycetes,3,644,0.465838509
CID_12,Fusobacteria,3,644,0.465838509
CID_12,Tenericutes,2,644,0.310559006
CID_12,Actinobacteria,2,644,0.310559006
CID_13,Cyanobacteria,8581,25530,33.611437524
CID_13,Bacteroidetes,6878,25530,26.940853897
CID_13,Proteobacteria,5341,25530,20.920485703
CID_13,Verrucomicrobia,1244,25530,4.872698786
CID_13,Firmicutes,1148,25530,4.496670584
CID_13,Acidobacteria,548,25530,2.146494320
CID_13,Spirochaetes,477,25530,1.868390129
CID_13,Ignavibacteriae,298,25530,1.167254211
CID_13,Ac