Re: [R] Count data in random Forest

2023-05-18 Thread Bert Gunter
This is R-**Help**, not R- **we give you all the answers.** Please
read and follow the posting guide linked below for what sorts of
questions are appropriate for this list and what you should include
when asking them.

Incidentally, there are several packages in R that do (versions of)
random forests. You can read about at least some of them in the
**Random Forests** section of the Machine Learning Task View here:
https://CRAN.R-project.org/view=MachineLearning
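
For orientation only, here is a minimal sketch of what such a fit might look like, assuming the randomForest package (one of several covered in the Task View) is installed. The simulated data, the variable names and the small tuning grid are made up for illustration. The forest treats the count as an ordinary numeric response with squared-error loss, which is only one of several ways to handle counts.

```r
# Hypothetical example data: a Poisson count response and three predictors
set.seed(42)
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- rpois(n, lambda = exp(0.5 + 1.5 * d$x1))

# Small illustrative grid over mtry and nodesize
grid <- expand.grid(mtry = 1:3, nodesize = c(5, 10))
grid$oob_mse <- NA_real_

# Only fit if the package is available
if (requireNamespace("randomForest", quietly = TRUE)) {
  for (i in seq_len(nrow(grid))) {
    fit <- randomForest::randomForest(
      y ~ ., data = d, ntree = 300,
      mtry = grid$mtry[i], nodesize = grid$nodesize[i]
    )
    # out-of-bag mean squared error after the last tree
    grid$oob_mse[i] <- tail(fit$mse, 1)
  }
  # settings ordered from lowest to highest out-of-bag error
  print(grid[order(grid$oob_mse), ])
}
```

Comparing out-of-bag error across settings is just one simple way to choose mtry and nodesize; the package vignettes and help pages describe others.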

Many R packages have, besides extensive Help pages, so-called
"vignettes", tutorials that help you use them. If available, you
should consult these before posting here.
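
As a small illustration of how to find them, the sketch below lists the vignettes of an installed package. It uses "grid" only because that package ships with R; substitute the package you are interested in.

```r
# List the vignettes shipped with one installed package
v <- vignette(package = "grid")
print(v$results[, c("Item", "Title")])

# Open a vignette by name (this starts a viewer, so only do it interactively)
if (interactive()) vignette("grid", package = "grid")
```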

Cheers,
Bert


-- Bert

On Thu, May 18, 2023 at 11:00 AM Suriya Kannan  wrote:
>
> Respected Sir
> Good evening. My name is V. Suriya; I am a research scholar doing my Ph.D.
> at the University of Madras, Tamil Nadu, India. I need R code for a random
> forest on count data. It would help me a lot in completing my research work.
>
> I also need R code for comparing predictors with the help of
> mtry, best size, and best node.
>
> Thanks and Regards
> V Suriya
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Count data in random Forest

2023-05-18 Thread Suriya Kannan
Respected Sir
Good evening. My name is V. Suriya; I am a research scholar doing my Ph.D.
at the University of Madras, Tamil Nadu, India. I need R code for a random
forest on count data. It would help me a lot in completing my research work.

I also need R code for comparing predictors with the help of
mtry, best size, and best node.

Thanks and Regards
V Suriya

[[alternative HTML version deleted]]



Re: [R] count data as independent variable in logistinc regression

2012-10-02 Thread Bert Gunter
This is not primarily an R question, although I grant you that it
might intersect packages in R that do what you want. Nevertheless, I
think you would do better posting on a statistical list, like
stats.stackexchange.com . Maybe once you've figured out there what you
want, you can come back to R to find an implementation.
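
On the R side, a minimal sketch using simulated data like the poster's. A GLM conditions on the covariates, so it assumes no particular distribution for them. What matters is whether the log-odds are roughly linear in each covariate on the scale at which it enters the model. The log1p() transform and the AIC comparison are illustrative choices here, not a recommendation.

```r
set.seed(123)
n <- 100
dataset <- data.frame(
  outcome = sample(0:1, n, replace = TRUE),
  var1    = sample(0:10, n, replace = TRUE),
  var2    = sample(0:1000, n, replace = TRUE)
)

# Counts entered on their raw scale
fit_raw <- glm(outcome ~ var1 + var2, family = binomial, data = dataset)

# Same model with the wide-range count on the log(1 + x) scale
fit_log <- glm(outcome ~ var1 + log1p(var2), family = binomial, data = dataset)

# Compare the two parameterisations
print(AIC(fit_raw, fit_log))
```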

Cheers,
Bert

On Tue, Oct 2, 2012 at 9:10 AM,   wrote:
>
> Dear R users,
>
> I would like to employ count data as covariates while fitting a logistic
> regression model. My question is:
>
> do I violate any assumption of the logistic (and, more in general, of the
> generalized linear) models by employing count, non-negative integer
> variables as independent variables?
>
> I found a lot of references in the literature regarding how to use count
> data as outcome, but not as covariates; see for example the very clear
> paper: "N E Breslow (1996) Generalized Linear Models: Checking Assumptions
> and Strengthening Conclusions, Congresso Nazionale Societa Italiana di
> Biometria, Cortona June 1995", available at
> http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf.
>
> Loosely speaking, it seems that glm assumptions may be expressed as follows:
>
> iid residuals;
> the link function must correctly represent the relationship among dependent
> and independent variables;
> absence of outliers
>
> Does anybody know whether there is any other assumption or technical
> problem that would suggest using some other type of model for dealing with
> count covariates?
>
> Finally, please note that my data contain relatively few samples (<100)
> and that the count variables' ranges can differ by 3-4 orders of magnitude
> (i.e. some variables have values in the range 0-10, while other variables may
> have values within 0-1).
>
> A simple example code follows:
>
> ###
>
> # generating simulated data
> var1 = sample(0:10, 100, replace = TRUE);
> var2 = sample(0:1000, 100, replace = TRUE);
> var3 = sample(0:10, 100, replace = TRUE);
> outcome = sample(0:1, 100, replace = TRUE);
> dataset = data.frame(outcome, var1, var2, var3);
>
> #fitting the model
> model = glm(outcome ~ ., family=binomial, data = dataset)
>
> #inspecting the model
> print(model)
>
> ###
>
> Regards,
>
> --
> Vincenzo Lagani
> Research Fellow
> BioInformatics Laboratory
> Institute of Computer Science
> Foundation for Research and Technology - Hellas
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



[R] count data as independent variable in logistinc regression

2012-10-02 Thread vlagani


Dear R users,

I would like to employ count data as covariates while fitting a  
logistic regression model. My question is:


do I violate any assumption of the logistic (and, more in general, of  
the generalized linear) models by employing count, non-negative  
integer variables as independent variables?


I found a lot of references in the literature regarding how to use  
count data as outcome, but not as covariates; see for example the very  
clear paper: "N E Breslow (1996) Generalized Linear Models: Checking  
Assumptions and Strengthening Conclusions, Congresso Nazionale Societa  
Italiana di Biometria, Cortona June 1995", available at

http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf.

Loosely speaking, it seems that glm assumptions may be expressed as follows:

iid residuals;
the link function must correctly represent the relationship among  
dependent and independent variables;

absence of outliers

Does anybody know whether there is any other assumption or technical
problem that would suggest using some other type of model for dealing
with count covariates?


Finally, please note that my data contain relatively few samples
(<100) and that the count variables' ranges can differ by 3-4 orders of
magnitude (i.e. some variables have values in the range 0-10, while
other variables may have values within 0-1).


A simple example code follows:

###

# generating simulated data
var1 = sample(0:10, 100, replace = TRUE);
var2 = sample(0:1000, 100, replace = TRUE);
var3 = sample(0:10, 100, replace = TRUE);
outcome = sample(0:1, 100, replace = TRUE);
dataset = data.frame(outcome, var1, var2, var3);

#fitting the model
model = glm(outcome ~ ., family=binomial, data = dataset)

#inspecting the model
print(model)

###

Regards,

--
Vincenzo Lagani
Research Fellow
BioInformatics Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas



Re: [R] count data without NA in certain time intervals and plot it

2012-06-17 Thread arun
Hi,

Sorry, I didn't understand your question in the first post.  I saw Rui's reply 
and your reply that it is solved.

I have another solution if it helps you.


dattrial<-data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))
dattrial_wk3<-subset(dattrial,Week==3)
dattrial_wk4<-subset(dattrial,Week==4)
# number of non-NA values per column, for each week
count1<-colSums(!is.na(dattrial_wk3))
count2<-colSums(!is.na(dattrial_wk4))
# non-NA count of column a for each week
dattrialnew<-data.frame(a=c(count1[["a"]],count2[["a"]]),Week=unique(dattrial$Week))
plot(dattrialnew$Week,dattrialnew$a,type="l",col="blue",xlab="Week",ylab="Count")


A.K.




- Original Message -
From: Tagmarie 
To: r-help@r-project.org
Cc: 
Sent: Sunday, June 17, 2012 5:40 AM
Subject: Re: [R] count data without NA in certain time intervals and plot it

Thank you Arun for your time!
Your idea is maybe only the first step to what I want but it was
nevertheless a new tool for me and interesting to learn. 

I added a "week"-column to your data set: 
dattrial<-data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))

I am looking for a way to count the number of rows for each week which do
contain data (without NA).
In the next step I want to create a graph which shows the week on the x-axis
and the counted number of data for each week on the y-axis. 

Thank you! 

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611p4633635.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data without NA in certain time intervals and plot it

2012-06-17 Thread Tagmarie
Great! That works! 
Thank you Rui! 
I would have spent days (which I don't have left before handing my report
in) getting there by myself! 
Have a great rest-weekend!

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611p4633638.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data without NA in certain time intervals and plot it

2012-06-17 Thread Rui Barradas

Hello,

I've seen your reply to arun's reply and gave it a try.
Since arun's code included more than one column, I've added another in 
one of the examples.


# Example 1
dattrial1 <- data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))

d1 <- split(dattrial1, dattrial1$Week)
count <- sapply(d1, function(x) sum(!is.na(x$a)))
count


# Example 2
dattrial2 <- data.frame(a=c(1,NA,rnorm(4,10)), b=c(1,2,NA,3,4,6), 
Week=c(3,3,3,4,4,4))


d2 <- split(dattrial2, dattrial2$Week)
count <- sapply(d2, function(x){
    yes <- apply(x, 1, function(y) all(!is.na(y)))
    sum(yes)
})
count


# Works for both examples
plot(names(count), count, type="b", col="red", pch=16)
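
A more compact variant, sketched under the assumption that the timestamps are stored as POSIXct in a column called time (a made-up name, as are the sample values): the week is derived with format() and the non-NA values are counted per week with tapply().

```r
# Hypothetical data: one observation per day over three weeks, some missing
dat <- data.frame(
  time = as.POSIXct("2012-06-04", tz = "UTC") + 86400 * 0:20,
  a = c(1, NA, 3, 4, NA, 6, 7,
        8, 9, 10, NA, NA, 13, 14,
        15, 16, 17, 18, 19, 20, NA)
)

# ISO 8601 week number (weeks start on Monday)
dat$Week <- as.integer(format(dat$time, "%V"))

# Number of non-missing values of a per week
count <- tapply(!is.na(dat$a), dat$Week, sum)
count

plot(as.numeric(names(count)), count, type = "b",
     xlab = "Week", ylab = "Count of non-NA values")
```

tapply() skips the intermediate split() step; the result is again a named vector with one entry per week.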


Hope this helps,

Rui Barradas

On 16-06-2012 21:11, Tagmarie wrote:

Hello,
I'm quite new to R and still spend hours trying to figure out single things
so I hope nobody rolls his eyes over my question.

I have a data set over time and converted it to the POSIXct format. I added
a column in the data set for the week and the month.

I try to get a plot which shows the weeks on the x-axis and the number of
datasets without NAs on the y-axis. That doesn't sound too difficult but I
can't figure it out.

Does anybody have an idea?

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data without NA in certain time intervals and plot it

2012-06-17 Thread Tagmarie
Thank you Arun for your time!
Your idea is maybe only the first step to what I want but it was
nevertheless a new tool for me and interesting to learn. 

I added a "week"-column to your data set: 
dattrial<-data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))

I am looking for a way to count the number of rows for each week which do
contain data (without NA).
In the next step I want to create a graph which shows the week on the x-axis
and the counted number of data for each week on the y-axis. 

Thank you! 

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611p4633635.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data without NA in certain time intervals and plot it

2012-06-16 Thread arun


Hi,



I don't quite understand the question.
Do you want to select only certain columns or rows without NAs?

Suppose, I have a dataset such as the one below:
dattrial<-data.frame(a=c(1,NA,rnorm(4,10)),b=c(NA,NA,NA,3,4,6),c=c(sample(LETTERS[1:3],replace=TRUE),
 sample(LETTERS[3:5],3,replace=TRUE)),d=runif(6,0.4))
# to eliminate the rows with NAs
dattrial1<-dattrial[complete.cases(dattrial),]
# or, equivalently
dattrial1<-dattrial[rowSums(is.na(dattrial))==0,]
# to delete columns with NAs
dattrial1<-dattrial[,colSums(is.na(dattrial))==0]

A.K.



- Original Message -
From: Tagmarie 
To: r-help@r-project.org
Cc: 
Sent: Saturday, June 16, 2012 4:11 PM
Subject: [R] count data without NA in certain time intervals and plot it

Hello, 
I'm quite new to R and still spend hours trying to figure out single things
so I hope nobody rolls his eyes over my question. 

I have a data set over time and converted it to the POSIXct format. I added
a column in the data set for the week and the month. 

I try to get a plot which shows the weeks on the x-axis and the number of
datasets without NAs on the y-axis. That doesn't sound too difficult but I
can't figure it out. 

Does anybody have an idea?

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611.html
Sent from the R help mailing list archive at Nabble.com.



[R] count data without NA in certain time intervals and plot it

2012-06-16 Thread Tagmarie
Hello, 
I'm quite new to R and still spend hours trying to figure out single things
so I hope nobody rolls his eyes over my question. 

I have a data set over time and converted it to the POSIXct format. I added
a column in the data set for the week and the month. 

I try to get a plot which shows the weeks on the x-axis and the number of
datasets without NAs on the y-axis. That doesn't sound too difficult but I
can't figure it out. 

Does anybody have an idea?

--
View this message in context: 
http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data

2011-02-25 Thread ONKELINX, Thierry
Dear Sacha,

Do you revisit the same locations per site? If so, use (1|site/location) as 
random effect. Otherwise use just (1|site). You might want to add a crossed 
random effect (1|date) if you can expect an effect of phenology.
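
To make the data layout concrete, a rough sketch follows. All values are simulated, and the column name visit is a stand-in for the sampling date. Keep one row per site, visit and location, and repeat the visit-level water parameters on each of the three location rows. The model call assumes the lme4 package and a Poisson response; whether Poisson is adequate for the real counts is a separate question.

```r
set.seed(1)

# One row per site and visit: water parameters are measured once per visit
water <- expand.grid(site = LETTERS[1:11], visit = 1:9)
water$pH  <- round(rnorm(nrow(water), mean = 7, sd = 0.3), 1)
water$no3 <- round(runif(nrow(water), 0.001, 0.005), 3)

# One row per site, visit and location: three net counts per visit
fish <- expand.grid(site = LETTERS[1:11], visit = 1:9, location = 1:3)
fish$abundance <- rpois(nrow(fish), lambda = 12)

# Long format: the visit-level water parameters repeat on each location row
long <- merge(fish, water, by = c("site", "visit"))
long$location <- factor(long$location)
long$visit    <- factor(long$visit)
head(long)

# Only fit if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  # Locations nested in sites, plus a crossed random effect for the visit
  fit <- lme4::glmer(abundance ~ pH + (1 | site / location) + (1 | visit),
                     family = poisson, data = long)
  print(summary(fit))
}
```

Because the simulated counts contain no real site or visit effects, the fit may report a singular fit; that is expected here and says nothing about the real data.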

Best regards,

Thierry

PS R-sig-mixed-models is a better list for this kind of question.


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey
  

> -Original Message-
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Sacha Viquerat
> Sent: Friday, 25 February 2011 13:16
> To: r-help
> Subject: [R] count data
> 
> hello dear list! I wonder about the layout of my csv for my 
> study design:
> 
> i have 11 different sites.
> 
> each site had been visited 9 times.
> 
> on each visit, 6 distinctive water parameters had been taken 
> ONCE on each visit (as continuous variables).
> 
> on each visit, the fish abundance was counted using a net at 
> 3 different locations within the site (count data).
> 
> I know i will have to do an lmer using the nested locations 
> as error term. Question is: how to organize my data, since i 
> have abundances from the same 3 locations per site replicate 
> but only one water parameter measurement per site replicate. 
> to give you an idea, here's the basic look so far of my csv:
> 
> 
> site  location  abundance  pH   no3    and so on...
> A     1         12         7.1  0.003  ...
> A     2         15         7.1  0.003  ...
> A     3         18         7.1  0.003  ...
> B     1         11         7.4  0.004  ...
> B     2         8          7.4  0.004  ...
> B     3         17         7.4  0.004  ...
> A     1         13         7.2  0.001  ...
> A     2         19         7.2  0.001  ...
> A     3         21         7.2  0.001  ...
> B     1         9          6.9  0.002  ...
> B     2         5          6.9  0.002  ...
> B     3         2          6.9  0.002  ...
> 
> i just made up the table to give an idea how the data looks 
> like. the goal would be to analyze fish abundance ~ water 
> parameters, does anyone have a suggestion?
> 
> thanks in advance!
> 
> 


[R] count data

2011-02-25 Thread Sacha Viquerat

hello dear list! I wonder about the layout of my csv for my study design:

i have 11 different sites.

each site had been visited 9 times.

on each visit, 6 distinctive water parameters had been taken ONCE on 
each visit (as continuous variables).


on each visit, the fish abundance was counted using a net at 3 different 
locations within the site (count data).


I know i will have to do an lmer using the nested locations as error 
term. Question is: how to organize my data, since i have abundances from 
the same 3 locations per site replicate but only one water parameter 
measurement per site replicate. to give you an idea, here's the basic 
look so far of my csv:



site  location  abundance  pH   no3    and so on...
A     1         12         7.1  0.003  ...
A     2         15         7.1  0.003  ...
A     3         18         7.1  0.003  ...
B     1         11         7.4  0.004  ...
B     2         8          7.4  0.004  ...
B     3         17         7.4  0.004  ...
A     1         13         7.2  0.001  ...
A     2         19         7.2  0.001  ...
A     3         21         7.2  0.001  ...
B     1         9          6.9  0.002  ...
B     2         5          6.9  0.002  ...
B     3         2          6.9  0.002  ...

i just made up the table to give an idea how the data looks like. the 
goal would be to analyze fish abundance ~ water parameters, does anyone 
have a suggestion?


thanks in advance!



Re: [R] count data with a specific range

2010-06-24 Thread Joris Meys
see ?levels eg:

x <- rnorm(10)
y <- cut(x,c(-10,0,10))
levels(y)<-c("-10-0","0-10")
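
The labels can also be set directly in the call, via the labels argument of cut(). A small sketch with a fixed input vector (the values are arbitrary):

```r
x <- c(-3.2, -0.5, 0.4, 2.7, 9.9)

# One label per interval, in the order of the breaks
y <- cut(x, breaks = c(-10, 0, 10), labels = c("-10-0", "0-10"))
table(y)
```

This avoids a separate levels() assignment, at the price of having to keep the labels in step with the breaks by hand.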

cheers
Joris

On Thu, Jun 24, 2010 at 4:14 AM, Yi  wrote:
> Yeap. It works. Just to make the result more beautiful.
>
> One more question.
>
> The interval is shown as (0,10].
>
> Is there a way to change it into the format 0-10?
> Thanks.
>
> On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys  wrote:
>>
>> see ?cut
>>
>> Cheers
>> Joris
>>
>> On Thu, Jun 24, 2010 at 2:57 AM, Yi  wrote:
>> > I would like to prepare the data for barplot. But I only have the data
>> > frame
>> > now.
>> >
>> > x1=rnorm(10,mean=2)
>> > x2=rnorm(20,mean=-1)
>> > x3=rnorm(15,mean=3)
>> > data=data.frame(x1,x2,x3)
>> >
>> > Is there a way to put data within a specific range? The expected result
>> > is
>> > as follows:
>> >  range       x1                  x2                    x3
>> > -10-0        2                      5                     1  (# points
>> > in
>> > this range)
>> > 0-10         7                     9                      6
>> > ...
>> >
>> > I know the table function but I do not know how to deal with the range
>> > issue.
>> >
>> > Thanks in advance.
>> >
>> > Yi
>> >
>> >        [[alternative HTML version deleted]]
>> >
>> >
>>
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Ghent University
>> Faculty of Bioscience Engineering
>> Department of Applied mathematics, biometrics and process control
>>
>> tel : +32 9 264 59 87
>> joris.m...@ugent.be
>> ---
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [R] count data with a specific range

2010-06-23 Thread Joris Meys
see ?cut
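
As a rough sketch of how cut() could be applied here: the three vectors have different lengths, so data.frame() cannot combine them as written in the question, and the sketch keeps them in a list instead. The break points follow the ranges in the question; the seed is arbitrary.

```r
set.seed(1)
x <- list(x1 = rnorm(10, mean = 2),
          x2 = rnorm(20, mean = -1),
          x3 = rnorm(15, mean = 3))

# Bin each vector into (-10,0] and (0,10], then count the points per bin
counts <- sapply(x, function(v) table(cut(v, breaks = c(-10, 0, 10))))
counts

# Grouped bars: one group per variable, one bar per range
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```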

Cheers
Joris

On Thu, Jun 24, 2010 at 2:57 AM, Yi  wrote:
> I would like to prepare the data for barplot. But I only have the data frame
> now.
>
> x1=rnorm(10,mean=2)
> x2=rnorm(20,mean=-1)
> x3=rnorm(15,mean=3)
> data=data.frame(x1,x2,x3)
>
> Is there a way to put data within a specific range? The expected result is
> as follows:
>  range       x1                  x2                    x3
> -10-0        2                      5                     1  (# points in
> this range)
> 0-10         7                     9                      6
> ...
>
> I know the table function but I do not know how to deal with the range
> issue.
>
> Thanks in advance.
>
> Yi
>
>        [[alternative HTML version deleted]]
>
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



[R] count data with a specific range

2010-06-23 Thread Yi
I would like to prepare the data for barplot. But I only have the data frame
now.

x1=rnorm(10,mean=2)
x2=rnorm(20,mean=-1)
x3=rnorm(15,mean=3)
data=data.frame(x1,x2,x3)

Is there a way to put data within a specific range? The expected result is
as follows:
 range       x1     x2     x3
-10-0        2      5      1   (# points in this range)
0-10         7      9      6
...

I know the table function but I do not know how to deal with the range
issue.

Thanks in advance.

Yi

[[alternative HTML version deleted]]



Re: [R] Count data categories from table

2009-07-17 Thread John Kane

aggregate?

Something like this should work although it is not very elegant.

mydata <- data.frame(aa=rep(letters[1:10],2), bb=rnorm(20, 5,1))
ss <- aggregate(mydata[,2],list(aa=mydata$aa), length)
pie(ss[,2])

A more serious problem is that the results are going to be close to 
uninterpretable.  Pie charts just are not very good for this amount of data.  
Have a look at the notes section of ?pie.
You might want to consider using a dotchart (see ?dotchart).
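
A minimal sketch of that alternative, with made-up data of the size described in the question (1000 entries, 30 categories; the column name category is hypothetical):

```r
# Hypothetical data: 1000 entries spread over 30 categories
set.seed(1)
mydata <- data.frame(
  category = sample(paste0("category", 1:30), 1000, replace = TRUE)
)

# Count how often each category occurs, sorted for easier reading
counts <- sort(table(mydata$category))

# Dot chart of the counts, one row per category
dotchart(as.numeric(counts), labels = names(counts), cex = 0.7,
         xlab = "Number of entries")
```

Sorting the counts first makes the ranking of the categories readable at a glance, which is hard to achieve with 30 pie slices.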


--- On Fri, 7/17/09, Miroslav Nikolov  wrote:

> From: Miroslav Nikolov 
> Subject: [R]  Count data categories from table
> To: r-help@r-project.org
> Received: Friday, July 17, 2009, 5:47 AM
> 
> Hi there,
> 
> I have a relatively simple question, though, I couldn't
> find a solution for
> it so far. I have a table with 1000 entries and columns
> containing
> information about different parameters for each entry.
> What I want to do is group all parameters from one of the
> columns [e.g. if
> all 1000 entries are grouped in 30 different categories
> (described as
> character strings) in a second column] and have a pie chart
> describing the
> distribution of all 1000 entries into these 30 categories.
> The problem I have is to make R count how many times each
> of the 30
> categories is present in the table; then if I have them
> counted (e.g. if I
> have category1 - 234 times,  category2 - 356 times,
> etc. in a vector/table)
> the rest will be easier.
> 
> Thanx for the help in advance!
> 
> Best,
> Miro
> -- 
> View this message in context: 
> http://www.nabble.com/Count-data-categories-from-table-tp24531524p24531524.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> 




Re: [R] Count data categories from table

2009-07-17 Thread David Winsemius


On Jul 17, 2009, at 5:47 AM, Miroslav Nikolov wrote:



Hi there,

I have a relatively simple question, though, I couldn't find a solution for
it so far. I have a table with 1000 entries and columns containing
information about different parameters for each entry.
What I want to do is group all parameters from one of the columns [e.g. if
all 1000 entries are grouped in 30 different categories (described as
character strings) in a second column] and have a pie chart describing the
distribution of all 1000 entries into these 30 categories.
The problem I have is to make R count how many times each of the 30
categories is present in the table; then if I have them counted (e.g. if I
have category1 - 234 times,  category2 - 356 times, etc. in a vector/table)
the rest will be easier.


Several options are available. An example dataset would have made  
understanding your setup much clearer. I am having trouble parsing  
your natural language presentation of the problem.

Perhaps these examples will help:

> y <- data.frame(x=sample(LETTERS[1:5], 20, replace=TRUE)  )

> table(y$x)

A B C D E
3 5 2 7 3
> xtabs(~x, data=y)
x
A B C D E
3 5 2 7 3
> ?tapply
> tapply(y$x, y$x, length)
A B C D E
3 5 2 7 3
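
If the counts are needed as a data frame (for example to sort them or to pass them on to a plotting function), the table can be converted. A small sketch with a fixed vector; the column name category is just an illustrative choice:

```r
y <- data.frame(x = c("A", "B", "B", "C", "C", "C"))

# Naming the argument of table() names the resulting column
counts <- as.data.frame(table(category = y$x))

# Most frequent category first
counts[order(-counts$Freq), ]
```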

Pie charts are deprecated on this list, so it's not surprising you  
have difficulty finding examples, but surely you can find worked  
examples, nonetheless. The search sites to consult include:

http://search.r-project.org/nmz.html
http://addictedtor.free.fr/graphiques/

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT



[R] Count data categories from table

2009-07-17 Thread Miroslav Nikolov

Hi there,

I have a relatively simple question, though, I couldn't find a solution for
it so far. I have a table with 1000 entries and columns containing
information about different parameters for each entry.
What I want to do is group all parameters from one of the columns [e.g. if
all 1000 entries are grouped in 30 different categories (described as
character strings) in a second column] and have a pie chart describing the
distribution of all 1000 entries into these 30 categories.
The problem I have is to make R count how many times each of the 30
categories is present in the table; then if I have them counted (e.g. if I
have category1 - 234 times,  category2 - 356 times, etc. in a vector/table)
the rest will be easier.

Thanx for the help in advance!

Best,
Miro
-- 
View this message in context: 
http://www.nabble.com/Count-data-categories-from-table-tp24531524p24531524.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Count data with several numbers separated by commas

2009-04-16 Thread Bill.Venables
It rather depends on how you have your data stored.  Here is one possibility 
you might want to look at:

> con <- textConnection("
+ id_name  x1   x2 x3
+ aa101  1,4,5   2 1
+ aa102  1,2,5   1 2
+ aa103  1,2,5   1 1
+ aa104  1,2,3   1 2
+ aa105  1,5   2 2
+ aa106  1,2,5   2 2
+ aa107  1,2,5   2 1
+ aa108  1,4,5   2 1
+ aa109  1,2   1 2
+ aa110  3,5   1 2")
> 
> dat <- read.table(con, header = TRUE)
> 
> x1_all <- as.numeric(unlist(strsplit(as.character(dat$x1), ",")))
> 
> x1_all
 [1] 1 4 5 1 2 5 1 2 5 1 2 3 1 5 1 2 5 1 2 5 1 4 5 1 2 3 5
> table(x1_all)
x1_all
1 2 3 4 5 
9 6 2 2 8 
> 
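
For the bar plots mentioned in the question, the same idea can be extended to all three variables. This is only a sketch; it reads the example data from an inline string with read.table(text = ...) and keeps x1 as character.

```r
dat <- read.table(text = "
id_name  x1     x2  x3
aa101    1,4,5  2   1
aa102    1,2,5  1   2
aa103    1,2,5  1   1
aa104    1,2,3  1   2
aa105    1,5    2   2
aa106    1,2,5  2   2
aa107    1,2,5  2   1
aa108    1,4,5  2   1
aa109    1,2    1   2
aa110    3,5    1   2
", header = TRUE, stringsAsFactors = FALSE)

# Split the comma-separated entries of x1, then tabulate each variable
counts <- list(
  x1 = table(as.numeric(unlist(strsplit(dat$x1, ",")))),
  x2 = table(dat$x2),
  x3 = table(dat$x3)
)
counts

# One bar plot per variable, side by side
op <- par(mfrow = c(1, 3))
for (v in names(counts)) barplot(counts[[v]], main = v)
par(op)
```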

Bill Venables
http://www.cmis.csiro.au/bill.venables/ 


-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Xiyan Lon
Sent: Thursday, 16 April 2009 4:04 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] Count data with several numbers separated by commas

Dear all,
I have a data file with 3 variables (x1, x2, x3), where variable x1
holds several numbers separated by commas.

id name  x1     x2  x3
aa101    1,4,5  2   1
aa102    1,2,5  1   2
aa103    1,2,5  1   1
aa104    1,2,3  1   2
aa105    1,5    2   2
aa106    1,2,5  2   2
aa107    1,2,5  2   1
aa108    1,4,5  2   1
aa109    1,2    1   2
aa110    3,5    1   2


I want to count the values of each variable and make a barplot
for each variable.
I know how to count x2 and x3 and make barplots for them,
but I don't know how to count the data in variable x1.
Is there a trick for counting the data in variable x1?
The result may look like:

x1
1 9
2 6
3 2
4 4
5 8


x2
1 5
2 5

x3
1 4
2 6


Thank you for any help.

Xiyanlon



Re: [R] Count data with several numbers separated by commas

2009-04-15 Thread Simon Blomberg
Here's a solution, though it may be overcomplicated. I assume the data
frame is called "dat":


vec <- unlist(lapply(strsplit(as.character(dat$x1), ","),
                     function(x) summary(as.factor(x))))

> table(names(vec))

1 2 3 4 5 
9 6 2 2 8

Cheers,

Simon.

On Thu, 2009-04-16 at 13:03 +0700, Xiyan Lon wrote:
> Dear all,
> I have a data file with 3 variables (x1, x2, x3), where variable x1
> holds several numbers separated by commas.
> 
> id name  x1     x2  x3
> aa101    1,4,5  2   1
> aa102    1,2,5  1   2
> aa103    1,2,5  1   1
> aa104    1,2,3  1   2
> aa105    1,5    2   2
> aa106    1,2,5  2   2
> aa107    1,2,5  2   1
> aa108    1,4,5  2   1
> aa109    1,2    1   2
> aa110    3,5    1   2
> 
> 
> I want to count the values of each variable and make a barplot
> for each variable.
> I know how to count x2 and x3 and make barplots for them,
> but I don't know how to count the data in variable x1.
> Is there a trick for counting the data in variable x1?
> The result may look like:
> 
> x1
> 1 9
> 2 6
> 3 2
> 4 4
> 5 8
> 
> 
> x2
> 1 5
> 2 5
> 
> x3
> 1 4
> 2 6
> 
> 
> Thank you for any help.
> 
> Xiyanlon
> 
-- 
Simon Blomberg, BSc (Hons), PhD, MAppStat. 
Lecturer and Consultant Statistician 
School of Biological Sciences
The University of Queensland 
St. Lucia Queensland 4072 
Australia
Room 320 Goddard Building (8)
T: +61 7 3365 2506
http://www.uq.edu.au/~uqsblomb
email: S.Blomberg1_at_uq.edu.au

Policies:
1.  I will NOT analyse your data for you.
2.  Your deadline is your problem.

The combination of some data and an aching desire for 
an answer does not ensure that a reasonable answer can 
be extracted from a given body of data. - John Tukey.



[R] Count data with several numbers separated by commas

2009-04-15 Thread Xiyan Lon
Dear all,
I have a data file with 3 variables (x1, x2, x3), where variable x1
holds several numbers separated by commas.

id name  x1     x2  x3
aa101    1,4,5  2   1
aa102    1,2,5  1   2
aa103    1,2,5  1   1
aa104    1,2,3  1   2
aa105    1,5    2   2
aa106    1,2,5  2   2
aa107    1,2,5  2   1
aa108    1,4,5  2   1
aa109    1,2    1   2
aa110    3,5    1   2


I want to count the values of each variable and make a barplot
for each variable.
I know how to count x2 and x3 and make barplots for them,
but I don't know how to count the data in variable x1.
Is there a trick for counting the data in variable x1?
The result may look like:

x1
1 9
2 6
3 2
4 4
5 8


x2
1 5
2 5

x3
1 4
2 6


Thank you for any help.

Xiyanlon



Re: [R] count data with some conditions

2008-11-01 Thread David Winsemius


On Nov 1, 2008, at 3:30 AM, (Ted Harding) wrote:

> On 01-Nov-08 02:51:37, David Winsemius wrote:
>> Do you want the count of remaining elements which are strictly
>> greater than the first element?
>>
>> > length(which(a[1] < a[2:10]))
>> [1] 4
>>
>> or perhaps a bit more deviously:
>>
>> > sum( a[1]
>> [1] 4
>
> No need to be devious! Simply
>   sum(a[1] < a[2:10])
> # [1] 4
> will do it. The reason is that when TRUE or FALSE are involved in
> an arithmetic operation (which sum() is), they are cast into 1 or 0.

Agreed. I now also see that TRUE+TRUE and T+T both return 2. The
second observation should be further warning to us newbies not to
create variables named "T".

It's now been pointed out to me both on and off list that the +0 is
unnecessary. I don't remember when I learned this, but it could not
have been more than a year ago. I seem to remember that Gabor
Grothendieck used the +0 device to convert a logical vector to a
numeric vector. Perhaps it was for the purpose of making a matrix or
something less necessarily arithmetical than sum() or "+".

--
David Winsemius, MD
Heritage Labs

> Ted.
>
>> On Oct 31, 2008, at 7:56 PM, sandsky wrote:
>>> Hi there,
>>> I have a data set:
>>>
>>> a=cbind(5,2,4,7,8,3,4,11,1,20)
>>>
>>> I want to count # of data, satisfying a[1]
>
> E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
> Fax-to-email: +44 (0)870 094 0861
> Date: 01-Nov-08   Time: 07:30:17
> -- XFMail --



Re: [R] count data with some conditions

2008-11-01 Thread sandsky

David,

Yes, it is what I want. It is a great help. Thank you,

Jin


David Winsemius wrote:
> 
> Do you want the count of remaining  elements which are strictly  
> greater than the first element?
> 
>  > length(which(a[1] < a[2:10]))
> [1] 4
> 
> or perhaps a bit more deviously:
> 
>  > sum( a[1]
> [1] 4
> 
> -- 
> David Winsemius, MD
> Heritage Labs.
> 
> On Oct 31, 2008, at 7:56 PM, sandsky wrote:
> 
>>
>> Hi there,
>>
>> I have a data set:
>>
>> a=cbind(5,2,4,7,8,3,4,11,1,20)
>>
>> I want to count # of data, satisfying a[1]
>>
>> Anyone helps me solving this case?
>>
>> Thank you in advance,
>>
>>
>> Jin
>> -- 
>> View this message in context:
>> http://www.nabble.com/count-data-with-some-conditions-tp20275722p20275722.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/count-data-with-some-conditions-tp20275722p2024.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data with some conditions

2008-11-01 Thread Ted Harding
On 01-Nov-08 02:51:37, David Winsemius wrote:
> Do you want the count of remaining  elements which are strictly  
> greater than the first element?
> 
>  > length(which(a[1] < a[2:10]))
> [1] 4
> 
> or perhaps a bit more deviously:
> 
>  > sum( a[1]
> [1] 4

No need to be devious! Simply
  sum(a[1] < a[2:10])
# [1] 4
will do it. The reason is that when TRUE or FALSE are involved in
an arithmetic operation (which sum() is), they are cast into 1 or 0.

Ted.
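
The coercion described above can be checked directly. A small sketch, assuming the data are held in a plain vector built with c() rather than cbind(); the names `is_greater` and `n_greater` are illustrative only.

```r
a <- c(5, 2, 4, 7, 8, 3, 4, 11, 1, 20)

# Comparing the first element against the rest gives a logical vector.
is_greater <- a[1] < a[-1]

# sum() treats TRUE as 1 and FALSE as 0, so this counts the TRUE entries.
n_greater <- sum(is_greater)
print(n_greater)

# Adding 0 forces the same conversion explicitly, giving a numeric 0/1
# vector; the total is unchanged.
as_numeric <- is_greater + 0
print(sum(as_numeric))
```

Both calls print 4 here, since 7, 8, 11 and 20 are the only elements larger than a[1] = 5.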

> On Oct 31, 2008, at 7:56 PM, sandsky wrote:
>> Hi there,
>> I have a data set:
>>
>> a=cbind(5,2,4,7,8,3,4,11,1,20)
>>
>> I want to count # of data, satisfying a[1]
>>
>> Anyone helps me solving this case?
>>
>> Thank you in advance,
>> Jin


E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Nov-08   Time: 07:30:17
-- XFMail --



Re: [R] count data with some conditions

2008-10-31 Thread David Winsemius
Do you want the count of remaining  elements which are strictly  
greater than the first element?


> length(which(a[1] < a[2:10]))
[1] 4

or perhaps a bit more deviously:

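> sum( a[1]
[1] 4

-- 
David Winsemius, MD
Heritage Labs.

On Oct 31, 2008, at 7:56 PM, sandsky wrote:

> Hi there,
>
> I have a data set:
>
> a=cbind(5,2,4,7,8,3,4,11,1,20)
>
> I want to count # of data, satisfying a[1]
>
> Anyone helps me solving this case?
>
> Thank you in advance,
>
> Jin
> -- 
> View this message in context:
> http://www.nabble.com/count-data-with-some-conditions-tp20275722p20275722.html
> Sent from the R help mailing list archive at Nabble.com.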



[R] count data with some conditions

2008-10-31 Thread sandsky

Hi there,

I have a data set:

a=cbind(5,2,4,7,8,3,4,11,1,20)

I want to count # of data, satisfying a[1]

Anyone helps me solving this case?

Thank you in advance,

Jin
-- 
View this message in context: 
http://www.nabble.com/count-data-with-some-conditions-tp20275722p20275722.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data with a specific range

2008-09-30 Thread sandsky

I am converting exact time data to interval data and generating samples via
bootstrapping. I had quite long code to get the frequencies, but your help
makes it simpler. Thank you.



jholtman wrote:
> 
>> data<-c(2,6,13,26,19,25,18,11,22,25)
>> table(cut(data, breaks=c(0,10,20,30)))
> 
>  (0,10] (10,20] (20,30]
>       2       4       4
> 
> 
> On Mon, Sep 29, 2008 at 5:41 PM, sandsky <[EMAIL PROTECTED]> wrote:
>>
>> Hi there,
>>
>> The data is
>>
>> data<-c(2,6,13,26,19,25,18,11,22,25)
>>
>> I want to count data for these ranges:
>>
>> [0~10]:
>> [11~20]:
>> [21-30]:
>>
>> Can anyone help me?
>>
>> Thank you in advance
>> --
>> View this message in context:
>> http://www.nabble.com/count-data-with-a-specific-range-tp19732290p19732290.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem that you are trying to solve?
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/counting-data-elements-for-a-specific-range-tp19732290p19748345.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] count data with a specific range

2008-09-29 Thread jim holtman
> data<-c(2,6,13,26,19,25,18,11,22,25)
> table(cut(data, breaks=c(0,10,20,30)))

 (0,10] (10,20] (20,30]
      2       4       4
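
A note on the interval boundaries, sketched under the assumption that the values are whole numbers: cut() builds right-closed intervals by default, so (0,10] covers 1 to 10 and (10,20] covers 11 to 20, which matches the requested ranges except for the value 0 itself. Setting include.lowest = TRUE closes the first interval on the left. The names `x`, `bins` and `counts` are illustrative only.

```r
x <- c(2, 6, 13, 26, 19, 25, 18, 11, 22, 25)

# Right-closed bins; include.lowest = TRUE turns (0,10] into [0,10]
# so that a value of exactly 0 would not become NA.
bins <- cut(x, breaks = c(0, 10, 20, 30), include.lowest = TRUE)

# Tabulate how many values fall in each bin.
counts <- table(bins)
print(counts)

# Optional: custom labels matching the ranges in the question.
bins_labelled <- cut(x, breaks = c(0, 10, 20, 30), include.lowest = TRUE,
                     labels = c("0-10", "11-20", "21-30"))
print(table(bins_labelled))
```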


On Mon, Sep 29, 2008 at 5:41 PM, sandsky <[EMAIL PROTECTED]> wrote:
>
> Hi there,
>
> The data is
>
> data<-c(2,6,13,26,19,25,18,11,22,25)
>
> I want to count data for these ranges:
>
> [0~10]:
> [11~20]:
> [21-30]:
>
> Can anyone help me?
>
> Thank you in advance
> --
> View this message in context: 
> http://www.nabble.com/count-data-with-a-specific-range-tp19732290p19732290.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



Re: [R] count data with a specific range

2008-09-29 Thread Peter Alspach
Kia ora

See ?hist, in particular the breaks argument, and set plot=FALSE.

HTH ...

Peter Alspach
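
A minimal sketch of the hist() route suggested above, assuming the same data vector and break points as in the question; the names `x` and `h` are illustrative only.

```r
x <- c(2, 6, 13, 26, 19, 25, 18, 11, 22, 25)

# plot = FALSE suppresses drawing and returns the histogram object.
h <- hist(x, breaks = c(0, 10, 20, 30), plot = FALSE)

# The per-bin counts and the break points that define the bins.
print(h$counts)
print(h$breaks)
```

hist() uses right-closed bins and includes the lowest break by default, so the counts agree with table(cut(...)) for these data.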
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of sandsky
> Sent: Tuesday, 30 September 2008 10:42 a.m.
> To: r-help@r-project.org
> Subject: [R] count data with a specific range
> 
> 
> Hi there,
> 
> The data is
> 
> data<-c(2,6,13,26,19,25,18,11,22,25)
> 
> I want to count data for these ranges:
> 
> [0~10]:
> [11~20]:
> [21-30]:
> 
> Can anyone help me?
> 
> Thank you in advance
> --
> View this message in context: 
> http://www.nabble.com/count-data-with-a-specific-range-tp19732290p19732290.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> 



[R] count data with a specific range

2008-09-29 Thread sandsky

Hi there,

The data is

data<-c(2,6,13,26,19,25,18,11,22,25)

I want to count data for these ranges:

[0~10]:
[11~20]:
[21-30]:

Can anyone help me?

Thank you in advance
-- 
View this message in context: 
http://www.nabble.com/count-data-with-a-specific-range-tp19732290p19732290.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Count data in random Forest

2008-05-06 Thread Volker Bahn

Hi Birgit,

I'm not sure that I understand your question, but I'll try to answer 
anyway. Regression trees, and therefore also random forests, are invariant 
to monotonic transformations of the independent variables. There are no 
distributional assumptions for the independent variables. The dependent 
variable, however, is used to calculate the variances within the two 
groups of cases that result from a split. Therefore, it would make sense 
to have the dependent variable follow the typical distributional 
requirements of least-squares driven models, such as homoscedasticity, 
symmetrical distribution etc. For count data a square root 
transformation is often appropriate.


HTH

Volker
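
The invariance point can be illustrated with a small sketch. Everything here is an assumption for illustration: the simulated data, the names `sim`, `counts_a`, `counts_b` and `group`, and the use of the randomForest package from CRAN, which is only called if it is installed.

```r
set.seed(42)
n <- 200

# Hypothetical count predictors (Poisson) and a two-level grouping factor.
counts_a <- rpois(n, lambda = 3)
counts_b <- rpois(n, lambda = 10)
group <- factor(ifelse(counts_a + rnorm(n) > 3, "A", "B"))
sim <- data.frame(group, counts_a, counts_b)

# A monotonic transformation such as sqrt() keeps the ordering of the
# predictor values, so a tree can make exactly the same partitions.
sim_sqrt <- transform(sim, counts_a = sqrt(counts_a), counts_b = sqrt(counts_b))
same_order <- identical(rank(sim$counts_a), rank(sim_sqrt$counts_a))
print(same_order)

# Classification forest on the raw counts, with variable importance.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(group ~ ., data = sim, importance = TRUE)
  print(randomForest::importance(rf))
}
```

Here the counts are predictors and the response is categorical, as in the question, so no transformation is needed. The square root suggestion applies when a count is the dependent variable of a regression forest.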

Birgit Lemcke wrote:
> Hello R-user!
>
> I am running R 2.7.0 on a Power Book (Tiger). (I am still a beginner
> in R and statistics.)
>
> I am trying to find the most important variables to divide my dataset
> as given in a categorical variable using randomForest.
>
> Is randomForest() able to deal with count data?
> Or is there no difference because only the ranks are used in the trees?
>
> Thanks in advance
>
> Birgit
>
> Birgit Lemcke
> Institut für Systematische Botanik
> Zollikerstrasse 107
> CH-8008 Zürich
> Switzerland
> Ph: +41 (0)44 634 8351
> [EMAIL PROTECTED]
>
> 175 years of UZH
> «marvel.experience.understand. Hands-on science.»
> MNF anniversary event for young and old.
> 19 April 2008, 10:00 to 02:00
> Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
> More information: http://www.175jahre.uzh.ch/naturwissenschaft




[R] Count data in random Forest

2008-05-05 Thread Birgit Lemcke

Hello R-user!

I am running R 2.7.0 on a Power Book (Tiger). (I am still a beginner
in R and statistics.)


I am trying to find the most important variables to divide my dataset
as given in a categorical variable using randomForest.


Is randomForest() able to deal with count data?
Or is there no difference because only the ranks are used in the trees?

Thanks in advance

Birgit

Birgit Lemcke
Institut für Systematische Botanik
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
[EMAIL PROTECTED]

175 years of UZH
«marvel.experience.understand. Hands-on science.»
MNF anniversary event for young and old.
19 April 2008, 10:00 to 02:00
Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
More information: http://www.175jahre.uzh.ch/naturwissenschaft
