Re: [R] Count data in random Forest
This is R-**Help**, not R- **we give you all the answers.** Please read and follow the posting guide linked below for what sorts of questions are appropriate for this list and what you should include when asking them. Incidentally, there are several packages in R that do (versions of) random forests. You can read about at least some of them in the **Random Forests** section of the Machine Learning Task View here: https://CRAN.R-project.org/view=MachineLearning Many R packages have, besides extensive Help pages, so-called "vignettes", tutorials that help you use them. If available, you should consult these before posting here. Cheers, Bert -- Bert On Thu, May 18, 2023 at 11:00 AM Suriya Kannan wrote: > > Respected Sir > Good Evening. My name is V.Suriya, I am a research scholar. Doing my Ph.D > at University of Madras, Tamil Nadu, India. I need the r code for random > forest count data. It helps me lot to complete my research work sir. > > And also need the r code for comparison of predictors with the help of > mtry, best size, best node. > > Thanks and Regards > V Suriya > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Count data in random Forest
Respected Sir Good Evening. My name is V.Suriya, I am a research scholar. Doing my Ph.D at University of Madras, Tamil Nadu, India. I need the r code for random forest count data. It helps me lot to complete my research work sir. And also need the r code for comparison of predictors with the help of mtry, best size, best node. Thanks and Regards V Suriya [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count data as independent variable in logistic regression
This is not primarily an R question, although I grant you that it might intersect packages in R that do what you want. Nevertheless, I think you would do better posting on a statistical list, like stats.stackexchange.com . Maybe once you've figured out there what you want, you can come back to R to find an implementation. Cheers, Bert On Tue, Oct 2, 2012 at 9:10 AM, wrote: > > Dear R users, > > I would like to employ count data as covariates while fitting a logistic > regression model. My question is: > > do I violate any assumption of the logistic (and, more in general, of the > generalized linear) models by employing count, non-negative integer > variables as independent variables? > > I found a lot of references in the literature regarding hot to use count > data as outcome, but not as covariates; see for example the very clear > paper: "N E Breslow (1996) Generalized Linear Models: Checking Assumptions > and Strengthening Conclusions, Congresso Nazionale Societa Italiana di > Biometria, Cortona June 1995", available at > http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf. > > Loosely speaking, it seems that glm assumptions may be expressed as follows: > > iid residuals; > the link function must correctly represent the relationship among dependent > and independent variables; > absence of outliers > > Does everybody knows whether there exists any other assumption/technical > problem that may suggest to use some other type of models for dealing with > count covariates? > > Finally, please notice that my data contain relatively few samples (<100) > and that count variables' ranges can vary within 3-4 order of magnitude > (i.e. some variables has value in the range 0-10, while other variables may > have values within 0-1). 
> > A simple example code follows: > > ### > > #genrating simulated data > var1 = sample(0:10, 100, replace = TRUE); > var2 = sample(0:1000, 100, replace = TRUE); > var3 = sample(0:10, 100, replace = TRUE); > outcome = sample(0:1, 100, replace = TRUE); > dataset = data.frame(outcome, var1, var2, var3); > > #fitting the model > model = glm(outcome ~ ., family=binomial, data = dataset) > > #inspecting the model > print(model) > > ### > > Regards, > > -- > Vincenzo Lagani > Research Fellow > BioInformatics Laboratory > Institute of Computer Science > Foundation for Research and Technology - Hellas > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] count data as independent variable in logistic regression
Dear R users,

I would like to use count data as covariates when fitting a logistic regression model. My question is: do I violate any assumption of logistic (and, more generally, of generalized linear) models by using count, non-negative integer variables as independent variables?

I found many references in the literature on how to use count data as the outcome, but not as covariates; see for example the very clear paper: "N E Breslow (1996) Generalized Linear Models: Checking Assumptions and Strengthening Conclusions, Congresso Nazionale Societa Italiana di Biometria, Cortona June 1995", available at http://biostat.georgiahealth.edu/~dryu/course/stat9110spring12/land16_ref.pdf.

Loosely speaking, it seems that the GLM assumptions may be expressed as follows:

  iid residuals;
  the link function must correctly represent the relationship between dependent and independent variables;
  absence of outliers.

Does anybody know whether there is any other assumption or technical problem that would suggest using some other type of model for dealing with count covariates?

Finally, please note that my data contain relatively few samples (<100) and that the count variables' ranges can vary by 3-4 orders of magnitude (i.e. some variables have values in the range 0-10, while other variables may have values within 0-1).

A simple code example follows:

###

# generating simulated data
var1 = sample(0:10, 100, replace = TRUE);
var2 = sample(0:1000, 100, replace = TRUE);
var3 = sample(0:10, 100, replace = TRUE);
outcome = sample(0:1, 100, replace = TRUE);
dataset = data.frame(outcome, var1, var2, var3);

# fitting the model
model = glm(outcome ~ ., family=binomial, data = dataset)

# inspecting the model
print(model)

###

Regards,

--
Vincenzo Lagani
Research Fellow
BioInformatics Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas
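A small sketch related to the question above, not taken from the thread: a GLM is fitted conditionally on the covariates, so integer counts enter the linear predictor like any other numeric variable. The example reuses the simulated variables from the post and checks one thing that follows from the algebra of the linear predictor: expressing a wide-ranged count in other units (here var2 / 1000, an illustrative choice) rescales its coefficient but leaves the fitted probabilities unchanged. The seed and the name var2_k are hypothetical additions.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# simulated count covariates, as in the post
var1 <- sample(0:10, 100, replace = TRUE)
var2 <- sample(0:1000, 100, replace = TRUE)
var3 <- sample(0:10, 100, replace = TRUE)
outcome <- sample(0:1, 100, replace = TRUE)
dataset <- data.frame(outcome, var1, var2, var3)

# counts enter the linear predictor as ordinary numeric covariates
model <- glm(outcome ~ var1 + var2 + var3, family = binomial, data = dataset)

# same model with var2 expressed in thousands (hypothetical rescaling)
dataset$var2_k <- dataset$var2 / 1000
model_k <- glm(outcome ~ var1 + var2_k + var3, family = binomial, data = dataset)

# the coefficient scales by 1000, the fitted probabilities do not change
coef(model)["var2"] * 1000
coef(model_k)["var2_k"]
max(abs(fitted(model) - fitted(model_k)))
```

Whether the relationship is linear on the logit scale is a separate question that this sketch does not address; that is the kind of issue better discussed on a statistics list, as suggested in the reply.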
Re: [R] count data without NA in certain time intervals and plot it
Hi,

Sorry, I didn't understand your question in the first post. I saw Rui's reply and your reply that it is solved. Here is another solution in case it helps.

dattrial <- data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))
dattrial_wk3 <- subset(dattrial, Week==3)
dattrial_wk4 <- subset(dattrial, Week==4)
# number of non-NA values per column, separately for each week
count1 <- colSums(!is.na(dattrial_wk3))
count2 <- colSums(!is.na(dattrial_wk4))
# element 1 is the count for column "a" in both weeks
dattrialnew <- data.frame(rbind(count1[1], count2[1]), Week=rle(dattrial$Week)$values)
plot(dattrialnew$Week, dattrialnew$a, type="l", col="blue", xlab="Week", ylab="Count")

A.K.

----- Original Message -----
From: Tagmarie
To: r-help@r-project.org
Sent: Sunday, June 17, 2012 5:40 AM
Subject: Re: [R] count data without NA in certain time intervals and plot it

Thank you Arun for your time! Your idea is maybe only the first step towards what I want, but it was nevertheless a new tool for me and interesting to learn. I added a "Week" column to your data set:

dattrial<-data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))

I am looking for a way to count the number of rows for each week which contain data (no NA). In the next step I want to create a graph which shows the week on the x-axis and the number of non-missing values for each week on the y-axis. Thank you!
Re: [R] count data without NA in certain time intervals and plot it
Great! That works! Thank you Rui! I would have spent days (which I don't have left before handing my report in) getting there by myself! Have a great rest-weekend! -- View this message in context: http://r.789695.n4.nabble.com/count-data-without-NA-in-certain-time-intervals-and-plot-it-tp4633611p4633638.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] count data without NA in certain time intervals and plot it
Hello,

I've seen your reply to arun's reply and gave it a try. Since arun's code included more than one column, I've added another in one of the examples.

# Example 1
dattrial1 <- data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))
d1 <- split(dattrial1, dattrial1$Week)
count <- sapply(d1, function(x) sum(!is.na(x$a)))
count

# Example 2
dattrial2 <- data.frame(a=c(1,NA,rnorm(4,10)), b=c(1,2,NA,3,4,6), Week=c(3,3,3,4,4,4))
d2 <- split(dattrial2, dattrial2$Week)
count <- sapply(d2, function(x){
    # TRUE for rows that have no NA in any column
    yes <- apply(x, 1, function(y) all(!is.na(y)))
    sum(yes)
})
count

# Works for both examples
plot(names(count), count, type="b", col="red", pch=16)

Hope this helps,

Rui Barradas

On 16-06-2012 21:11, Tagmarie wrote:
> Hello,
> I'm quite new to R and still spend hours trying to figure out single things, so I hope nobody rolls his eyes over my question.
> I have a data set over time and converted it to the POSIXct format. I added a column in the data set for the week and the month. I am trying to get a plot which shows the weeks on the x-axis and the number of data points without NAs on the y-axis.
> That doesn't sound too difficult but I can't figure it out. Does anybody have an idea?
Re: [R] count data without NA in certain time intervals and plot it
Thank you Arun for your time! Your idea is maybe only the first step towards what I want, but it was nevertheless a new tool for me and interesting to learn. I added a "Week" column to your data set:

dattrial<-data.frame(a=c(1,NA,rnorm(4,10)), Week=c(3,3,3,4,4,4))

I am looking for a way to count the number of rows for each week which contain data (no NA). In the next step I want to create a graph which shows the week on the x-axis and the number of non-missing values for each week on the y-axis. Thank you!
Re: [R] count data without NA in certain time intervals and plot it
Hi,

I don't quite understand the question. Do you want to select only certain columns or rows without NAs? Suppose I have a dataset such as the one below:

dattrial <- data.frame(a=c(1,NA,rnorm(4,10)), b=c(NA,NA,NA,3,4,6), c=c(sample(LETTERS[1:3],replace=TRUE), sample(LETTERS[3:5],3,replace=TRUE)), d=runif(6,0.4))

# to eliminate the rows with NAs
dattrial1 <- dattrial[complete.cases(dattrial),]
# or, equivalently
dattrial1 <- dattrial[rowSums(is.na(dattrial))==0,]

# to delete columns with NAs
dattrial1 <- dattrial[,colSums(is.na(dattrial))==0]

A.K.

----- Original Message -----
From: Tagmarie
To: r-help@r-project.org
Sent: Saturday, June 16, 2012 4:11 PM
Subject: [R] count data without NA in certain time intervals and plot it

Hello,
I'm quite new to R and still spend hours trying to figure out single things, so I hope nobody rolls his eyes over my question.
I have a data set over time and converted it to the POSIXct format. I added a column in the data set for the week and the month. I am trying to get a plot which shows the weeks on the x-axis and the number of data points without NAs on the y-axis.
That doesn't sound too difficult but I can't figure it out. Does anybody have an idea?
[R] count data without NA in certain time intervals and plot it
Hello,

I'm quite new to R and still spend hours trying to figure out single things, so I hope nobody rolls his eyes over my question.

I have a data set over time and converted it to the POSIXct format. I added a column in the data set for the week and the month. I am trying to get a plot which shows the weeks on the x-axis and the number of data points without NAs on the y-axis.

That doesn't sound too difficult but I can't figure it out. Does anybody have an idea?
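For readers of this thread, a compact base-R variant of the solutions posted in the replies; this is a sketch under assumptions, not the code that was actually used. It relies on tapply() to sum a logical vector within each week. The data values and the object names (dat, n_a, n_complete) are made up for illustration.

```r
# small deterministic example (values are made up for illustration)
dat <- data.frame(
  a    = c(1, NA, 9.5, 10.2, 11.1, NA, 8.7),
  b    = c(1, 2, NA, 3, 4, 6, 5),
  Week = c(3, 3, 3, 4, 4, 5, 5)
)

# non-missing values of one column, counted per week
n_a <- tapply(!is.na(dat$a), dat$Week, sum)

# rows with no NA in any of the measurement columns, counted per week
n_complete <- tapply(complete.cases(dat[, c("a", "b")]), dat$Week, sum)

n_a
n_complete
```

The result is a named vector with the week numbers as names, so something like plot(as.numeric(names(n_a)), n_a, type = "b", xlab = "Week", ylab = "Count") should give the plot asked for.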
Re: [R] count data
Dear Sacha,

Do you revisit the same locations per site? If so, use (1|site/location) as random effect. Otherwise use just (1|site). You might want to add a crossed random effect (1|date) if you can expect an effect of phenology.

Best regards,

Thierry

PS R-sig-mixed-models is a better list for this kind of question.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

> -----Original message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Sacha Viquerat
> Sent: Friday, 25 February 2011 13:16
> To: r-help
> Subject: [R] count data
>
> hello dear list! I wonder about the layout of my csv for my study design:
>
> i have 11 different sites.
> each site had been visited 9 times.
> on each visit, 6 distinct water parameters had been taken ONCE (as continuous variables).
> on each visit, the fish abundance was counted using a net at 3 different locations within the site (count data).
>
> I know i will have to do an lmer using the nested locations as error term. Question is: how to organize my data, since i have abundances from the same 3 locations per site replicate but only one water parameter measurement per site replicate.
> to give you an idea, here's the basic look so far of my csv:
>
> site  location  abundance  pH   no3    and so on...
> A     1         12         7.1  0.003  ...
> A     2         15         7.1  0.003  ...
> A     3         18         7.1  0.003  ...
> B     1         11         7.4  0.004  ...
> B     2         8          7.4  0.004  ...
> B     3         17         7.4  0.004  ...
> A     1         13         7.2  0.001  ...
> A     2         19         7.2  0.001  ...
> A     3         21         7.2  0.001  ...
> B     1         9          6.9  0.002  ...
> B     2         5          6.9  0.002  ...
> B     3         2          6.9  0.002  ...
>
> i just made up the table to give an idea of what the data looks like. the goal would be to analyze fish abundance ~ water parameters, does anyone have a suggestion?
>
> thanks in advance!
[R] count data
hello dear list! I wonder about the layout of my csv for my study design:

i have 11 different sites.
each site had been visited 9 times.
on each visit, 6 distinct water parameters had been taken ONCE (as continuous variables).
on each visit, the fish abundance was counted using a net at 3 different locations within the site (count data).

I know i will have to do an lmer using the nested locations as error term. Question is: how to organize my data, since i have abundances from the same 3 locations per site replicate but only one water parameter measurement per site replicate.

to give you an idea, here's the basic look so far of my csv:

site  location  abundance  pH   no3    and so on...
A     1         12         7.1  0.003  ...
A     2         15         7.1  0.003  ...
A     3         18         7.1  0.003  ...
B     1         11         7.4  0.004  ...
B     2         8          7.4  0.004  ...
B     3         17         7.4  0.004  ...
A     1         13         7.2  0.001  ...
A     2         19         7.2  0.001  ...
A     3         21         7.2  0.001  ...
B     1         9          6.9  0.002  ...
B     2         5          6.9  0.002  ...
B     3         2          6.9  0.002  ...

i just made up the table to give an idea of what the data looks like. the goal would be to analyze fish abundance ~ water parameters, does anyone have a suggestion?

thanks in advance!
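One possible way to organise such data, sketched here under assumptions that are not in the post: keep the water parameters in one table with one row per site and visit, keep the counts in a second table with one row per site, visit and location, and join the two with merge() so that every count row carries the water parameters of its visit. The visit column and the object names (water, fish, long) are hypothetical; the numbers are the made-up values from the table above.

```r
# one row per site and visit: water parameters measured once per visit
water <- data.frame(
  site  = c("A", "B", "A", "B"),
  visit = c(1, 1, 2, 2),
  pH    = c(7.1, 7.4, 7.2, 6.9),
  no3   = c(0.003, 0.004, 0.001, 0.002)
)

# one row per site, visit and net location: the fish counts
fish <- data.frame(
  site      = rep(c("A", "B", "A", "B"), each = 3),
  visit     = rep(c(1, 1, 2, 2), each = 3),
  location  = rep(1:3, times = 4),
  abundance = c(12, 15, 18, 11, 8, 17, 13, 19, 21, 9, 5, 2)
)

# long format: each count row gets the water parameters of its visit
long <- merge(fish, water, by = c("site", "visit"))
long <- long[order(long$visit, long$site, long$location), ]
long
```

A table in this long layout can be passed directly to a mixed-model function; with lme4 installed, the random-effect structure suggested in the reply would presumably appear in the model formula as (1 | site/location).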
Re: [R] count data with a specific range
see ?levels

eg:

x <- rnorm(10)
y <- cut(x,c(-10,0,10))
levels(y) <- c("-10-0","0-10")

cheers
Joris

On Thu, Jun 24, 2010 at 4:14 AM, Yi wrote:
> Yeap. It works. Just to make the result more beautiful.
>
> One more question.
>
> The interval is shown as (0,10].
>
> Is there a way to change it into the format 0-10?
> Thanks.
>
> On Wed, Jun 23, 2010 at 6:12 PM, Joris Meys wrote:
>>
>> see ?cut
>>
>> Cheers
>> Joris
>>
>> On Thu, Jun 24, 2010 at 2:57 AM, Yi wrote:
>> > I would like to prepare the data for barplot. But I only have the data frame now.
>> >
>> > x1=rnorm(10,mean=2)
>> > x2=rnorm(20,mean=-1)
>> > x3=rnorm(15,mean=3)
>> > data=data.frame(x1,x2,x3)
>> >
>> > Is there a way to put data within a specific range? The expected result is as follows:
>> >
>> > range   x1  x2  x3
>> > -10-0   2   5   1    (# points in this range)
>> > 0-10    7   9   6
>> > ...
>> >
>> > I know the table function but I do not know how to deal with the range issue.
>> >
>> > Thanks in advance.
>> >
>> > Yi

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
Re: [R] count data with a specific range
see ?cut Cheers Joris On Thu, Jun 24, 2010 at 2:57 AM, Yi wrote: > I would like to prepare the data for barplot. But I only have the data frame > now. > > x1=rnorm(10,mean=2) > x2=rnorm(20,mean=-1) > x3=rnorm(15,mean=3) > data=data.frame(x1,x2,x3) > > If there a way to put data within a specific range? The expected result is > as follows: > range x1 x2 x3 > -10-0 2 5 1 (# points in > this range) > 0-10 7 9 6 > ... > > I know the table function but I do not know how to deal with the range > issue. > > Thanks in advance. > > Yi > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be --- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] count data with a specific range
I would like to prepare the data for barplot. But I only have the data frame now.

x1=rnorm(10,mean=2)
x2=rnorm(20,mean=-1)
x3=rnorm(15,mean=3)
data=data.frame(x1,x2,x3)

Is there a way to put data within a specific range? The expected result is as follows:

range   x1  x2  x3
-10-0   2   5   1    (# points in this range)
0-10    7   9   6
...

I know the table function but I do not know how to deal with the range issue.

Thanks in advance.

Yi
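Putting the two hints from the replies (?cut and ?levels) together, a minimal sketch could look as follows. It is an illustration under assumptions, not code from the thread: the three samples are stored in a list rather than a data frame, because data.frame(x1, x2, x3) fails when the vectors have different lengths as in the post, and the values and object names (samples, breaks, labels, counts) are made up.

```r
# deterministic stand-ins for the three samples (different lengths on purpose)
samples <- list(
  x1 = c(-1.2, 0.5, 2.3, 3.1, 1.8),
  x2 = c(-2.0, -0.4, -1.1, 0.7),
  x3 = c(2.9, 3.4, -0.2, 4.0, 1.5, 2.2)
)

breaks <- c(-10, 0, 10)
labels <- c("-10-0", "0-10")   # replaces the default "(-10,0]" style labels

# for each sample, count how many values fall into each interval
counts <- sapply(samples, function(v) table(cut(v, breaks = breaks, labels = labels)))
counts
```

The result is a matrix with one row per range and one column per sample, which is the shape barplot() expects, for example barplot(counts, beside = TRUE, legend.text = TRUE).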
Re: [R] Count data categories from table
aggregate? Something like this should work, although it is not very elegant.

mydata <- data.frame(aa=rep(letters[1:10],2), bb=rnorm(20, 5,1))
ss <- aggregate(mydata[,2], list(aa=mydata$aa), sum)
pie(ss[,2])

A more serious problem is that the results are going to be close to uninterpretable. Pie charts just are not very good for this amount of data. Have a look at the Note section of ?pie. You might want to consider using a dot chart (see ?dotchart).

--- On Fri, 7/17/09, Miroslav Nikolov wrote:

> From: Miroslav Nikolov
> Subject: [R] Count data categories from table
> To: r-help@r-project.org
> Received: Friday, July 17, 2009, 5:47 AM
>
> Hi there,
>
> I have a relatively simple question, though, I couldn't find a solution for it so far. I have a table with 1000 entries and columns containing information about different parameters for each entry.
> What I want to do is group all parameters from one of the columns [e.g. if all 1000 entries are grouped in 30 different categories (described as character strings) in a second column] and have a pie chart describing the distribution of all 1000 entries into these 30 categories.
> The problem I have is to make R count how many times each of the 30 categories is present in the table; then if I have them counted (e.g. if I have category1 - 234 times, category2 - 356 times, etc. in a vector/table) the rest will be easier.
>
> Thanx for the help in advance!
>
> Best,
> Miro
Re: [R] Count data categories from table
On Jul 17, 2009, at 5:47 AM, Miroslav Nikolov wrote:

> Hi there,
>
> I have a relatively simple question, though, I couldn't find a solution for it so far. I have a table with 1000 entries and columns containing information about different parameters for each entry.
> What I want to do is group all parameters from one of the columns [e.g. if all 1000 entries are grouped in 30 different categories (described as character strings) in a second column] and have a pie chart describing the distribution of all 1000 entries into these 30 categories.
> The problem I have is to make R count how many times each of the 30 categories is present in the table; then if I have them counted (e.g. if I have category1 - 234 times, category2 - 356 times, etc. in a vector/table) the rest will be easier.

Several options are available. An example dataset would have made understanding your setup much clearer. I am having trouble parsing your natural language presentation of the problem. Perhaps these examples will help:

  > y <- data.frame(x=sample(LETTERS[1:5], 20, replace=TRUE) )
  > table(y$x)

  A B C D E
  3 5 2 7 3
  > xtabs(~x, data=y)
  x
  A B C D E
  3 5 2 7 3
  > ?tapply
  > tapply(y$x, y$x, length)
  A B C D E
  3 5 2 7 3

Pie charts are deprecated on this list, so it's not surprising you have difficulty finding examples, but surely you can find worked examples, nonetheless. The search sites to consult include:

http://search.r-project.org/nmz.html
http://addictedtor.free.fr/graphiques/

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
[R] Count data categories from table
Hi there, I have a relatively simple question, though, I couldn't find a solution for it so far. I have a table with 1000 entries and columns containing information about different parameters for each entry. What I want to do is group all parameters from one of the columns [e.g. if all 1000 entries are grouped in 30 different categories (described as character strings) in a second column] and have a pie chart describing the distribution of all 1000 entries into these 30 categories. The problem I have is to make R count how many times each of the 30 categories is present in the table; then if I have them counted (e.g. if I have category1 - 234 times, category2 - 356 times, etc. in a vector/table) the rest will be easier. Thanx for the help in advance! Best, Miro -- View this message in context: http://www.nabble.com/Count-data-categories-from-table-tp24531524p24531524.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
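A minimal sketch of the counting step asked about here, using table() as in one of the replies. The category names and the object names (category, counts, counts_sorted, count_vec) are made up for illustration; in the real data the vector would be the category column of the table.

```r
# made-up example: 12 entries falling into 4 categories
category <- c("kinase", "receptor", "kinase", "channel", "receptor", "kinase",
              "enzyme", "kinase", "channel", "receptor", "kinase", "enzyme")

# table() counts how often each distinct string occurs
counts <- table(category)

# sorted, with the most frequent category first
counts_sorted <- sort(counts, decreasing = TRUE)
counts_sorted

# the counts as a plain named vector, e.g. for plotting
count_vec <- setNames(as.vector(counts_sorted), names(counts_sorted))
```

The named vector can then go to pie(count_vec) or, following the advice in the replies about readability with 30 categories, to dotchart(count_vec).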
Re: [R] Count data with several numbers separated by commas
It rather depends on how you have your data stored. Here is one possibility you might want to look at:

  > con <- textConnection("
  + id_name x1 x2 x3
  + aa101 1,4,5 2 1
  + aa102 1,2,5 1 2
  + aa103 1,2,5 1 1
  + aa104 1,2,3 1 2
  + aa105 1,5 2 2
  + aa106 1,2,5 2 2
  + aa107 1,2,5 2 1
  + aa108 1,4,5 2 1
  + aa109 1,2 1 2
  + aa110 3,5 1 2")
  >
  > dat <- read.table(con, header = TRUE)
  >
  > x1_all <- as.numeric(unlist(strsplit(as.character(dat$x1), ",")))
  >
  > x1_all
  [1] 1 4 5 1 2 5 1 2 5 1 2 3 1 5 1 2 5 1 2 5 1 4 5 1 2 3 5
  > table(x1_all)
  x1_all
  1 2 3 4 5
  9 6 2 2 8
  >

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Xiyan Lon
Sent: Thursday, 16 April 2009 4:04 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] Count data with several numbers separated by commas

Dear all,
I have a data file with 3 variables (x1, x2, x3) where variable x1 have data that consists of several numbers separated by commas.

id name  x1     x2  x3
aa101    1,4,5  2   1
aa102    1,2,5  1   2
aa103    1,2,5  1   1
aa104    1,2,3  1   2
aa105    1,5    2   2
aa106    1,2,5  2   2
aa107    1,2,5  2   1
aa108    1,4,5  2   1
aa109    1,2    1   2
aa110    3,5    1   2

I want to count the number of data for each variables and make barplot for each variables.
I know how to count for variable x2 and x3 and make barplot for x2 and x3, but I don't know how to count data in variable x1.
Are there any trick how to count data in variable x1?
The result maybe like:

x1
1 9
2 6
3 2
4 4
5 8

x2
1 5
2 5

x3
1 4
2 6

Thank you for any help.

Xiyanlon
Re: [R] Count data with several numbers separated by commas
Here's a solution, though it may be overcomplicated. I assume the data frame is called "dat": vec <- unlist(lapply(strsplit(dat$x1, ","), function (x) summary(as.factor(x > table(names(vec)) 1 2 3 4 5 9 6 2 2 8 Cheers, Simon. On Thu, 2009-04-16 at 13:03 +0700, Xiyan Lon wrote: > Dear all, > I have a data file with 3 variables (x1, x2, x3) where variable x1 > have data that consists of several numbers separated by commas. > > id namex1 x2x3 > aa101 1,4,52 1 > aa102 1,2,51 2 > aa103 1,2,51 1 > aa104 1,2,31 2 > aa105 1,5 2 2 > aa106 1,2,52 2 > aa107 1,2,52 1 > aa108 1,4,52 1 > aa109 1,2 1 2 > aa110 3,5 1 2 > > > I want to count the number of data for each variables and make barplot > for each variables. > I know how to count for variable x2 and x3 and make barplot for x2 and > x3, but I don't know how to count data in variable x1. > Are there any trick how to count data in variable x1? > The result maybe like: > > x1 > 1 9 > 2 6 > 3 2 > 4 4 > 5 8 > > > x2 > 1 5 > 2 5 > > x3 > 1 4 > 2 6 > > > Thank you for any help. > > Xiyanlon > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Simon Blomberg, BSc (Hons), PhD, MAppStat. Lecturer and Consultant Statistician School of Biological Sciences The University of Queensland St. Lucia Queensland 4072 Australia Room 320 Goddard Building (8) T: +61 7 3365 2506 http://www.uq.edu.au/~uqsblomb email: S.Blomberg1_at_uq.edu.au Policies: 1. I will NOT analyse your data for you. 2. Your deadline is your problem. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. - John Tukey. 
[R] Count data with several numbers separated by commas
Dear all,

I have a data file with 3 variables (x1, x2, x3) where variable x1 has data that consists of several numbers separated by commas.

id name   x1      x2   x3
aa101     1,4,5   2    1
aa102     1,2,5   1    2
aa103     1,2,5   1    1
aa104     1,2,3   1    2
aa105     1,5     2    2
aa106     1,2,5   2    2
aa107     1,2,5   2    1
aa108     1,4,5   2    1
aa109     1,2     1    2
aa110     3,5     1    2

I want to count the number of data for each variable and make a barplot for each variable.
I know how to count for variables x2 and x3 and make barplots for x2 and x3, but I don't know how to count the data in variable x1.
Is there any trick for counting the data in variable x1?
The result may be like:

x1
1  9
2  6
3  2
4  4
5  8

x2
1  5
2  5

x3
1  4
2  6

Thank you for any help.

Xiyanlon
Re: [R] count data with some conditions
On Nov 1, 2008, at 3:30 AM, (Ted Harding) wrote:

> On 01-Nov-08 02:51:37, David Winsemius wrote:
>> Do you want the count of remaining elements which are strictly
>> greater than the first element?
>>
>> > length(which(a[1] < a[2:10]))
>> [1] 4
>>
>> or perhaps a bit more deviously:
>>
>> > sum( a[1] < [...]
>> [1] 4
>
> No need to be devious! Simply
>
>   sum(a[1] < a[2:10])
>   # [1] 4
>
> will do it. The reason is that when TRUE or FALSE are involved
> in an arithmetic operation (which sum() is), they are cast into
> 1 or 0.

Agreed. I now also see that TRUE+TRUE and T+T both return 2. The second observation should be a further warning to us newbies not to create variables named "T".

It's now been pointed out to me both on and off list that the +0 is unnecessary. I don't remember when I learned this, but it could not have been more than a year ago. I seem to remember that Gabor Grothendieck used the +0 device to convert a logical vector to a numeric vector. Perhaps it was for the purpose of making a matrix or something less necessarily arithmetical than sum() or "+".

--
David Winsemius, MD
Heritage Labs
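The two points in this message, the "+0 device" and the danger of a variable named T, can be tried out in a few lines. This is only an illustrative sketch using the vector from the thread; the names is_greater and as_numbers are made up.

```r
a <- cbind(5, 2, 4, 7, 8, 3, 4, 11, 1, 20)

is_greater <- a[1] < a[2:10]   # logical vector
as_numbers <- is_greater + 0   # the "+0 device": arithmetic coerces to numeric

class(is_greater)   # "logical"
class(as_numbers)   # "numeric"

sum(is_greater)     # sum() coerces on its own, so +0 is not needed here

# T is an ordinary variable that can be overwritten; TRUE is a reserved word.
T <- 0
T + T         # 0, no longer 2
TRUE + TRUE   # still 2
rm(T)         # drop the local copy so the built-in T is visible again
```

Writing TRUE and FALSE in full avoids the problem entirely.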
Re: [R] count data with some conditions
David,

Yes, it is what I want. It is a great help.

Thank you,

Jin

David Winsemius wrote:
>
> Do you want the count of remaining elements which are strictly
> greater than the first element?
>
> > length(which(a[1] < a[2:10]))
> [1] 4
>
> --
> David Winsemius, MD
> Heritage Labs.
Re: [R] count data with some conditions
On 01-Nov-08 02:51:37, David Winsemius wrote:
> Do you want the count of remaining elements which are strictly
> greater than the first element?
>
> > length(which(a[1] < a[2:10]))
> [1] 4
>
> or perhaps a bit more deviously:
>
> > sum( a[1] < [...]
> [1] 4

No need to be devious! Simply

  sum(a[1] < a[2:10])
  # [1] 4

will do it. The reason is that when TRUE or FALSE are involved in an arithmetic operation (which sum() is), they are cast into 1 or 0.

Ted.

> On Oct 31, 2008, at 7:56 PM, sandsky wrote:
>> Hi there,
>> I have a data set:
>>
>> a=cbind(5,2,4,7,8,3,4,11,1,20)
>>
>> I want to count # of data, satistfying a[1] < [...]
>> Anyone helps me solving this case?
>>
>> Thank you in advance,
>> Jin

E-Mail: (Ted Harding) <[EMAIL PROTECTED]>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Nov-08 Time: 07:30:17
-- XFMail --
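Because logical values behave as 1 and 0 in arithmetic, the same comparison can give a count, a proportion, or the matching elements themselves. A short sketch with the vector from the thread (the names rest, n_greater, n_via_which and share are illustrative):

```r
a <- cbind(5, 2, 4, 7, 8, 3, 4, 11, 1, 20)
rest <- a[2:10]

n_greater <- sum(a[1] < rest)              # count of TRUEs
n_via_which <- length(which(a[1] < rest))  # same count, the longer way
share <- mean(a[1] < rest)                 # proportion of TRUEs

n_greater
share
rest[a[1] < rest]                          # the elements that satisfy the condition
```

One practical difference: which() silently drops NA comparisons, whereas sum() returns NA unless na.rm = TRUE is given.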
Re: [R] count data with some conditions
Do you want the count of remaining elements which are strictly greater than the first element?

> length(which(a[1] < a[2:10]))
[1] 4

or perhaps a bit more deviously:

> sum( a[1] < [...]
[1] 4

--
David Winsemius, MD
Heritage Labs.

On Oct 31, 2008, at 7:56 PM, sandsky wrote:

> Hi there,
>
> I have a data set:
>
> a=cbind(5,2,4,7,8,3,4,11,1,20)
>
> I want to count # of data, satistfying a[1] < [...]
>
> Anyone helps me solving this case?
>
> Thank you in advance,
>
> Jin
[R] count data with some conditions
Hi there,

I have a data set:

a=cbind(5,2,4,7,8,3,4,11,1,20)

I want to count the number of data values satisfying a[1] < [...]

Can anyone help me solve this case?

Thank you in advance,

Jin
--
View this message in context: http://www.nabble.com/count-data-with-some-conditions-tp20275722p20275722.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] count data with a specific range
I am converting exact time data to interval data and generating samples via bootstrapping. I had quite long code to get the frequencies, but your help makes it simpler.

Thank you.

jholtman wrote:
>
>> data<-c(2,6,13,26,19,25,18,11,22,25)
>> table(cut(data, breaks=c(0,10,20,30)))
>
>  (0,10] (10,20] (20,30]
>       2       4       4
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
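The poster's own bootstrap code is not shown in the thread. As one possible reading of "interval data plus bootstrapping", the sketch below resamples the values with replacement and tabulates each resample into the same intervals. The seed, the number of resamples (n_boot) and all variable names are arbitrary assumptions.

```r
set.seed(42)   # arbitrary seed, only for reproducibility

data <- c(2, 6, 13, 26, 19, 25, 18, 11, 22, 25)
breaks <- c(0, 10, 20, 30)
n_boot <- 200  # arbitrary number of bootstrap resamples

# Each column holds the interval counts of one bootstrap resample.
boot_counts <- replicate(n_boot, {
  resample <- sample(data, replace = TRUE)
  as.vector(table(cut(resample, breaks = breaks)))
})
rownames(boot_counts) <- levels(cut(data, breaks = breaks))

rowMeans(boot_counts)   # average count per interval across resamples
```

Since cut() returns a factor with fixed levels, intervals that happen to be empty in a resample still get a count of 0, so every column has the same length.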
Re: [R] count data with a specific range
> data<-c(2,6,13,26,19,25,18,11,22,25)
> table(cut(data, breaks=c(0,10,20,30)))

 (0,10] (10,20] (20,30]
      2       4       4

On Mon, Sep 29, 2008 at 5:41 PM, sandsky <[EMAIL PROTECTED]> wrote:
>
> Hi there,
>
> The data is
>
> data<-c(2,6,13,26,19,25,18,11,22,25)
>
> I want to count data for these ranges:
>
> [0~10]:
> [11~20]:
> [21-30]:

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
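A note on the interval boundaries, sketched with the same data: by default cut() builds intervals that are open on the left and closed on the right, so a value of exactly 0 would fall outside (0,10] and become NA. The include.lowest and labels arguments used below are standard cut() arguments; the label texts and variable names are illustrative.

```r
data <- c(2, 6, 13, 26, 19, 25, 18, 11, 22, 25)
breaks <- c(0, 10, 20, 30)

# Default: intervals (0,10], (10,20], (20,30]
default_counts <- table(cut(data, breaks = breaks))

# include.lowest = TRUE closes the first interval, so a 0 is counted too.
with_zero <- table(cut(c(0, data), breaks = breaks, include.lowest = TRUE))

# Labels matching the ranges in the question; for whole numbers,
# (10,20] is the same set of values as 11 to 20.
labelled <- table(cut(data, breaks = breaks,
                      labels = c("0-10", "11-20", "21-30"),
                      include.lowest = TRUE))

default_counts
with_zero
labelled
```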
Re: [R] count data with a specific range
Kia ora

See ?hist, in particular the breaks argument, and set plot=FALSE.

HTH ...

Peter Alspach

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of sandsky
> Sent: Tuesday, 30 September 2008 10:42 a.m.
> To: r-help@r-project.org
> Subject: [R] count data with a specific range
>
> The data is
>
> data<-c(2,6,13,26,19,25,18,11,22,25)
>
> I want to count data for these ranges:
>
> [0~10]:
> [11~20]:
> [21-30]:
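Spelled out, the hint above might look like the following sketch. With plot = FALSE, hist() draws nothing and returns an object whose counts component holds the frequency per interval; the name h is arbitrary.

```r
data <- c(2, 6, 13, 26, 19, 25, 18, 11, 22, 25)

# No plot is drawn; the returned object carries the breaks and counts.
h <- hist(data, breaks = c(0, 10, 20, 30), plot = FALSE)

h$breaks
h$counts
```

Unlike cut(), hist() uses include.lowest = TRUE by default, so a value equal to the lowest break is counted in the first interval.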
[R] count data with a specific range
Hi there,

The data is

data<-c(2,6,13,26,19,25,18,11,22,25)

I want to count the data in these ranges:

[0~10]:
[11~20]:
[21-30]:

Can anyone help me?

Thank you in advance
--
View this message in context: http://www.nabble.com/count-data-with-a-specific-range-tp19732290p19732290.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Count data in random Forest
Hi Birgit,

I'm not sure that I understand your question. I'll try to answer anyway.

Regression trees, and therefore also random forests, are invariant to monotonic transformations of the independent variables. There are no distributional assumptions for the independent variables.

The dependent variable, however, is used to calculate the variances within the two groups of cases that result from a split. Therefore, it would make sense to have the dependent variable follow the typical distributional requirements of least-squares driven models, such as homoscedasticity, a symmetrical distribution, etc. For count data a square root transformation is often appropriate.

HTH
Volker

Birgit Lemcke wrote:
> Hello R-user!
>
> I am running R 2.7.0 on a Power Book (Tiger). (I am still R and
> statistics beginner)
>
> I try to find the most important variables to divide my dataset as
> given in a categorical variable using randomForest.
>
> Is randomForest() able to deal with count data? Or is there no
> difference because only the ranks are used in the trees?
>
> Thanks in advance
>
> Birgit
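To illustrate why a square root is often suggested for a count response: if the counts are roughly Poisson (an assumption made for this sketch, not something stated about Birgit's data), their variance grows with the mean, whereas the variance of the square-rooted counts stays close to 1/4. The simulated means, sample size and seed below are arbitrary.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

means <- c(5, 20, 80)   # arbitrary Poisson means
n <- 10000              # simulated observations per mean

# Variance of the raw counts and of the square-rooted counts, per mean.
raw_var <- sapply(means, function(m) var(rpois(n, m)))
sqrt_var <- sapply(means, function(m) var(sqrt(rpois(n, m))))

round(rbind(mean = means, raw = raw_var, sqrt = sqrt_var), 2)
```

The raw variances track the means, while the transformed ones are nearly constant, which is closer to the homoscedasticity mentioned above.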
[R] Count data in random Forest
Hello R-user!

I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and statistics beginner.)

I am trying to find the most important variables for dividing my dataset into the groups given by a categorical variable, using randomForest.

Is randomForest() able to deal with count data? Or is there no difference because only the ranks are used in the trees?

Thanks in advance

Birgit

Birgit Lemcke
Institut für Systematische Botanik
Zollikerstrasse 107
CH-8008 Zürich
Switzerland
Ph: +41 (0)44 634 8351
[EMAIL PROTECTED]

175 years of UZH: "marvel. experience. understand. Hands-on science."
MNF anniversary event for young and old.
19 April 2008, 10:00 to 02:00
Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
Further information: http://www.175jahre.uzh.ch/naturwissenschaft
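For predictors, the intuition that only the ordering matters can be checked with a toy example. The function below is a hand-written single-split search by least squares, not randomForest's own code, and the data are simulated; best_split_left and the other names are made up for this sketch. It finds the best cut point on a count predictor and on its square root and compares the resulting groups.

```r
set.seed(2)   # arbitrary seed

x <- rpois(200, 10)               # simulated count predictor
y <- 2 * (x > 10) + rnorm(200)    # simulated response

# Toy split search: try midpoints between neighbouring unique x values and
# keep the cut with the smallest within-group sum of squares.
best_split_left <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  sse <- sapply(cuts, function(cc) {
    left <- y[x < cc]
    right <- y[x >= cc]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  x < cuts[which.min(sse)]        # TRUE for cases sent to the left group
}

# A monotonic transformation keeps the ordering, hence the same partition.
same_partition <- identical(best_split_left(x, y),
                            best_split_left(sqrt(x), y))
same_partition
```

The cut point itself differs between the two scales, but the cases on either side of it are the same, so the tree structure is unchanged.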