Re: [R] Why 'gbm' is not giving me error when I change the response from numeric to categorical?
Thanks Peter and Marc. I am sorry, I made a mistake when dichotomizing the response, and thanks for pointing it out. However, even a correct dichotomization does not help. The link you provided is also very useful, and I am now thinking of not dichotomizing my values at all.

Thanks again

On Fri, Oct 4, 2013 at 3:50 PM, Marc Schwartz wrote:
>
> On Oct 4, 2013, at 2:35 PM, peter dalgaard wrote:
>
> >
> > On Oct 4, 2013, at 21:16 , Mary Kindall wrote:
> >
> >> Y[Y < mean(Y)] = 0 #My edit
> >> Y[Y >= mean(Y)] = 1 #My edit
> >
> > I have no clue about gbm, but I don't think the above does what I think
> > you think it does.
> >
> > Y <- as.integer(Y >= mean(Y))
> >
> > might be closer to the mark.
>
>
> Good catch Peter! I didn't pay attention to that initially.
>
> Here is an example:
>
> > set.seed(1)
> > Y <- rnorm(10)
>
> > Y
>  [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
>  [7]  0.4874291  0.7383247  0.5757814 -0.3053884
>
> > mean(Y)
> [1] 0.1322028
>
> Before changing Y:
>
> > Y[Y < mean(Y)]
> [1] -0.6264538 -0.8356286 -0.8204684 -0.3053884
>
> > Y[Y >= mean(Y)]
> [1] 0.1836433 1.5952808 0.3295078 0.4874291 0.7383247 0.5757814
>
>
> However, the incantation that Mary is using, which calculates mean(Y)
> separately in each call, results in:
>
> > Y[Y < mean(Y)] = 0
>
> > Y
>  [1] 0.000 0.1836433 0.000 1.5952808 0.3295078 0.000
>  [7] 0.4874291 0.7383247 0.5757814 0.000
>
> # mean(Y) is no longer the original value from above
> > mean(Y)
> [1] 0.3909967
>
>
> Thus:
>
> > Y[Y >= mean(Y)] = 1
>
> > Y
>  [1] 0.000 0.1836433 0.000 1.000 0.3295078 0.000
>  [7] 1.000 1.000 1.000 0.000
>
>
> Some of the values in Y do not change because the threshold for modifying
> the values changed as a result of the recalculation of the mean after the
> first set of values in Y have changed. As Peter noted, you don't end up
> with a dichotomous vector.
>
> Using Peter's method:
>
> > Y <- as.integer(Y >= mean(Y))
> > Y
>  [1] 0 1 0 1 1 0 1 1 1 0
>
>
> That being said, the original viewpoint stands, which is to not do this
> due to loss of information.
>
> Regards,
>
> Marc Schwartz
>

--
-
Mary Kindall
Yorktown Heights, NY
USA

	[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
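Marc's walkthrough can be turned into a quick check before fitting a bernoulli model. A minimal sketch, assuming Y is a plain numeric vector as in the thread: compute the threshold once, dichotomize, and confirm that the result really contains only 0 and 1. The helper name `dichotomize` is hypothetical.

```r
set.seed(1)
Y <- rnorm(10)

# Buggy two-step version: mean() is recomputed after the first assignment
Y_bad <- Y
Y_bad[Y_bad < mean(Y_bad)] <- 0
Y_bad[Y_bad >= mean(Y_bad)] <- 1

# Fixed version: the threshold is evaluated once, then applied to every element
dichotomize <- function(y, threshold = mean(y)) as.integer(y >= threshold)
Y01 <- dichotomize(Y)

all(Y_bad %in% c(0, 1))  # FALSE: some original values survive the two steps
all(Y01 %in% c(0, 1))    # TRUE
table(Y01)
```

A check like `stopifnot(all(Y01 %in% c(0, 1)))` placed just before the model call catches this class of mistake early.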
Re: [R] Not getting any result from 'gbm'?
Sorry David, the formula that I use here is

fmla = as.formula(Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)

Thanks

On Sat, Oct 5, 2013 at 2:02 AM, David Winsemius wrote:
>
> On Oct 3, 2013, at 3:07 PM, Mary Kindall wrote:
>
> > In the reproducible example given below, why I am not getting any result
> > with generalized boosted model (gbm). Other methods does show me the
> > desired result.
> > In the example data file (attached) example.txt, the predictors x3 and x4
> > are correlated with response Y.
> >
> > tmpData = read.table("Desktop/example.txt", sep="\t",header=TRUE)
> > head(tmpData)
> > fmla = getTheFormulaFromDataFrame(tmpData)
>
> > fmla = getTheFormulaFromDataFrame(tmpData)
> Error: could not find function "getTheFormulaFromDataFrame"
>
> > fmla
> > gbm(fmla, distribution = "bernoulli", data = tmpData) #doesn't work
> >
> > #All the following works
> > bagging(fmla, data=tmpData, control=control, coob=TRUE)
> > rpart(fmla, dat=tmpData, method = "class", control=control )
> > glm(fmla, family="binomial", data = tmpData)
> >
> > Thanks
>
> David Winsemius
> Alameda, CA, USA
>

--
-
Mary Kindall
Yorktown Heights, NY
USA
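Since `getTheFormulaFromDataFrame()` is not a function R itself provides (presumably it was a helper on the poster's side), here is a minimal sketch of building the same kind of formula from the column names with `stats::reformulate()`. It assumes the response column is called `Y` and every other column is a predictor; the helper name `formula_from_df` is hypothetical.

```r
# Hypothetical helper: response ~ all other columns of a data frame
formula_from_df <- function(df, response = "Y") {
  predictors <- setdiff(names(df), response)
  reformulate(predictors, response = response)
}

tmpData <- data.frame(Y = c(0, 1, 0, 1), x1 = 1:4, x2 = c(2, 3, 5, 7))
fmla <- formula_from_df(tmpData)
fmla
```

For modelling functions that take a `data` argument, such as `glm()`, writing `Y ~ .` is the usual shorthand for the same set of predictors.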
[R] Why 'gbm' is not giving me error when I change the response from numeric to categorical?
This reproducible example is from the help page of 'gbm'. I ran the following code in R, and it works fine as long as the response is numeric. The problem starts when I convert the response from numeric to binary (0/1): then it gives me an error. Can converting the response from numeric to binary really have this much effect?

Help page code:

library(gbm)

N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)

# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA

data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

# fit initial model
gbm1 <- gbm(Y~X1+X2+X3+X4+X5+X6,        # formula
    data=data,                   # dataset
    var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease,
                                 # +1: monotone increase,
                                 #  0: no monotone restrictions
    distribution="gaussian",     # see the help for other choices
    n.trees=1000,                # number of trees
    shrinkage=0.05,              # shrinkage or learning rate,
                                 # 0.001 to 0.1 usually work
    interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
    train.fraction = 0.5,        # fraction of data for training,
                                 # first train.fraction*N used for training
    n.minobsinnode = 10,         # minimum total weight needed in each node
    cv.folds = 3,                # do 3-fold cross-validation
    keep.data=TRUE,              # keep a copy of the dataset with the object
    verbose=FALSE)               # don't print out progress
gbm1
summary(gbm1)

Now I slightly change the response variable to make it binary.

Y[Y < mean(Y)] = 0 #My edit
Y[Y >= mean(Y)] = 1 #My edit
data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)
fmla = as.formula(factor(Y)~X1+X2+X3+X4+X5+X6) #My edit
gbm2 <- gbm(fmla,                # formula
    data=data,                   # dataset
    distribution="bernoulli",    # My edit
    n.trees=1000,                # number of trees
    shrinkage=0.05,              # shrinkage or learning rate,
                                 # 0.001 to 0.1 usually work
    interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
    bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
    train.fraction = 0.5,        # fraction of data for training,
                                 # first train.fraction*N used for training
    n.minobsinnode = 10,         # minimum total weight needed in each node
    cv.folds = 3,                # do 3-fold cross-validation
    keep.data=TRUE,              # keep a copy of the dataset with the object
    verbose=FALSE)               # don't print out progress
gbm2

> gbm2
gbm(formula = fmla, distribution = "bernoulli", data = data, n.trees = 1000,
    interaction.depth = 3, n.minobsinnode = 10, shrinkage = 0.05,
    bag.fraction = 0.5, train.fraction = 0.5, cv.folds = 3, keep.data = TRUE,
    verbose = FALSE)
A gradient boosted model with bernoulli loss function.
1000 iterations were performed.
The best cross-validation iteration was .
The best test-set iteration was .
Error in 1:n.trees : argument of length 0

My question is: can binarizing the response have so much effect that gbm does not find anything useful in the predictors?

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
[R] How to create an ROC with three possible classes?
I have a toy example, reproduced below, in which the response variable has three possible classes. I am trying to create an ROC curve but am not sure how to deal with it when there are three classes.

library(ipred)
library(rpart)  # for rpart.control()

control = rpart.control(maxdepth = 20, minsplit = 20, cp = 0.01, maxsurrogate = 2,
                        surrogatestyle = 0, xval = 25)
n <- 500; p <- 10
f <- function(x,a,b,d) return( a*(x-b)^2+d )
x1 <- runif(n/2,0,4)
y1 <- f(x1,-1,2,1.7)+runif(n/2,-1,1)
x2 <- runif(n/2,2,6)
y2 <- f(x2,1,4,-1.7)+runif(n/2,-1,1)
y <- c(rep(-1,floor(n/3)),rep(0,ceiling(n/3)), rep(1,ceiling(n/3)))
dat <- data.frame(y=factor(y),x1=c(x1,x2),x2=c(y1,y2),
                  matrix(rnorm(n*(p-2)),ncol=(p-2)))
names(dat)<-c("y",paste("x",1:p,sep=""))
dat
# symbol and colour by class, using the factor codes 1..3 (y itself contains -1 and 0)
plot(dat$x1, dat$x2, pch = as.integer(dat$y), col = c(1, 2, 8)[as.integer(dat$y)],
     xlab = names(dat)[2], ylab = names(dat)[3])

indtrain<-sample(1:n,300,replace=FALSE)
train<-dat[indtrain,]; dim(train)
test<-dat[setdiff(1:n,indtrain),]; dim(test)
test

mod <- bagging(y~., data=train, control=control, coob=TRUE, nbagg=25, keepX = TRUE)
mod
pred<-predict(mod, newdata=test[,-1],type="prob", aggregation= "average"); pred

For the two-class case, I used to do the following, but it is no longer valid for three classes.

library(pROC)   # provides plot.roc()
yhat <- pred[,2]
y <- test$y     # the response column (test[, -1] would drop it)
plot.roc(y, yhat)

Any help will be appreciated.

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
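One common way to extend ROC analysis to three classes is one-vs-rest: treat each class in turn as positive and the other two as negative, and compute a separate curve or AUC from that class's probability column. A minimal base-R sketch, assuming `pred` is a probability matrix with one column per class, named by the factor levels; the helper names `auc_binary` and `ovr_auc` are hypothetical. (The pROC package also offers `multiclass.roc()`, which computes the Hand and Till multi-class AUC.)

```r
# AUC via the rank-sum (Mann-Whitney) identity; ties get average ranks
auc_binary <- function(score, is_pos) {
  r  <- rank(score)
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# One-vs-rest AUC for every class column of a probability matrix
ovr_auc <- function(prob, truth) {
  truth <- as.character(truth)
  sapply(colnames(prob), function(cl) auc_binary(prob[, cl], truth == cl))
}

# Toy probabilities for classes -1, 0, 1 (a stand-in for 'pred')
prob <- rbind(c(0.8, 0.1, 0.1),
              c(0.1, 0.8, 0.1),
              c(0.1, 0.1, 0.8),
              c(0.7, 0.2, 0.1))
colnames(prob) <- c("-1", "0", "1")
truth <- factor(c("-1", "0", "1", "-1"))

aucs <- ovr_auc(prob, truth)
aucs          # one AUC per class
mean(aucs)    # macro-averaged one-vs-rest AUC
```

With the objects from the post this would be `ovr_auc(pred, test$y)`, provided the column names of `pred` match `levels(test$y)`. For the curves themselves, `plot.roc(as.integer(test$y == "1"), pred[, "1"])` would draw the one-vs-rest curve for class 1, and likewise for the other classes.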
[R] Which regression tree algorithm to use for large data?
I have a data frame with 2 million rows and approximately 200 columns (features). Approximately 30-40% of the entries are blank. I am trying to find important features for a binary response variable. The predictors may be categorical or continuous.

I started by applying logistic regression, but with so many missing entries I feel that this is not a good approach, as glm discards every record that has any item blank. So I am now looking to apply tree-based algorithms (rpart or gbm), which are able to handle missing data better.

Since my data is too big for rpart or gbm, I decided to randomly draw 10,000 records from the original data, apply rpart to them, and keep building a pool of important variables. However, even these 10,000 records seem to be too many for the rpart algorithm.

What can I do in this situation? Is there any switch that I can use to make it fast, or is it impossible to apply rpart to my data? I am using the following rpart command:

varimp = rpart(fmla, data = tmpData, method = "class")$variable.importance

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
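By default `rpart()` runs 10-fold cross-validation (`xval = 10`) and searches for up to 5 surrogate and 4 competing splits at every node. The rpart documentation notes that setting `maxsurrogate = 0` reduces compute time, since roughly half of the time is otherwise spent searching for surrogates. A minimal sketch on simulated data (the data set and column names are assumptions, not the poster's data) with those switches turned off. Note the trade-off: without surrogates, rows missing the split variable are sent in the majority direction, and variable importance then reflects primary splits only, because rpart also credits surrogate splits when computing it.

```r
library(rpart)

set.seed(42)
n <- 10000
dat <- data.frame(x1 = rnorm(n),
                  x2 = rnorm(n),
                  x3 = factor(sample(letters[1:5], n, replace = TRUE)))
dat$y <- factor(as.integer(dat$x1 + rnorm(n, sd = 0.5) > 0))
dat$x2[sample(n, 3000)] <- NA              # ~30% missing, as in the post

fast_ctrl <- rpart.control(xval = 0,         # skip the cross-validation fits
                           maxcompete = 0,   # don't record competing splits
                           maxsurrogate = 0, # skip the surrogate search
                           cp = 0.01)

system.time(
  fit <- rpart(y ~ ., data = dat, method = "class", control = fast_ctrl)
)
varimp <- fit$variable.importance
varimp
```

Comparing `system.time()` with and without `fast_ctrl` on the same subsample shows how much each switch buys on the real data.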
[R] pvalue calculate
I have a value

a = 300

and observations

x = sample(1:50)

How can I find a p-value from this? I need to show that "a" is different from mean(x).

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
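Which p-value is appropriate depends on the question being asked. Two common readings, as a sketch: if x is a sample and the question is whether the mean of the population it came from differs from a, a one-sample t-test applies. If the question is whether a is unusually large compared with the values in x, an empirical (rank-based) p-value is the share of observations at least as extreme as a, here with the usual +1 correction.

```r
a <- 300
x <- sample(1:50)

# 1) Does the mean of the population x was drawn from differ from a?
tt <- t.test(x, mu = a)
tt$p.value

# 2) Is a unusually large relative to the values in x?
#    One-sided empirical p-value with the +1 correction
p_emp <- (1 + sum(x >= a)) / (1 + length(x))
p_emp
```

With 50 observations the empirical p-value can never drop below 1/51, however extreme a is; the t-test p-value, by contrast, depends on the normality assumption.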
[R] which function/method to find agreement between two
Hi

I have data in the following format:

items   person1        person2
-----   ------------   ----------
car     honda,toyota   honda
bike    suzuki         suzuki
pant    Lee            Levis, Lee
shirt   Van_housen     Hollister
house   rented         rented
-----------------------------------

How can I summarize and visualize this type of data? Or, how can we statistically measure agreement or disagreement between the two persons?

Thanks
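Because a person can give several answers for one item (for example "honda,toyota"), a set-overlap measure is one natural choice. A minimal sketch, assuming answers within a cell are comma-separated: compute the Jaccard index per item (size of the intersection divided by size of the union of the two answer sets), average it, and draw a bar chart. The helper names are hypothetical. If each person gave exactly one answer from a common set of categories, Cohen's kappa would be the more usual agreement statistic.

```r
agree <- data.frame(
  items   = c("car", "bike", "pant", "shirt", "house"),
  person1 = c("honda,toyota", "suzuki", "Lee", "Van_housen", "rented"),
  person2 = c("honda", "suzuki", "Levis, Lee", "Hollister", "rented"),
  stringsAsFactors = FALSE
)

# Split a comma-separated cell into a set of trimmed answers
to_set <- function(s) unique(trimws(strsplit(s, ",")[[1]]))

# Jaccard index: shared answers / all distinct answers
jaccard <- function(a, b) {
  A <- to_set(a)
  B <- to_set(b)
  length(intersect(A, B)) / length(union(A, B))
}

agree$jaccard <- mapply(jaccard, agree$person1, agree$person2)
agree
mean(agree$jaccard)   # overall agreement across items

barplot(agree$jaccard, names.arg = agree$items, ylim = c(0, 1),
        ylab = "Jaccard agreement")
```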
Re: [R] ggplot- using geom_point and geom_line at the same time
Thanks Hadley for your input. The following code works fine now.

Thanks again

library(ggplot2)
library(reshape2)  # for melt()

con = textConnection("inputs var1 var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
data = read.table(con, header=TRUE)
data
data = melt(data, id="inputs")
g <- ggplot(data,aes(x=inputs, value, colour= variable, shape=variable))
g <- g + geom_line(lwd=0.8)
g <- g + geom_point()
g <- g + scale_colour_discrete('my Custom Legend')
g <- g + scale_shape_discrete("my Custom Legend")
g

On Tue, Jan 17, 2012 at 10:07 AM, Hadley Wickham wrote:
> On Mon, Jan 16, 2012 at 6:05 PM, Mary Kindall
> wrote:
> > Thanks for reply
> > I wanted to have legend name with spaces. Right now I am using the
> > following code but it produce two legends. I have to use Gimp to cut the
> > redundant legend.
>
> Your basic problem is that you're using the fill and colour
> aesthetics, but you only need colour.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/

--
-
Mary Kindall
Yorktown Heights, NY
USA
Re: [R] ggplot- using geom_point and geom_line at the same time
Thanks for the reply. I wanted to have a legend name with spaces. Right now I am using the following code, but it produces two legends, and I have to use Gimp to cut out the redundant one.

library(ggplot2)
library(reshape2)  # for melt()

con = textConnection("inputs var1 var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
data = read.table(con, header=TRUE)
data
data = melt(data, id="inputs")
g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable, shape=variable))
g <- g + geom_line(lwd=0.8)
g <- g + geom_point()
g <- g + scale_colour_discrete('my Custom Legend')
g <- g + scale_shape_discrete("my Custom Legend")
g

On Mon, Jan 16, 2012 at 6:55 PM, Felipe Carrillo wrote:
> Mary:
> Here's one way.
> ## change the variable name to whatever title you want on your legend
> data = melt(data, id="inputs",variable_name="customName")
> data
> g <- ggplot(data,aes(x=inputs, value, colour= customName, fill =
> customName,
> shape=customName))
> g <- g + geom_line(lwd=0.8)
> g <- g + geom_point()
> g <- g + scale_x_continuous(name='Number of inputs')
> g <- g + scale_y_continuous('Conversion time (sec.)')
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
> http://www.fws.gov/redbluff/rbdd_jsmp.aspx
>
> *From:* Mary Kindall
> *To:* r-help@r-project.org
> *Sent:* Monday, January 16, 2012 1:14 PM
> *Subject:* [R] ggplot- using geom_point and geom_line at the same time
>
> Hi
> I am plotting line chart using ggplot and want to use geom_line and
> geom_point simultaneously. I want to rename my legend but uptonow I remain
> unsuccessful.
> Someone please point what to add for renaming the legend.
> I attached my example below.
> Thanks
>
> con = textConnection("inputs var1 var2 var3
> 100 10 5 2
> 1000 20 10 4
> 5000 30 15 8
> 1 40 20 16
> 3 50 25 32")
> data = read.table(con, header=TRUE)
> data
> data = melt(data, id="inputs")
> g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable,
> shape=variable))
> g <- g + geom_line(lwd=0.8)
> g <- g + geom_point()
> g <- g + scale_x_continuous(name='Number of inputs')
> g <- g + scale_y_continuous('Conversion time (sec.)')
> g

--
-
Mary Kindall
Yorktown Heights, NY
USA
[R] ggplot- using geom_point and geom_line at the same time
Hi

I am plotting a line chart using ggplot and want to use geom_line and geom_point simultaneously. I want to rename my legend, but so far I have been unsuccessful. Could someone please point out what to add to rename the legend? I attached my example below.

Thanks

library(ggplot2)
library(reshape2)  # for melt()

con = textConnection("inputs var1 var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
data = read.table(con, header=TRUE)
data
data = melt(data, id="inputs")
g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable, shape=variable))
g <- g + geom_line(lwd=0.8)
g <- g + geom_point()
g <- g + scale_x_continuous(name='Number of inputs')
g <- g + scale_y_continuous('Conversion time (sec.)')
g

--
-
Mary Kindall
Yorktown Heights, NY
USA
[R] ggplot- using geom_point and geom_line at the same time
Hi

I am plotting a line chart using ggplot and want to use geom_line and geom_point simultaneously. I get the plot, but now I have two legends, and neither legend represents the true values. I need a single legend that shows both shape and colour.

Thanks

> con = textConnection("inputs var1 var2 var3
+ 100 10 5 2
+ 1000 20 10 4
+ 5000 30 15 8
+ 1 40 20 16
+ 3 50 25 32")
> data = read.table(con, header=TRUE)
> data
  inputs var1 var2 var3
1    100   10    5    2
2   1000   20   10    4
3   5000   30   15    8
4      1   40   20   16
5      3   50   25   32
> data = melt(data, id="inputs")
> data
   inputs variable value
1     100     var1    10
2    1000     var1    20
3    5000     var1    30
4       1     var1    40
5       3     var1    50
6     100     var2     5
7    1000     var2    10
8    5000     var2    15
9       1     var2    20
10      3     var2    25
11    100     var3     2
12   1000     var3     4
13   5000     var3     8
14      1     var3    16
15      3     var3    32
> g <- ggplot(data,aes(x=inputs, value, colour=variable, fill = variable))
> g <- g + geom_point(aes(shape=variable), size=3)
> g <- g + geom_line(lwd=1) + ylab("time") + xlab("inputs") + labs(colour="MyLegend", fill = "MyLegend")
> g
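Following Hadley's diagnosis in the thread (colour is needed, fill is not), a minimal sketch: map only `colour` and `shape`, and give both the same title with `labs()`. When two scales share a title and the same labels, ggplot2 merges them into a single legend. The long-format reshaping below uses base R in place of `melt()`, which is just a choice to avoid an extra package.

```r
library(ggplot2)

wide <- read.table(text = "inputs var1 var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32", header = TRUE)

# Long format: one row per (inputs, variable) pair, like melt(wide, id = "inputs")
long <- data.frame(
  inputs   = rep(wide$inputs, times = 3),
  variable = rep(c("var1", "var2", "var3"), each = nrow(wide)),
  value    = c(wide$var1, wide$var2, wide$var3)
)

# No fill mapping; colour and shape share one title, so their legends merge
g <- ggplot(long, aes(x = inputs, y = value, colour = variable, shape = variable)) +
  geom_line() +
  geom_point(size = 3) +
  labs(x = "inputs", y = "time",
       colour = "My Legend", shape = "My Legend")
g
```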
Re: [R] relative frequency plot using ggplot or other function
Hi, this is very close to what I am looking for, but I do not want to draw it as a histogram; instead I want two separate plots for this data, something like the ones shown at the following links. Please disregard the legends in these figures.

http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png
http://had.co.nz/ggplot2/graphics/90983232ced45a93d9fbbe40afffd69a.png

Thanks

On Thu, Jan 12, 2012 at 12:13 PM, Justin Haynes wrote:
> On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:
>
>> Hi
>> I have a data frame in the following form. There are two groups and for
>> each 'width' relative frequency for group1 and group2 is given. How to
>> plot
>> this in R using ggplot or other package.
>>
>>   Width relativeFrequency1 relativeFrequency2
>> 1   100       0.0006388783         0.02265428
>> 2   200       0.0022677303         0.02948625
>> 3   300       0.0061182673         0.01739936
>> 4   400       0.0152237225         0.02569902
>> 5   500       0.0300215262         0.03639880
>> 6   600       0.0597610250         0.07717765
>>
>> Thanks
>>
> not sure exactly what you're looking for but...
>
>> dat<-data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
>> dat.melt<-melt(dat,id.var='width')
>> ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+
>> geom_bar(stat='identity',position='dodge')

--
-
Mary Kindall
Yorktown Heights, NY
USA
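To get two separate panels instead of dodged bars, one option is faceting: reshape the data to one row per width and group, then let `facet_wrap()` draw one panel per group. A sketch using the table from the original post; the line-and-point geometry is an assumption about the desired look, and `geom_col()` would give bars instead.

```r
library(ggplot2)

freq <- read.table(text = "Width relativeFrequency1 relativeFrequency2
100 0.0006388783 0.02265428
200 0.0022677303 0.02948625
300 0.0061182673 0.01739936
400 0.0152237225 0.02569902
500 0.0300215262 0.03639880
600 0.0597610250 0.07717765", header = TRUE)

# One row per (Width, group) pair
long <- data.frame(
  Width     = rep(freq$Width, times = 2),
  group     = rep(c("group1", "group2"), each = nrow(freq)),
  frequency = c(freq$relativeFrequency1, freq$relativeFrequency2)
)

# facet_wrap() draws one panel per group; ncol = 1 stacks them vertically
p <- ggplot(long, aes(x = Width, y = frequency)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ group, ncol = 1) +
  labs(y = "relative frequency")
p
```

Adding `scales = "free_y"` to `facet_wrap()` gives each panel its own y range if the two groups differ a lot in scale.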
[R] relative frequency plot using ggplot or other function
Hi

I have a data frame in the following form. There are two groups, and for each 'width' the relative frequency for group1 and group2 is given. How can I plot this in R using ggplot or another package?

  Width relativeFrequency1 relativeFrequency2
1   100       0.0006388783         0.02265428
2   200       0.0022677303         0.02948625
3   300       0.0061182673         0.01739936
4   400       0.0152237225         0.02569902
5   500       0.0300215262         0.03639880
6   600       0.0597610250         0.07717765

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
Re: [R] mode of frequency distribution table
Thanks andrija

I was wondering whether there is any statistical test that can give me the most frequent contiguous interval. Your code can also return values that do not form a contiguous interval. For example, in the code below I don't want the entry with value 400; I am more interested in the bell-shaped region.

1> x = c(1,2, rep(4,3), rep(5,6), rep(6,7),rep(7,8), rep(9,7), rep(10,4), 13, 17,17,30,100,300, rep(400,10))
1> barplot(table(x))

I am looking for some test that can give me any of the 4-10, 5-9, 5-10, etc. intervals as output.

Thanks again.

On Sun, Jan 8, 2012 at 9:37 AM, andrija djurovic wrote:
> Hi. You can do something like this:
> #find the most frequent values of x
> > t <- table(x)
> > t[t==max(t)]
> 5
> 8
> #sort table t based on frequencies
> > t[order(as.numeric(t),decreasing = TRUE)]
> x
>   5   6   4  17   1   2  13  30 100 300
>   8   5   4   2   1   1   1   1   1   1
> #extract any range from sorted table
> > t[order(as.numeric(t),decreasing = TRUE)][1:3]
> x
> 5 6 4
> 8 5 4
>
> I hope this helps.
>
> Andrija
>
>
> On Sun, Jan 8, 2012 at 1:48 PM, Mary Kindall
> wrote:
> > In a frequency distribution table (bell shaped), how can we find the most
> > frequent range?
> > for example:
> >
> > x = c(1,2, 4,4,4,4, 5,5,5,6,6,5,5,5,5,5,6,6,6,13, 17,17,30,100,300)
> >
> > barplot(table(x))
> >
> >
> > In the code above, which function do we use to find that the most
> > frequent value range from 4 to 6.
> >
> > Thanks.

--
-
Mary Kindall
Yorktown Heights, NY
USA
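There is no single standard test for this, but one descriptive way to make "the most frequent contiguous range" precise is the shortest interval that contains a chosen share of the observations (for 50% this is sometimes called the shortest half, or shorth). A minimal sketch; the share `prop` is an assumption you have to choose, and as it gets smaller the interval shrinks towards the single most frequent value. The helper name `shortest_interval` is hypothetical.

```r
# Shortest interval [lower, upper] that covers at least `prop` of the data
shortest_interval <- function(x, prop = 0.5) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prop * n)              # number of points the interval must cover
  starts <- seq_len(n - k + 1)
  widths <- x[starts + k - 1] - x[starts]
  i <- which.min(widths)              # leftmost of the shortest windows
  c(lower = x[i], upper = x[i + k - 1])
}

x <- c(1, 2, rep(4, 3), rep(5, 6), rep(6, 7), rep(7, 8), rep(9, 7),
       rep(10, 4), 13, 17, 17, 30, 100, 300, rep(400, 10))
shortest_interval(x, 0.5)
```

For this x and prop = 0.5 the interval runs from 5 to 9; the ten 400s do not pull it away from the bell-shaped region because they cannot cover half the data on their own.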
[R] mode of frequency distribution table
In a frequency distribution table (bell-shaped), how can we find the most frequent range? For example:

x = c(1,2, 4,4,4,4, 5,5,5,6,6,5,5,5,5,5,6,6,6,13, 17,17,30,100,300)
barplot(table(x))

In the code above, which function can we use to find that the most frequent values range from 4 to 6?

Thanks.

--
-
Mary Kindall
Yorktown Heights, NY
USA
[R] (Edited) cbind alternate for data frames
>
> I have two data frames and want to cbind them and then write the result
> into a file. Each frame has more than a million entries, and R is taking a
> lot of time to perform this operation.
>
> Is there any alternative way to perform cbind?
>
> x = table1[1:100,1:4]
> y = table2[1:100,3:6]
>
> z = cbind(x,y)  # hangs the machine
>
> write.table(z,'out.txt')
>
> --
> -
> Mary Kindall
> Yorktown Heights, NY
> USA
[R] cbind alternate
I have two one-dimensional lists (vectors) of elements and want to cbind them and then write the result into a file. Each list has more than a million entries, and R is taking a lot of time to perform this operation. Is there any alternative way to perform cbind?

x = table1[1:100,1]
y = table2[1:100,5]

z = cbind(x,y)  # hangs the machine

write.table(z,'out.txt')

--
-
Mary Kindall
Yorktown Heights, NY
USA
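For two plain vectors, one option is to skip building a combined object and write each output line directly. A minimal sketch, assuming `x` and `y` are atomic vectors of equal length (the simulated values stand in for `table1[, 1]` and `table2[, 5]`). For data frames, `write.table(data.frame(x, y), file, sep = "\t", quote = FALSE, row.names = FALSE)` is the closer equivalent. Whether either variant is faster depends on where the time is actually going, so timing each step with `system.time()` is worth doing first.

```r
# Stand-ins for table1[, 1] and table2[, 5]
n <- 1e6
x <- seq_len(n)
y <- round(runif(n), 4)

out <- tempfile(fileext = ".txt")

# Paste the two columns into tab-separated lines and write them in one call
system.time(writeLines(paste(x, y, sep = "\t"), out))

readLines(out, n = 3)
```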
Re: [R] aggregate function
Hi Jim

Thanks for the reply, but this is not working. I think I am missing something here.

1> x <- cbind(c(1,2,2,2,3,4), c('a','b', 'c','d','e','f'))
1> colnames(x) = c('param', 'case1')
1> x = as.data.frame(x)
1> x
  param case1
1     1     a
2     2     b
3     2     c
4     2     d
5     3     e
6     4     f
1> x[
1+ , list(case1 = paste(x$case1, collapse = ','))
1+ , by = x$param
1+ ]
Error in `[.data.frame`(x, , list(case1 = paste(x$case1, collapse = ",")),  :
  unused argument(s) (by = x$param)

Hi David

Thanks a lot for your help.

1> aggregate(x$case1, x['param'], FUN = paste, collapse=",")
  param     x
1     1     a
2     2 b,c,d
3     3     e
4     4     f

Thanks again
M

On Wed, Dec 21, 2011 at 11:56 AM, David Winsemius wrote:
>
> On Dec 21, 2011, at 11:31 AM, jim holtman wrote:
>
>> Here is an example using 'data.table'"
>>
>>> x <- read.table(text = "param case1
>> + 1 a
>> + 2 b
>> + 2 c
>> + 2 d
>> + 3 e
>> + 4 f", header = TRUE, as.is = TRUE)
>
> And the aggregate version:
>
>> aggregate(x$case1, x["param"], FUN=paste, collapse=",")
>   param     x
> 1     1     a
> 2     2 b,c,d
> 3     3     e
> 4     4     f
>
> ( Generally one uses the "[[" function for extraction, but using "["
> returns a list which is what aggregate is designed to process as its second
> argument, whereas you would get an error with either of these:
>
> aggregate(x$case1, x$param, FUN=paste, collapse=",")
> aggregate(x$case1, x[["param"]], FUN=paste, collapse=",")
>
> )
>
>>> require(data.table)
>>> x <- data.table(x)
>>> x[
>> + , list( case1 = paste(case1, collapse = ','))
>> + , by = param
>> + ]
>>      param case1
>> [1,]     1     a
>> [2,]     2 b,c,d
>> [3,]     3     e
>> [4,]     4     f
>>
>> On Wed, Dec 21, 2011 at 11:26 AM, Mary Kindall
>> wrote:
>>
>>> Hi
>>> I have a data frame with values in following format.
>>>
>>> param case1
>>> 1 a
>>> 2 b
>>> 2 c
>>> 2 d
>>> 3 e
>>> 4 f
>>>
>>> how to use aggregate so that it I only one row for each 'param' value.
>>>
>>> the output for the above input should be
>>>
>>> param case1
>>> 1 a
>>> 2 b,c,d
>>> 3 e
>>> 4 f
>>>
>>> Thanks
>>> M
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>
> David Winsemius, MD
> West Hartford, CT

--
-
Mary Kindall
Yorktown Heights, NY
USA
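Jim's data.table call fails in the transcript because `x` is still a plain data.frame there; his version first converts it with `x <- data.table(x)`. The formula interface of `aggregate()` is another base-R route to the same result, sketched here on the same toy data. Extra arguments such as `collapse` are passed on to `FUN`.

```r
x <- data.frame(param = c(1, 2, 2, 2, 3, 4),
                case1 = c("a", "b", "c", "d", "e", "f"),
                stringsAsFactors = FALSE)

# One row per param, with the case1 values pasted together
res <- aggregate(case1 ~ param, data = x, FUN = paste, collapse = ",")
res
```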
[R] aggregate function
Hi

I have a data frame with values in the following format.

param case1
1     a
2     b
2     c
2     d
3     e
4     f

How can I use aggregate so that I get only one row for each 'param' value? The output for the above input should be

param case1
1     a
2     b,c,d
3     e
4     f

Thanks
M

--
-
Mary Kindall
Yorktown Heights, NY
USA
[R] read.table : fill missing entry with "unAvailable" [edit]
I am very sorry for the multiple mails.

Hi R users,

I am trying to read a data file (tab-delimited format) in which some of the entries in a particular field are missing. Is it possible to fill the unavailable data with the string 'UnAvailable' while performing read.table()? Something like

df = read.table(DataFile, header=FALSE, fill_missing_entry = 'unAvailable')  # wished-for argument; read.table() has no such option

Or remove the complete row in which the missing entry appears.

thanks

--
-----
Mary Kindall
Yorktown Heights, NY
USA
[R] read.table : fill missing entry with "unAvailable" [edit]
Hi R users,

I am trying to read a data file (tab-delimited format) in which some of the entries in a particular field are missing. Is it possible to fill the unavailable data with the string 'UnAvailable' while performing read.table()? Something like

df = read.table(DataFile, header=FALSE, fill_missing_entry = 'unAvailable')  # wished-for argument; read.table() has no such option

thanks

--
-----
Mary Kindall
Yorktown Heights, NY
USA
[R] read.table : fill missing entry with "unAvailable"
Hi R users,

I am trying to read a data file (tab-delimited format) in which some of the entries in a particular field are missing. Is it possible to fill the unavailable data with the string 'UnAvailable' while performing read.table()? Something like

df = read.table(DataFile, header=FALSE, fill_missing_entry = 'unAvailable')  # wished-for argument; read.table() has no such option

thanks

--
-----
Mary Kindall
Yorktown Heights, NY
USA
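`read.table()` has no argument that substitutes a string for missing fields, but the same effect takes two steps: have blank fields read as `NA` via `na.strings`, then either replace those `NA`s or drop incomplete rows. A sketch on inline data, assuming blanks should count as missing. Only character columns are filled, so numeric columns keep `NA` instead of being coerced to text.

```r
txt <- "id\tname\tcity
1\tann\tNYC
2\t\tBoston
3\tbob\tLA
4\t\tSF"

# Blank fields in character columns become NA only if "" is listed in na.strings
raw <- read.table(text = txt, sep = "\t", header = TRUE,
                  na.strings = c("", "NA"), stringsAsFactors = FALSE)

# Option 1: fill missing entries of character columns with "unAvailable"
filled <- raw
is_chr <- vapply(filled, is.character, logical(1))
filled[is_chr] <- lapply(filled[is_chr], function(col) {
  col[is.na(col)] <- "unAvailable"
  col
})

# Option 2: drop every row that has a missing entry
dropped <- raw[complete.cases(raw), ]

filled
dropped
```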
[R] list.dir() function
Hi

I have an organism directory that contains two folders, galGal3 and hg19, and many other files.

orgDir = '/home/mary/org'

When I try to use the list.dirs() function, it gives me the same answer no matter what the value of the full.names argument is.

> list.dirs(path = indexDir, full.names = FALSE)
[1] "/home/mary/org"
[2] "/home/mary/org/galGal3"
[3] "/home/mary/org/hg19"
> list.dirs(path = indexDir, full.names = TRUE)
[1] "/home/mary/org"
[2] "/home/mary/org/galGal3"
[3] "/home/mary/org/hg19"

Also, it includes the directory itself, which I don't want. Why is that so? Is there any workaround for this problem?

Thanks

--
-
Mary Kindall
Yorktown Heights, NY
USA
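Two ways to get only the immediate subfolders by name, sketched against a throwaway directory under `tempdir()` that mirrors the post's layout. In current R, `list.dirs()` has a `recursive` argument, and the starting directory itself is only included when `recursive = TRUE`. I have not checked how older versions behave; if `full.names = FALSE` is ignored in the R version at hand, the second variant does not rely on it at all.

```r
# Throwaway layout: org/galGal3, org/hg19 and a plain file
org_dir <- file.path(tempdir(), "org")
dir.create(file.path(org_dir, "galGal3"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(org_dir, "hg19"), showWarnings = FALSE)
invisible(file.create(file.path(org_dir, "readme.txt")))

# Option 1: immediate subdirectories only, without the path prefix
subdirs <- list.dirs(org_dir, full.names = FALSE, recursive = FALSE)

# Option 2: list all entries, keep the directories, strip the path with basename()
entries  <- list.files(org_dir, full.names = TRUE)
subdirs2 <- basename(entries[file.info(entries)$isdir])

subdirs
subdirs2
```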
[R] download.file
I am downloading, say, 100 files from the UCSC website and storing them in a destination folder. The download.file() function creates a file in the destination folder even if the remote file is not present, which I don't want. So I wrote an if condition to remove the file when download.file() returns a non-zero value. Now the program exits when there is an error or the file is not present. How can I use try() and an if condition together so that the program does not exit on error, and deletes the file created in the destination folder?

for (i in 1:100)
{
  fileUrl = ucscfilenames[i]
  if (download.file(fileUrl, destFile, 'wget', quiet = TRUE) != 0)
  {
    file.remove(destFile)
  }
}

Thanks
--
Mary Kindall
Yorktown Heights, NY
USA
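One way to combine tryCatch() with the cleanup step, sketched as a small helper. `safe_download()` and its `downloader` argument are hypothetical names; `downloader` defaults to `utils::download.file` and exists only so the demo can simulate a failed download without touching the network. Extra arguments such as `method = "wget"` are passed through unchanged.

```r
# Hypothetical helper: returns TRUE on success, FALSE otherwise, and never aborts the caller.
safe_download <- function(url, destfile, downloader = utils::download.file, ...) {
  status <- tryCatch(
    downloader(url, destfile, quiet = TRUE, ...),
    error = function(e) {
      message("Download failed for ", url, ": ", conditionMessage(e))
      -1L  # treat an error like a non-zero exit status
    }
  )
  ok <- identical(as.integer(status), 0L)
  # Remove whatever empty or partial file the failed attempt left behind.
  if (!ok && file.exists(destfile)) file.remove(destfile)
  ok
}

# Demo with stand-in downloaders (assumptions, for illustration only).
fake_missing <- function(url, destfile, quiet = TRUE, ...) {
  file.create(destfile)                 # mimic an empty file being left behind
  stop("cannot open URL '", url, "'")
}
fake_present <- function(url, destfile, quiet = TRUE, ...) {
  writeLines("payload", destfile)
  invisible(0L)
}

bad  <- tempfile(fileext = ".txt")
good <- tempfile(fileext = ".txt")
res_bad  <- safe_download("http://example.org/missing.txt", bad,  downloader = fake_missing)
res_good <- safe_download("http://example.org/present.txt", good, downloader = fake_present)

# In the real loop (destDir is a hypothetical folder name):
# for (u in ucscfilenames) safe_download(u, file.path(destDir, basename(u)), method = "wget")
```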
Re: [R] download files using ftp: avoid error
wget worked for me. Thanks William and Rainer.
-M

On Fri, Sep 16, 2011 at 4:18 PM, William Dunlap wrote:
> Wrap the call that may abort with try() or tryCatch().
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rainer Schuermann
> > Sent: Friday, September 16, 2011 1:09 PM
> > To: r-help@r-project.org
> > Subject: Re: [R] download files using ftp: avoid error
> >
> > I haven't tested it thoroughly but what worked here is replacing
> > > download.file(url, destfile, quiet = FALSE)
> > with
> >
> > sys_call <- paste( "wget", url, ">", destfile, sep=" " )
> > system( sys_call )
> >
> > Program execution continues, whether or not the download from url was
> > successful. However, wget is, I believe, not available on Windows.
> >
> > Rgds,
> > Rainer
> >
> > On Friday 16 September 2011 15:07:15 Mary Kindall wrote:
> > > I am planning to download a large number of files from some website. I am
> > > using the following script.
> > >
> > > files2down = c('aaa', 'bbb', )
> > > for (i in 1: len)
> > > {
> > >   print(paste('downloading file', i, ' of total ', len));
> > >   url = paste(urlPrefix, files2down[i], sep='')
> > >   destfile = paste (dest, 'inDir', files2down[i], sep='/' )
> > >   download.file(url, destfile, quiet = FALSE)
> > > }
> > >
> > > It works fine as long as the file is present. When the file is not present,
> > > it exit from loop. Is there a way to continue looping if error occurs.
> > > Thanks

--
Mary Kindall
Yorktown Heights, NY
USA
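Bill Dunlap's try() suggestion, sketched as a loop that records failures and moves on. `fetch()` is a hypothetical stand-in for download.file() that fails for one name the way a missing remote file would; in a real run you would call `download.file(url, destfile, quiet = TRUE)` inside try() instead.

```r
files2down <- c("aaa", "bbb", "ccc")  # hypothetical file names

# Hypothetical stand-in for download.file(): errors for "bbb".
fetch <- function(name, destfile) {
  if (name == "bbb") stop("cannot open URL for '", name, "'")
  writeLines(name, destfile)
  invisible(0L)
}

destDir <- tempdir()
failed <- character(0)
for (i in seq_along(files2down)) {
  destfile <- file.path(destDir, files2down[i])
  res <- try(fetch(files2down[i], destfile), silent = TRUE)
  if (inherits(res, "try-error")) {
    failed <- c(failed, files2down[i])  # remember the failure and continue
    next
  }
  message("downloaded file ", i, " of ", length(files2down))
}
print(failed)
```

On the wget route: `wget url > destfile` redirects wget's console output rather than the downloaded data, so if you go that way, `wget -O destfile url` is the usual way to choose the output file name.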
[R] download files using ftp: avoid error
I am planning to download a large number of files from a website, using the following script:

files2down = c('aaa', 'bbb', )
for (i in 1: len)
{
  print(paste('downloading file', i, ' of total ', len));
  url = paste(urlPrefix, files2down[i], sep='')
  destfile = paste (dest, 'inDir', files2down[i], sep='/' )
  download.file(url, destfile, quiet = FALSE)
}

It works fine as long as the file is present. When the file is not present, it exits the loop. Is there a way to continue looping if an error occurs?

Thanks
--
Mary Kindall
Yorktown Heights, NY
USA
[R] installing packages systemwide
I installed some downloaded packages in R. I always do

$ sudo R CMD INSTALL

By default it stores these packages in my directory /home/mary/R/x86_64-pc-linux-gnu-library/2.13/. However, I want them to be installed system-wide, in the /usr/local/lib/R/site-library/ folder. I tried

$ sudo R
R> install.packages("anRpackage", dep=TRUE)

but I did not succeed in getting them installed in the required folder. Any idea?
--
Mary Kindall
Yorktown Heights, NY
USA
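Both install.packages() and `R CMD INSTALL` install into the first entry of `.libPaths()` unless told otherwise. If the personal library comes first there (an assumption about this setup, for example when the user library is still picked up under sudo), packages land in `/home/mary/R/...`. Naming the library explicitly avoids that: `R CMD INSTALL -l /usr/local/lib/R/site-library pkg.tar.gz` from the shell (the tarball name is hypothetical), or the `lib` argument of install.packages(). The wrapper below is a hypothetical helper that checks the target directory before installing.

```r
# Hypothetical helper: install into an explicit (site-wide) library tree.
install_site_wide <- function(pkgs, lib = "/usr/local/lib/R/site-library", ...) {
  if (!isTRUE(file.info(lib)$isdir)) stop("library directory does not exist: ", lib)
  if (file.access(lib, mode = 2) != 0) stop("no write permission for ", lib, "; run R with sudo")
  install.packages(pkgs, lib = lib, dependencies = TRUE, ...)
}

# The first entry here is where installs go by default.
print(.libPaths())

# Usage (needs network access and write permission, so not run here):
# install_site_wide("anRpackage")
```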
Re: [R] save and load in R
Thanks Jeff and Duncan.

assign() worked for me. I did not check the other methods you suggested. I am reproducing my code below:

##
filenames = list.files(path = ".", pattern = '\\.txt$', all.files = FALSE, full.names = TRUE, ignore.case = FALSE)  # all input files
numFiles = length(filenames)
outfilenames <- paste("./file", 1:numFiles, '.Rdata', sep="")  # output files

for ( i in 1:numFiles)
{
  dataFile = read.table(filenames[i], header=TRUE, sep='\t');
  save(dataFile, file = outfilenames[i])  # saving into the output files
}

newnames <- paste("file", 1:numFiles, sep="")  # output variables to load into
for ( i in 1:numFiles)
{
  load(file = outfilenames[i]);
  assign(newnames[i], dataFile)  # assign into the corresponding output variables
}
#

Regards
- M

On Sun, Jun 19, 2011 at 10:38 AM, Duncan Murdoch wrote:
> On 11-06-19 10:26 AM, Mary Kindall wrote:
>
>> I have a list of txt files that I want to convert into .rdata R data
>> object.
>>
>> filenames
>> 1. "./file1.txt"
>> 2. "./file2.txt"
>> 3. "./file3.txt"
>> 4. "./file4.txt"
>> 5. "./file5.txt"
>> 6. "./file6.txt"
>> 7. "./file7.txt"
>> 8. "./file8.txt"
>> 9. "./file9.txt"
>> 10. "./file10.txt"
>>
>> I saved these files as
>>
>> for ( i in 1:10)
>> {
>> dataFile = read.table(filenames[i], header=TRUE, sep='\t');
>> save (dataFile, file = outfilenames[i])
>> }
>>
>> The inpt files are saves as:
>> outfilenames
>> 1. "./file1.Rdata"
>> 2. "./file2.Rdata"
>> 3. "./file3.Rdata"
>> 4. "./file4.Rdata"
>> 5. "./file5.Rdata"
>> 6. "./file6.Rdata"
>> 7. "./file7.Rdata"
>> 8. "./file8.Rdata"
>> 9. "./file9.Rdata"
>> 10. "./file10.Rdata"
>>
>> Now I want to load these out files in such a way that the data is loaded
>> into a variable that is same as the file name without extension.
>>
>> file1 = load (file = './file1.Rdata')
>> file2 = load (file = './file2.Rdata')
>> file3 = load (file = './file3.Rdata')
>> file4 = load (file = './file4.Rdata')
>>
>> How can I do that.
>>
>
> When you load() a file, the variables in it are restored with the same
> names that were saved. So you would need something like
>
> newnames <- paste("file", 1:10, sep="") # file1, file2, etc.
>
> for (i in 1:10) {
>   load(file=outfilenames[i]) # assuming that's still around...
>   assign(newnames[i], dataFile)
> }
>
> It would be a little simpler to use saveRDS() and readRDS() to save and
> load your files. They don't save the object names.
>
> A more R-like version of this would be to create a list of datasets, e.g.
>
> files <- list()
>
> for (i in 1:10) {
>   load(file=outfilenames[i])
>   files[[i]] <- dataFile
> }
>
> Then you don't end up creating 10 objects, but you can still access them
> separately.
>
> Duncan Murdoch

--
Mary Kindall
Yorktown Heights, NY
USA
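A sketch combining Duncan's two suggestions, saveRDS()/readRDS() plus a named list, run on three small made-up tab-delimited files in a temporary directory. readRDS() returns the object itself, so no assign() is needed, and the list names are derived from the file names. The file pattern escapes the dot (`"\\.txt$"`), since an unescaped `.` in a regular expression matches any character.

```r
# Hypothetical input files file1.txt .. file3.txt.
inDir <- file.path(tempdir(), "rds_demo")
dir.create(inDir, showWarnings = FALSE)
for (k in 1:3) {
  write.table(data.frame(x = 1:2, y = k),
              file = file.path(inDir, paste0("file", k, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

filenames <- list.files(inDir, pattern = "\\.txt$", full.names = TRUE)
rdsnames  <- sub("\\.txt$", ".rds", filenames)

# Save each table as a single unnamed object.
for (i in seq_along(filenames)) {
  dataFile <- read.table(filenames[i], header = TRUE, sep = "\t")
  saveRDS(dataFile, file = rdsnames[i])
}

# Read them back into a named list: files$file1, files$file2, ...
files <- lapply(rdsnames, readRDS)
names(files) <- sub("\\.rds$", "", basename(rdsnames))
str(files$file2)
```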
[R] save and load in R
I have a list of txt files that I want to convert into .Rdata R data objects.

filenames
1. "./file1.txt"
2. "./file2.txt"
3. "./file3.txt"
4. "./file4.txt"
5. "./file5.txt"
6. "./file6.txt"
7. "./file7.txt"
8. "./file8.txt"
9. "./file9.txt"
10. "./file10.txt"

I saved these files as

for ( i in 1:10)
{
  dataFile = read.table(filenames[i], header=TRUE, sep='\t');
  save (dataFile, file = outfilenames[i])
}

The input files are saved as:

outfilenames
1. "./file1.Rdata"
2. "./file2.Rdata"
3. "./file3.Rdata"
4. "./file4.Rdata"
5. "./file5.Rdata"
6. "./file6.Rdata"
7. "./file7.Rdata"
8. "./file8.Rdata"
9. "./file9.Rdata"
10. "./file10.Rdata"

Now I want to load these output files in such a way that the data is loaded into a variable with the same name as the file, without the extension:

file1 = load (file = './file1.Rdata')
file2 = load (file = './file2.Rdata')
file3 = load (file = './file3.Rdata')
file4 = load (file = './file4.Rdata')

How can I do that?

Regards
- M
--
Mary Kindall
Yorktown Heights, NY
USA
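load() restores objects under the names they were saved with and returns those names as a character vector, so `file1 = load(...)` gives `file1` the string "dataFile", not the data. A minimal sketch with a hypothetical helper, `load_object()`, that loads into a private environment and returns the object itself:

```r
# Hypothetical helper: return the single object stored in an .Rdata file.
load_object <- function(path) {
  env <- new.env()
  nms <- load(path, envir = env)  # load() returns the names of the restored objects
  if (length(nms) != 1) stop("expected exactly one object in ", path)
  get(nms, envir = env)
}

# Demo with a throw-away .Rdata file.
dataFile <- data.frame(a = 1:3)
f <- tempfile(fileext = ".Rdata")
save(dataFile, file = f)
rm(dataFile)

what  <- load(f)          # restores `dataFile` here and returns its name
file1 <- load_object(f)   # the data frame itself
print(what)
```

With the files from the post, `file1 <- load_object('./file1.Rdata')` and so on would then work as intended.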
[R] [Resolved] combine the data frames into comma separated list.
Hi,

Thanks Gabor for your suggestion. I am posting the code that worked for me.

dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 = c('aaa','bbb','ccc','aaa','ddd')));  # must be a data frame
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 = c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 = c('xx','yy','zz','tt','uu')));
dataframe4 = data.frame(cbind(Src = c(3,5,'y','z','z'), Target4 = c('xx','yy','zz','tt','uu')));
L <- list(dataframe1, dataframe2, dataframe3, dataframe4)
merge.all <- function(...) merge(..., all = TRUE)
Reduce(merge.all, lapply(L, function(x) aggregate(x[2], x[1], toString)))

Cheers!
- M

On Tue, Jun 14, 2011 at 11:27 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Tue, Jun 14, 2011 at 11:21 AM, Mary Kindall wrote:
> > I resolved it. There was a problem in type casting at some point in my
> > program.
> > Thanks again.
> > -
>
> Please post a corrected version of L for benefit of others who were
> following this.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

--
Mary Kindall
Yorktown Heights, NY
USA
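The post does not say where the type-casting problem was. One likely culprit (an assumption) is `data.frame(cbind(...))`: cbind() on a numeric and a character vector builds a character matrix, so `Src` stops being numeric. Building the data frame directly keeps the column types. Before R 4.0.0 the character columns would also become factors unless `stringsAsFactors = FALSE` is given.

```r
# cbind() first: every column is coerced to character (or factor in R < 4.0.0).
viaCbind <- data.frame(cbind(Src = c(1, 1, 2), Target1 = c("aaa", "bbb", "aaa")))

# Direct construction: Src stays numeric, Target1 stays character.
direct <- data.frame(Src = c(1, 1, 2), Target1 = c("aaa", "bbb", "aaa"),
                     stringsAsFactors = FALSE)

str(viaCbind)
str(direct)
```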
Re: [R] combine the data frames into comma separated list.
Superb, Gabor! Though I don't know what is happening, it is working, and without fail.

Thanks
- M

On Mon, Jun 13, 2011 at 8:20 PM, Gabor Grothendieck wrote:
> On Mon, Jun 13, 2011 at 5:17 PM, Mary Kindall wrote:
> > Hi R users,
> > I am new to R and am trying to merge data frames in the following way.
> > Suppose I have n data frames each with two fields. Field 1 is common among
> > data frames but may have different entries. Field 2 is different.
> >
> > Data frame 1:
> >
> > Src Target1
> > 1   aaa
> > 1   bbb
> > 1   ccc
> > 2   aaa
> > 3   ddd
> >
> > Data frame 2:
> >
> > Src Target2
> > 2
> > 3
> > 4
> > 4
> > 4
> >
> > Data frame 3:
> >
> > Src Target3
> > 1   xx
> > 3   yy
> > 5   zz
> > 6   tt
> > 6   uu
> >
> > And so on...
> >
> > I want to convert this into a data frame something similar to:
> >
> > Src Target1      target2  target3
> > 1   aaa,bbb,ccc  -        xx
> > 2   aaa                   -
> > 3   ddd                   yy
> > 4   -            ,,       -
> > 5   -            -        zz
> > 6   -            -        tt,uu
>
> Try this where DF1, DF2 and DF3 are the data frames:
>
> L <- list(DF1, DF2, DF3)
> merge.all <- function(...) merge(..., all = TRUE)
> Reduce(merge.all, lapply(L, function(x) aggregate(x[2], x[1], toString)))
>
> The last line gives this:
>
>   Src       Target1 Target2 Target3
> 1   1 aaa, bbb, ccc              xx
> 2   2           aaa
> 3   3           ddd              yy
> 4   4                  , ,
> 5   5                            zz
> 6   6                        tt, uu
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

--
Mary Kindall
Yorktown Heights, NY
USA
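Since the one-liner is dense, here is the same approach unrolled with comments, on the three example data frames from the question. Building them with data.frame() and `stringsAsFactors = FALSE` is my choice rather than part of the thread, and the final step that shows "-" instead of NA is an optional extra to mimic the requested layout.

```r
DF1 <- data.frame(Src = c(1, 1, 1, 2, 3),
                  Target1 = c("aaa", "bbb", "ccc", "aaa", "ddd"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Src = c(2, 3, 4, 4, 4),
                  Target2 = c("", "", "", "", ""),
                  stringsAsFactors = FALSE)
DF3 <- data.frame(Src = c(1, 3, 5, 6, 6),
                  Target3 = c("xx", "yy", "zz", "tt", "uu"),
                  stringsAsFactors = FALSE)
L <- list(DF1, DF2, DF3)

# Step 1: within each data frame, collapse the Target values of each Src into one
# string. aggregate(x[2], x[1], toString) groups column 2 by column 1 and applies
# toString(), which pastes the group's values together with ", ".
collapsed <- lapply(L, function(x) aggregate(x[2], x[1], toString))

# Step 2: a full outer join on the shared column (Src). all = TRUE keeps Src values
# that appear in only one table and fills the missing side with NA.
merge.all <- function(...) merge(..., all = TRUE)

# Step 3: Reduce() folds merge.all over the list, i.e. merge.all(merge.all(c1, c2), c3),
# so the same line works for any number of data frames.
res <- Reduce(merge.all, collapsed)
print(res)

# Optional: show "-" instead of NA.
res_dash <- res
res_dash[is.na(res_dash)] <- "-"
print(res_dash)
```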
Re: [R] combine the data frames into comma separated list.
How? I don't think there is any parameter that does this job. I came up with the ddply function in the plyr package, but with tens of data frames, doing it in a for loop may not be a good idea.

ddply(test, ~ Src , colwise(paste, .(Target1)), collapse ="," );

Can you please write how it can be done with write.csv? Or is there an efficient method that can do this for me?

dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 = c('aaa','bbb','ccc','aaa','ddd')));
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 = c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 = c('xx','yy','zz','tt','uu')));
test = merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE, incomparables=''), by = 'Src', all=TRUE, incomparables='')
ddply(test, ~ Src , colwise(paste, .(Target1)), collapse ="," );

Thanks

On Mon, Jun 13, 2011 at 7:14 PM, Dr. D. P. Kreil (Boku) <david.kr...@boku.ac.at> wrote:
> ?write.csv
>
> Cheers,
> David.
>
> On 14 June 2011 01:07, Mary Kindall wrote:
> > Thanks for reply.
> > The following code is working but only patially. How to get the condensed
> > values separated by comma.
> >
> > dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 = c('aaa','bbb','ccc','aaa','ddd')));
> > dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 = c('','','','','')));
> > dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 = c('xx','yy','zz','tt','uu')));
> > merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)
> >
> > 1> merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)
> >    Src Target3 Target1 Target2
> > 1    1      xx     aaa
> > 2    1      xx     bbb
> > 3    1      xx     ccc
> > 4    3      yy     ddd
> > 5    5      zz
> > 6    6      tt
> > 7    6      uu
> > 8    2             aaa
> > 9    4
> > 10   4
> > 11   4
> >
> > Thanks
> >
> > --
> > M
> >
> > On Mon, Jun 13, 2011 at 6:35 PM, Dr. D. P. Kreil (Boku) wrote:
> >>
> >> Hi, try
> >>
> >> ?merge
> >>
> >> Best,
> >> David.
> >>
> >> On 13 June 2011 23:48, Mary Kindall wrote:
> >> > Hi R users,
> >> > I am new to R and am trying to merge data frames in the following way.
> >> > Suppose I have n data frames each with two fields. Field 1 is common
> >> > among data frames but may have different entries. Field 2 is different.
> >> >
> >> > Data frame 1:
> >> >
> >> > Src Target1
> >> > 1   aaa
> >> > 1   bbb
> >> > 1   ccc
> >> > 2   aaa
> >> > 3   ddd
> >> >
> >> > Data frame 2:
> >> >
> >> > Src Target2
> >> > 2
> >> > 3
> >> > 4
> >> > 4
> >> > 4
> >> >
> >> > Data frame 3:
> >> >
> >> > Src Target3
> >> > 1   xx
> >> > 3   yy
> >> > 5   zz
> >> > 6   tt
> >> > 6   uu
> >> >
> >> > And so on...
> >> >
> >> > I want to convert this into a data frame something similar to:
> >> >
> >> > Src Target1      target2  target3
> >> > 1   aaa,bbb,ccc  -        xx
> >> > 2   aaa                   -
> >> > 3   ddd                   yy
> >> > 4   -            ,,       -
> >> > 5   -            -        zz
> >> > 6   -            -        tt,uu
> >> >
> >> > Basically I am trying to make a consolidated table.
> >> >
> >> > Help appreciated.
> >> > Thanks
> >> > M
> >> >
> >> > -
> >> > Mary Kindall
> >> > Yorktown Heights
> >> > USA
> >
> > --
> > -
> > Mary Kindall
> > Yorktown Heights, NY
> > USA

--
Mary Kindall
Yorktown Heights, NY
USA
Re: [R] combine the data frames into comma separated list.
Thanks for the reply. The following code works, but only partially. How do I get the condensed values separated by commas?

dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 = c('aaa','bbb','ccc','aaa','ddd')));
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 = c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 = c('xx','yy','zz','tt','uu')));
merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)

1> merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)
   Src Target3 Target1 Target2
1    1      xx     aaa
2    1      xx     bbb
3    1      xx     ccc
4    3      yy     ddd
5    5      zz
6    6      tt
7    6      uu
8    2             aaa
9    4
10   4
11   4

Thanks
--
M

On Mon, Jun 13, 2011 at 6:35 PM, Dr. D. P. Kreil (Boku) <david.kr...@boku.ac.at> wrote:
> Hi, try
>
> ?merge
>
> Best,
> David.
>
> On 13 June 2011 23:48, Mary Kindall wrote:
> > Hi R users,
> > I am new to R and am trying to merge data frames in the following way.
> > Suppose I have n data frames each with two fields. Field 1 is common among
> > data frames but may have different entries. Field 2 is different.
> >
> > Data frame 1:
> >
> > Src Target1
> > 1   aaa
> > 1   bbb
> > 1   ccc
> > 2   aaa
> > 3   ddd
> >
> > Data frame 2:
> >
> > Src Target2
> > 2
> > 3
> > 4
> > 4
> > 4
> >
> > Data frame 3:
> >
> > Src Target3
> > 1   xx
> > 3   yy
> > 5   zz
> > 6   tt
> > 6   uu
> >
> > And so on...
> >
> > I want to convert this into a data frame something similar to:
> >
> > Src Target1      target2  target3
> > 1   aaa,bbb,ccc  -        xx
> > 2   aaa                   -
> > 3   ddd                   yy
> > 4   -            ,,       -
> > 5   -            -        zz
> > 6   -            -        tt,uu
> >
> > Basically I am trying to make a consolidated table.
> >
> > Help appreciated.
> > Thanks
> > M
> >
> > -
> > Mary Kindall
> > Yorktown Heights
> > USA

--
Mary Kindall
Yorktown Heights, NY
USA
[R] combine the data frames into comma separated list.
Hi R users,

I am new to R and am trying to merge data frames in the following way. Suppose I have n data frames, each with two fields. Field 1 is common among the data frames but may have different entries. Field 2 is different.

Data frame 1:

Src Target1
1   aaa
1   bbb
1   ccc
2   aaa
3   ddd

Data frame 2:

Src Target2
2
3
4
4
4

Data frame 3:

Src Target3
1   xx
3   yy
5   zz
6   tt
6   uu

And so on...

I want to convert these into a data frame similar to:

Src Target1      Target2  Target3
1   aaa,bbb,ccc  -        xx
2   aaa                   -
3   ddd                   yy
4   -            ,,       -
5   -            -        zz
6   -            -        tt,uu

Basically, I am trying to make a consolidated table.

Help appreciated.
Thanks
M
--
Mary Kindall
Yorktown Heights
USA