Re: [R] Why 'gbm' is not giving me error when I change the response from numeric to categorical?

2013-10-05 Thread Mary Kindall
Thanks Peter and Marc.
I am sorry, I was wrong in dichotomizing the response. Thanks for pointing
out my mistake.

However, even a correct dichotomization does not fix the problem.

Also, the link that you provided is very useful, and I am now thinking of not
dichotomizing my values.

Thanks again
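The pitfall in the quoted exchange below is that mean(Y) is recomputed after the first assignment has already changed Y. A minimal sketch on a made-up five-element vector (the values are illustrative, not taken from the thread) contrasts the sequential assignment with computing the threshold once:

```r
Y <- c(-2, -1, 0.5, 1, 4)   # toy response; mean(Y) is 0.5

# Sequential version: the second line sees a different mean
Y_seq <- Y
Y_seq[Y_seq < mean(Y_seq)] <- 0    # threshold is 0.5 here
Y_seq[Y_seq >= mean(Y_seq)] <- 1   # threshold has moved to 1.1
Y_seq                              # 0.5 survives, so this is not 0/1

# Fixed version: compute the threshold once, then compare against it
thr <- mean(Y)
Y_bin <- as.integer(Y >= thr)
Y_bin                              # 0 0 1 1 1
```

This is the same idea as Peter's one-liner; storing the threshold in a variable simply makes it explicit that it is evaluated only once.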




On Fri, Oct 4, 2013 at 3:50 PM, Marc Schwartz  wrote:

>
> On Oct 4, 2013, at 2:35 PM, peter dalgaard  wrote:
>
> >
> > On Oct 4, 2013, at 21:16 , Mary Kindall wrote:
> >
> >> Y[Y < mean(Y)] = 0   #My edit
> >> Y[Y >= mean(Y)] = 1  #My edit
> >
> > I have no clue about gbm, but I don't think the above does what I think
> > you think it does.
> >
> > Y <- as.integer(Y >= mean(Y))
> >
> > might be closer to the mark.
>
>
> Good catch Peter! I didn't pay attention to that initially.
>
> Here is an example:
>
> set.seed(1)
> Y <- rnorm(10)
>
> > Y
>  [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
>  [7]  0.4874291  0.7383247  0.5757814 -0.3053884
>
> > mean(Y)
> [1] 0.1322028
>
> Before changing Y:
>
> > Y[Y < mean(Y)]
> [1] -0.6264538 -0.8356286 -0.8204684 -0.3053884
>
> > Y[Y >= mean(Y)]
> [1] 0.1836433 1.5952808 0.3295078 0.4874291 0.7383247 0.5757814
>
>
> However, the incantation that Mary is using, which calculates mean(Y)
> separately in each call, results in:
>
> Y[Y < mean(Y)]  = 0
>
> > Y
>  [1] 0.000 0.1836433 0.000 1.5952808 0.3295078 0.000
>  [7] 0.4874291 0.7383247 0.5757814 0.000
>
>
> # mean(Y) is no longer the original value from above
> > mean(Y)
> [1] 0.3909967
>
>
> Thus:
>
> Y[Y >= mean(Y)]  = 1
>
> > Y
>  [1] 0.000 0.1836433 0.000 1.000 0.3295078 0.000
>  [7] 1.000 1.000 1.000 0.000
>
>
> Some of the values in Y do not change because the threshold for modifying
> the values changed as a result of the recalculation of the mean after the
> first set of values in Y have changed. As Peter noted, you don't end up
> with a dichotomous vector.
>
> Using Peter's method:
>
> Y <- as.integer(Y >= mean(Y))
> > Y
>  [1] 0 1 0 1 1 0 1 1 1 0
>
>
> That being said, the original viewpoint stands, which is to not do this
> due to loss of information.
>
> Regards,
>
> Marc Schwartz
>
>


-- 
-
Mary Kindall
Yorktown Heights, NY
USA


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Not getting any result from 'gbm'?

2013-10-05 Thread Mary Kindall
Sorry David,
The formula that I use here is

fmla = as.formula(Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)

Thanks
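David's error arises because getTheFormulaFromDataFrame() is not a base R function; presumably it is a local helper. A minimal base-R sketch of what such a helper might do, assuming the response column is named Y and every other column is a predictor (formula_from_df and the toy tmpData below are hypothetical):

```r
# Toy stand-in for the data read from example.txt
tmpData <- data.frame(Y = c(0, 1, 0, 1), x1 = 1:4, x2 = c(2, 7, 1, 8), x3 = 4:1)

# Hypothetical helper: `response` on the left, all other columns on the right
formula_from_df <- function(df, response = "Y") {
  reformulate(setdiff(names(df), response), response = response)
}

fmla <- formula_from_df(tmpData)
fmla   # Y ~ x1 + x2 + x3
```

Posting the formula (or a helper like this) along with the data makes the example reproducible for others on the list.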





On Sat, Oct 5, 2013 at 2:02 AM, David Winsemius wrote:

>
> On Oct 3, 2013, at 3:07 PM, Mary Kindall wrote:
>
> > In the reproducible example given below, why am I not getting any result
> > from the generalized boosted model (gbm)? Other methods do show me the
> > desired result.
> > In the example data file (attached) example.txt, the predictors x3 and x4
> > are correlated with response Y.
> >
> >
> >
> > tmpData = read.table("Desktop/example.txt", sep="\t",header=TRUE)
> > head(tmpData)
> > fmla = getTheFormulaFromDataFrame(tmpData)
>
> > fmla = getTheFormulaFromDataFrame(tmpData)
> Error: could not find function "getTheFormulaFromDataFrame"
>
>
> > fmla
> > gbm(fmla, distribution = "bernoulli", data = tmpData) #doesn't work
> >
> > #All the following works
> > bagging(fmla,  data=tmpData, control=control, coob=TRUE)
> > rpart(fmla,  dat=tmpData, method = "class", control=control )
> > glm(fmla, family="binomial", data = tmpData)
> >
> >
> > Thanks
> >
> >
> >
>
> David Winsemius
> Alameda, CA, USA
>
>




[R] Why 'gbm' is not giving me error when I change the response from numeric to categorical?

2013-10-04 Thread Mary Kindall
This reproducible example is from the help of 'gbm' in R.

I ran the following code in R, and it works fine as long as the response is
numeric. The problem starts when I convert the response from numeric to
binary (0/1): then it gives me an error.

My question is: can converting the response from numeric to binary really
have this much effect?

Help page code:

library(gbm)

N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N)
mu <- c(-1,0,1,2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)

# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA

data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

# fit initial model
gbm1 <-
  gbm(Y~X1+X2+X3+X4+X5+X6,        # formula
      data=data,                   # dataset
      var.monotone=c(0,0,0,0,0,0), # -1: monotone decrease,
                                   # +1: monotone increase,
                                   #  0: no monotone restrictions
      distribution="gaussian",     # see the help for other choices
      n.trees=1000,                # number of trees
      shrinkage=0.05,              # shrinkage or learning rate,
                                   # 0.001 to 0.1 usually work
      interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
      bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
      train.fraction = 0.5,        # fraction of data for training,
                                   # first train.fraction*N used for training
      n.minobsinnode = 10,         # minimum total weight needed in each node
      cv.folds = 3,                # do 3-fold cross-validation
      keep.data=TRUE,              # keep a copy of the dataset with the object
      verbose=FALSE)               # don't print out progress

gbm1
summary(gbm1)


Now I slightly change the response variable to make it binary.

Y[Y < mean(Y)] = 0   #My edit
Y[Y >= mean(Y)] = 1  #My edit
data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)
fmla = as.formula(factor(Y)~X1+X2+X3+X4+X5+X6) #My edit

gbm2 <-
  gbm(fmla,                        # formula
      data=data,                   # dataset
      distribution="bernoulli",    # My edit
      n.trees=1000,                # number of trees
      shrinkage=0.05,              # shrinkage or learning rate,
                                   # 0.001 to 0.1 usually work
      interaction.depth=3,         # 1: additive model, 2: two-way interactions, etc.
      bag.fraction = 0.5,          # subsampling fraction, 0.5 is probably best
      train.fraction = 0.5,        # fraction of data for training,
                                   # first train.fraction*N used for training
      n.minobsinnode = 10,         # minimum total weight needed in each node
      cv.folds = 3,                # do 3-fold cross-validation
      keep.data=TRUE,              # keep a copy of the dataset with the object
      verbose=FALSE)               # don't print out progress

gbm2


> gbm2
gbm(formula = fmla, distribution = "bernoulli", data = data,
n.trees = 1000, interaction.depth = 3, n.minobsinnode = 10,
shrinkage = 0.05, bag.fraction = 0.5, train.fraction = 0.5,
cv.folds = 3, keep.data = TRUE, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
1000 iterations were performed.
The best cross-validation iteration was .
The best test-set iteration was .
Error in 1:n.trees : argument of length 0
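The gbm documentation describes distribution = "bernoulli" as logistic regression for 0-1 outcomes, so one cheap sanity check before fitting is to confirm that the response really contains only 0 and 1. The sketch below is a hypothetical base-R helper (as_bernoulli_response is not part of gbm); it also accepts a factor with labels "0"/"1" and returns a plain numeric 0/1 vector. It does not by itself explain the empty best-iteration output above, but it rules out one likely cause.

```r
# Hypothetical helper: stop if the response is not coded 0/1, else return it as numeric
as_bernoulli_response <- function(y) {
  if (is.factor(y)) y <- as.character(y)   # use the labels, not the codes 1, 2, ...
  bad <- !(y %in% c(0, 1))
  if (any(bad)) {
    stop(sprintf("%d value(s) are not 0/1, e.g. %s", sum(bad),
                 paste(head(unique(y[bad]), 3), collapse = ", ")))
  }
  as.numeric(y)
}

# A vector that, like the one in the thread, still holds a non-0/1 value
y_bad <- c(0, 0, 0.5, 1, 1)
msg <- tryCatch(as_bernoulli_response(y_bad), error = conditionMessage)
msg

# A correctly coded factor passes and comes back as numeric 0/1
as_bernoulli_response(factor(c(0, 1, 1, 0)))
```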


My question is: can binarizing the response have so much effect that gbm
does not find anything useful in the predictors?

Thanks



[R] How to create an ROC with three possible classes?

2013-10-02 Thread Mary Kindall
I have a toy example, reproduced below, in which the response variable has
three possible classes. I am trying to create an ROC curve, but I am not sure
how to deal with it when there are three classes.

library(ipred)
control = rpart.control(maxdepth = 20, minsplit = 20, cp = 0.01,
maxsurrogate=2, surrogatestyle = 0, xval=25)
n <- 500; p <- 10
f <- function(x,a,b,d) return( a*(x-b)^2+d )
x1 <- runif(n/2,0,4)
y1 <- f(x1,-1,2,1.7)+runif(n/2,-1,1)
x2 <- runif(n/2,2,6)
y2 <- f(x2,1,4,-1.7)+runif(n/2,-1,1)
y <- c(rep(-1,floor(n/3)),rep(0,ceiling(n/3)), rep(1,ceiling(n/3)))
dat <- data.frame(y=factor(y),x1=c(x1,x2),x2=c(y1,y2),
matrix(rnorm(n*(p-2)),ncol=(p-2)))
names(dat)<-c("y",paste("x",1:p,sep=""))
dat

# index by the factor codes 1..3; the raw y (-1/0/1) is not a valid index
plot(dat$x1,dat$x2,pch=(1:3)[dat$y], col=c(1,2,8)[dat$y],
 xlab=names(dat)[2],ylab=names(dat)[3])
indtrain<-sample(1:n,300,replace=FALSE)
train<-dat[indtrain,]; dim(train)
test<-dat[setdiff(1:n,indtrain),]; dim(test)
test

mod <- bagging(y~.,  data=train, control=control, coob=TRUE, nbagg=25,
keepX = TRUE)
mod
pred<-predict(mod, newdata=test[,-1],type="prob", aggregation=
"average"); pred


For the two-class case I used to do the following, but it is no longer valid
for three classes.

library(pROC)    # plot.roc() comes from pROC
yhat <- pred[,2]
y <- test[, 1]   # the response is column 1; test[, -1] would drop it
plot.roc(y, yhat)



Any help will be appreciated.
Thanks
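One common way to extend ROC analysis to three classes is one-vs-rest: treat each class in turn as "positive" and the other two as "negative", and compute one ROC curve and AUC per class from that class's column of the probability matrix. The sketch below computes these AUCs in base R with the Mann-Whitney rank formula. auc_one_vs_rest and the toy matrix are hypothetical, and it assumes the column names of the probability matrix match the levels of the response factor, as the output of predict(..., type = "prob") is expected to. (pROC also offers multiclass.roc(), which computes a different multi-class summary, Hand and Till's generalized AUC.)

```r
# One-vs-rest AUC per class, base R only.
# prob:  matrix of class probabilities, column names = class labels
# truth: factor of true classes
auc_one_vs_rest <- function(prob, truth) {
  truth <- factor(truth)
  sapply(levels(truth), function(cls) {
    score <- prob[, cls]
    pos <- truth == cls
    n_pos <- sum(pos)
    n_neg <- sum(!pos)
    r <- rank(score)   # ties get average ranks
    # Mann-Whitney U scaled to [0, 1] = probability a positive outranks a negative
    (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  })
}

# Toy example with three classes coded like the post (-1, 0, 1)
truth <- factor(c("-1", "0", "1", "-1", "0", "1"), levels = c("-1", "0", "1"))
prob <- cbind(`-1` = c(0.8, 0.1, 0.1, 0.6, 0.1, 0.3),
              `0`  = c(0.1, 0.7, 0.2, 0.3, 0.3, 0.2),
              `1`  = c(0.1, 0.2, 0.7, 0.1, 0.6, 0.5))
auc <- auc_one_vs_rest(prob, truth)
auc   # one AUC per class: 1, 0.9375, 0.875
```

With the bagging model above, the same function could be applied as auc_one_vs_rest(pred, test$y), and per-class curves drawn with plot.roc(test$y == cls, pred[, cls]) for each class label cls.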


[R] Which regression tree algorithm to use for large data?

2013-09-13 Thread Mary Kindall
I have a data frame with 2 million rows and approximately 200 columns
(features). Approximately 30-40% of the entries are blank. I am trying to
find the important features for a binary response variable. The predictors
may be categorical or continuous.

I started by applying logistic regression, but with so many missing entries
I feel that this is not a good approach, as glm discards every record that
has any blank item. So I am now looking to apply tree-based algorithms
(rpart or gbm), which can handle missing data in a better way.

Since my data is too big for rpart or gbm, I decided to randomly fetch
10,000 records from the original data, apply rpart to them, and keep building
a pool of important variables. However, even these 10,000 records seem to be
too much for the rpart algorithm.

What can I do in this situation? Is there any switch that I can use to make
it faster? Or is it impossible to apply rpart to my data?

I am using the following rpart command:

varimp = rpart(fmla, data=tmpData, method = "class")$variable.importance

Thanks
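Two adjustments may help, sketched below under stated assumptions. First, rpart runs 10-fold internal cross-validation by default (xval = 10 in rpart.control()); if only variable.importance is needed, xval = 0 skips that work, and maxcompete/maxdepth keep each tree smaller. Second, importance can be pooled over several random subsamples. The data are simulated stand-ins for the real table, the subsample size and number of rounds are arbitrary, and summing importance across subsamples is one ad hoc pooling rule, not a standard procedure.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)
# Simulated stand-in: 20,000 rows, 5 predictors, 30% of x2 blank,
# binary response driven mainly by x1
n <- 20000
big <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n),
                  x4 = factor(sample(letters[1:4], n, replace = TRUE)),
                  x5 = rnorm(n))
big$x2[sample(n, 6000)] <- NA
big$y <- factor(as.integer(big$x1 + rnorm(n, sd = 0.2) > 0.5))

# Cheaper trees: no internal cross-validation, no competitor splits kept,
# limited depth; surrogate splits (used for missing values) stay at defaults
ctrl <- rpart.control(xval = 0, maxcompete = 0, maxdepth = 5, cp = 0.001)

pooled <- numeric(0)
for (b in 1:5) {
  idx <- sample(n, 2000)                          # random subsample
  fit <- rpart(y ~ ., data = big[idx, ], method = "class", control = ctrl)
  vi <- fit$variable.importance
  if (is.null(vi)) next                           # tree with no splits
  for (v in names(vi)) pooled[v] <- sum(pooled[v], vi[[v]], na.rm = TRUE)
}
pooled <- sort(pooled, decreasing = TRUE)
pooled
```

Setting maxsurrogate = 0 would speed things up further, but with 30-40% blanks the surrogate splits are what let rpart route rows with missing values, so that trade-off deserves care.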



[R] pvalue calculate

2012-07-22 Thread Mary Kindall
I have a value

a <- 300

and a set of observations

x <- sample(1:50)

How can I get a p-value from this? I need to show that "a" is different from
mean(x).
Thanks
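Since a is a single fixed number, one standard approach is a one-sample test of the null hypothesis that the mean of the population x was drawn from equals 300. A minimal sketch, assuming the observations can be treated as an independent sample; the t-test additionally assumes approximate normality, so the Wilcoxon signed-rank test is shown as a rank-based alternative. Note that sample(1:50) only reorders 1:50, so mean(x) is always 25.5 and the results do not depend on the random order.

```r
a <- 300
x <- sample(1:50)   # as in the post: a random permutation of 1..50

# One-sample t-test of H0: mean = a
tt <- t.test(x, mu = a)
tt$p.value

# Rank-based alternative (signed-rank test of H0: location = a)
wt <- wilcox.test(x, mu = a)
wt$p.value
```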



[R] which function/method to find agreement between two

2012-01-24 Thread Mary Kindall
Hi
I have data in the following format:

items   person1        person2
-----   ------------   ----------
car     honda,toyota   honda
bike    suzuki         suzuki
pant    Lee            Levis, Lee
shirt   Van_housen     Hollister
house   rented         rented



How can I summarize and visualize this type of data?
Or:
How can we statistically measure agreement or disagreement between the two
persons?

Thanks
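Because some answers contain several comma-separated values, a per-item set overlap such as the Jaccard similarity (size of the intersection divided by size of the union) is one simple way to score agreement; this is a swapped-in technique, not something from the thread. For single-label items, Cohen's kappa would be a more conventional chance-corrected alternative. The sketch assumes comma-separated answers and case-sensitive matching:

```r
# Answers from the post; multiple answers are comma-separated
answers <- data.frame(
  item    = c("car", "bike", "pant", "shirt", "house"),
  person1 = c("honda,toyota", "suzuki", "Lee", "Van_housen", "rented"),
  person2 = c("honda", "suzuki", "Levis, Lee", "Hollister", "rented"),
  stringsAsFactors = FALSE
)

# Split one answer into a set of trimmed labels
to_set <- function(s) unique(trimws(strsplit(s, ",")[[1]]))

# Jaccard similarity between two answers: |A intersect B| / |A union B|
jaccard <- function(a, b) {
  A <- to_set(a)
  B <- to_set(b)
  length(intersect(A, B)) / length(union(A, B))
}

answers$jaccard <- mapply(jaccard, answers$person1, answers$person2)
answers                 # 0.5, 1, 0.5, 0, 1 per item
mean(answers$jaccard)   # overall agreement: 0.6
```

A bar chart of the per-item scores, for example barplot(answers$jaccard, names.arg = answers$item), is one quick way to visualize it.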



Re: [R] ggplot- using geom_point and geom_line at the same time

2012-01-17 Thread Mary Kindall
Thanks Hadley for your input.
The following code works fine now.
Thanks again


library(ggplot2)
library(reshape2)   # provides melt(); the older 'reshape' package also has it

con = textConnection("inputs  var1  var2  var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
data = read.table(con, header=TRUE)
data
data = melt(data, id="inputs")
g <- ggplot(data, aes(x=inputs, y=value, colour=variable, shape=variable))
g <- g + geom_line(lwd=0.8)
g <- g + geom_point()
g <- g + scale_colour_discrete('my Custom Legend')
g <- g + scale_shape_discrete("my Custom Legend")
g
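The fix Hadley points to in the quoted message comes down to mapping only colour and shape (not fill) to the same variable and giving both scales the same title, which lets ggplot2 merge them into one legend. The sketch below does the same with labs(), and replaces melt() with base R's stack() so that only ggplot2 is needed; the ggplot2 part is guarded so the reshaping still runs if the package is missing. Column names come from the post; the axis titles are taken from the earlier version of the code.

```r
# Example data from the post
wide <- read.table(text = "inputs var1 var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32", header = TRUE)

# Base-R long format (one row per inputs/variable pair), like melt(wide, id = "inputs")
long <- data.frame(inputs = rep(wide$inputs, times = 3),
                   stack(wide[, c("var1", "var2", "var3")]))
names(long) <- c("inputs", "value", "variable")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(long, aes(x = inputs, y = value, colour = variable, shape = variable)) +
    geom_line() +
    geom_point(size = 3) +
    # identical titles for colour and shape -> a single combined legend
    labs(colour = "my Custom Legend", shape = "my Custom Legend",
         x = "Number of inputs", y = "Conversion time (sec.)")
  print(g)
}
```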


-

On Tue, Jan 17, 2012 at 10:07 AM, Hadley Wickham  wrote:

> On Mon, Jan 16, 2012 at 6:05 PM, Mary Kindall 
> wrote:
> > Thanks for the reply.
> > I want a legend title that contains spaces. Right now I am using the
> > following code, but it produces two legends, and I have to use GIMP to cut
> > out the redundant legend.
>
> Your basic problem is that you're using the fill and colour
> aesthetics, but you only need colour.
>
> Hadley
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
>





Re: [R] ggplot- using geom_point and geom_line at the same time

2012-01-16 Thread Mary Kindall
Thanks for the reply.
I want a legend title that contains spaces. Right now I am using the
following code, but it produces two legends, and I have to use GIMP to cut
out the redundant legend.

--
con = textConnection("inputs  var1  var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
 data = read.table(con, header=TRUE)
 data
 data = melt(data, id="inputs")
 g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable,
shape=variable))
 g <- g + geom_line(lwd=0.8)
 g <- g + geom_point()
g <- g + scale_colour_discrete('my Custom Legend')
g <- g + scale_shape_discrete("my Custom Legend")
g

 -

On Mon, Jan 16, 2012 at 6:55 PM, Felipe Carrillo
wrote:

> Mary:
> Here's one way.
> ## change the variable name to whatever title you want on your legend
> data = melt(data, id="inputs",variable_name="customName")
> data
> g <- ggplot(data,aes(x=inputs, value, colour= customName, fill =
> customName,
> shape=customName))
> g <- g + geom_line(lwd=0.8)
> g <- g + geom_point()
> g <- g + scale_x_continuous(name='Number of inputs')
> g <- g + scale_y_continuous('Conversion time (sec.)')
>
> Felipe D. Carrillo
> Supervisory Fishery Biologist
> Department of the Interior
> US Fish & Wildlife Service
> California, USA
> http://www.fws.gov/redbluff/rbdd_jsmp.aspx
>
>   *From:* Mary Kindall 
> *To:* r-help@r-project.org
> *Sent:* Monday, January 16, 2012 1:14 PM
> *Subject:* [R] ggplot- using geom_point and geom_line at the same time
>
> Hi
> I am plotting a line chart using ggplot and want to use geom_line and
> geom_point simultaneously. I want to rename my legend, but so far I have
> been unsuccessful.
> Could someone please point out what to add to rename the legend?
> My example is below.
> Thanks
>
>
>
> con = textConnection("inputs  var1  var2 var3
> 100 10 5 2
> 1000 20 10 4
> 5000 30 15 8
> 1 40 20 16
> 3 50 25 32")
> data = read.table(con, header=TRUE)
> data
> data = melt(data, id="inputs")
> g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable,
> shape=variable))
> g <- g + geom_line(lwd=0.8)
> g <- g + geom_point()
> g <- g + scale_x_continuous(name='Number of inputs')
> g <- g + scale_y_continuous('Conversion time (sec.)')
> g
>
>
>
>
>
>




[R] ggplot- using geom_point and geom_line at the same time

2012-01-16 Thread Mary Kindall
Hi
I am plotting a line chart using ggplot and want to use geom_line and
geom_point simultaneously. I want to rename my legend, but so far I have
been unsuccessful.
Could someone please point out what to add to rename the legend?
My example is below.
Thanks



con = textConnection("inputs  var1  var2 var3
100 10 5 2
1000 20 10 4
5000 30 15 8
1 40 20 16
3 50 25 32")
 data = read.table(con, header=TRUE)
 data
 data = melt(data, id="inputs")
 g <- ggplot(data,aes(x=inputs, value, colour= variable, fill = variable,
shape=variable))
 g <- g + geom_line(lwd=0.8)
 g <- g + geom_point()
 g <- g + scale_x_continuous(name='Number of inputs')
 g <- g + scale_y_continuous('Conversion time (sec.)')
g





[R] ggplot- using geom_point and geom_line at the same time

2012-01-16 Thread Mary Kindall
Hi
I am plotting a line chart using ggplot and want to use geom_line and
geom_point simultaneously.
I get the plot, but now I have two legends, and neither of them represents
the data correctly. I need a single legend that shows both shape and colour.
Thanks



> con = textConnection("inputs  var1 var2 var3
+ 100   10  5   2
+ 1000  20  10  4
+ 5000  30  15  8
+ 1     40  20  16
+ 3     50  25  32")
> data = read.table(con, header=TRUE)
> data
  inputs var1 var2 var3
1    100   10    5    2
2   1000   20   10    4
3   5000   30   15    8
4      1   40   20   16
5      3   50   25   32
> data = melt(data, id="inputs")
> data
   inputs variable value
1     100     var1    10
2    1000     var1    20
3    5000     var1    30
4       1     var1    40
5       3     var1    50
6     100     var2     5
7    1000     var2    10
8    5000     var2    15
9       1     var2    20
10      3     var2    25
11    100     var3     2
12   1000     var3     4
13   5000     var3     8
14      1     var3    16
15      3     var3    32
> g <- ggplot(data,aes(x=inputs, value, colour=variable, fill = variable))
> g <- g + geom_point(aes(shape=variable), size=3)
> g <- g + geom_line(lwd=1) + ylab("time") + xlab("inputs") + labs(colour="MyLegend", fill = "MyLegend")
> g



Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Mary Kindall
Hi, this is exactly what I am looking for, but I do not want to draw it as a
histogram; instead I want two separate plots for this data, something like
the ones shown in the following links. Please disregard the legends of the
following figures.


http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png

http://had.co.nz/ggplot2/graphics/90983232ced45a93d9fbbe40afffd69a.png

Thanks
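Since the linked figures are not reproduced here, the sketch below assumes that "two separate plots" means one panel per group with lines and points over Width. It reshapes the table from the original question to long format with base R's stack() and uses facet_wrap() when ggplot2 is available, falling back to two base-graphics panels otherwise. The numbers are copied from the question; the layout choices are assumptions.

```r
# Data from the original question
freq <- data.frame(
  Width = c(100, 200, 300, 400, 500, 600),
  relativeFrequency1 = c(0.0006388783, 0.0022677303, 0.0061182673,
                         0.0152237225, 0.0300215262, 0.0597610250),
  relativeFrequency2 = c(0.02265428, 0.02948625, 0.01739936,
                         0.02569902, 0.03639880, 0.07717765)
)

# Long format: one row per (Width, group) pair
long <- data.frame(Width = rep(freq$Width, 2),
                   stack(freq[, c("relativeFrequency1", "relativeFrequency2")]))
names(long) <- c("Width", "relFreq", "group")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = Width, y = relFreq)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ group, ncol = 2) +   # one panel per group
    ylab("Relative frequency")
  print(p)
} else {
  # Base-graphics fallback: two panels side by side
  op <- par(mfrow = c(1, 2))
  for (g in levels(long$group)) {
    sub <- long[long$group == g, ]
    plot(sub$Width, sub$relFreq, type = "b", main = g,
         xlab = "Width", ylab = "Relative frequency")
  }
  par(op)
}
```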

On Thu, Jan 12, 2012 at 12:13 PM, Justin Haynes  wrote:

> On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:
>
>> Hi
>> I have a data frame in the following form. There are two groups, and for
>> each 'width' the relative frequency for group1 and group2 is given. How can
>> I plot this in R using ggplot or another package?
>>
>>
>>  Width   relativeFrequency1   relativeFrequency2
>> 1   100 0.0006388783 0.02265428
>> 2   200 0.0022677303 0.02948625
>> 3   300 0.0061182673 0.01739936
>> 4   400 0.0152237225 0.02569902
>> 5   500 0.0300215262 0.03639880
>> 6   600 0.0597610250 0.07717765
>>
>>
>> Thanks
>>
>>
> not sure exactly what you're looking for but...
>
> dat<-data.frame(width=1:6*100, rel1=runif(6), rel2=runif(6))
> dat.melt<-melt(dat,id.var='width')
> ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable)) +
>   geom_bar(stat='identity',position='dodge')
>>
>
>
>




[R] relative frequency plot using ggplot or other function

2012-01-12 Thread Mary Kindall
Hi
I have a data frame in the following form. There are two groups, and for
each 'width' the relative frequency for group1 and group2 is given. How can I
plot this in R using ggplot or another package?


 Width   relativeFrequency1   relativeFrequency2
1   100 0.0006388783 0.02265428
2   200 0.0022677303 0.02948625
3   300 0.0061182673 0.01739936
4   400 0.0152237225 0.02569902
5   500 0.0300215262 0.03639880
6   600 0.0597610250 0.07717765


Thanks



Re: [R] mode of frequency distribution table

2012-01-08 Thread Mary Kindall
Thanks, Andrija.
I was wondering whether there is any statistical test that can give me the
most frequent contiguous interval.

Your code can also return non-contiguous values. For example, in the code
below I don't want the entry with value 400; I am more interested in the
bell-shaped region.

x = c(1,2, rep(4,3), rep(5,6), rep(6,7),rep(7,8), rep(9,7), rep(10,4),
      13, 17,17,30,100,300, rep(400,10))
barplot(table(x))

I am looking for some test whose output is an interval such as 4-10,
5-9 or 5-10.

Thanks again.
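Counting the observations that fall in a sliding window of fixed width is one descriptive way (not a statistical test) to find the most crowded contiguous interval, and it naturally passes over an isolated spike like 400 when the spike holds fewer observations than the bell-shaped region. The window width w is an assumption you must supply, and the result depends on it. densest_window is a hypothetical helper:

```r
# Data from the post
x <- c(1, 2, rep(4, 3), rep(5, 6), rep(6, 7), rep(7, 8), rep(9, 7),
       rep(10, 4), 13, 17, 17, 30, 100, 300, rep(400, 10))

# For a fixed width w, find the interval [s, s + w] (s taken from the
# observed values) that contains the most observations
densest_window <- function(x, w) {
  starts <- sort(unique(x))
  counts <- vapply(starts, function(s) sum(x >= s & x <= s + w), integer(1))
  best <- which.max(counts)   # first maximum if several windows tie
  c(lower = starts[best], upper = starts[best] + w, count = counts[best])
}

w4 <- densest_window(x, 4)
w4   # lower 5, upper 9, count 28
w6 <- densest_window(x, 6)
w6   # lower 4, upper 10, count 35
```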








On Sun, Jan 8, 2012 at 9:37 AM, andrija djurovic wrote:

> Hi. You can do something like this:
> #find the most frequent values of x
> > t <- table(x)
> > t[t==max(t)]
> 5
> 8
> #sort table t based on frequencies
> > t[order(as.numeric(t),decreasing = TRUE)]
> x
>  5   6   4  17   1   2  13  30 100 300
>  8   5   4   2   1   1   1   1   1   1
> #extract any range from sorted table
> > t[order(as.numeric(t),decreasing = TRUE)][1:3]
> x
> 5 6 4
> 8 5 4
>
> I hope this helps.
>
> Andrija
>
>
> On Sun, Jan 8, 2012 at 1:48 PM, Mary Kindall 
> wrote:
> > In a frequency distribution table (bell shaped), how can we find the most
> > frequent range?
> > for example:
> >
> >  x = c(1,2, 4,4,4,4, 5,5,5,6,6,5,5,5,5,5,6,6,6,13, 17,17,30,100,300)
> >
> > barplot(table(x))
> >
> >
> > In the code above, which function can we use to find that the most
> > frequent values range from 4 to 6?
> >
> > Thanks.
> >
> >
> >
>





[R] mode of frequency distribution table

2012-01-08 Thread Mary Kindall
In a frequency distribution table (bell shaped), how can we find the most
frequent range?
for example:

 x = c(1,2, 4,4,4,4, 5,5,5,6,6,5,5,5,5,5,6,6,6,13, 17,17,30,100,300)

barplot(table(x))


In the code above, which function can we use to find that the most
frequent values range from 4 to 6?

Thanks.





[R] (Edited) cbind alternate for data frames

2012-01-06 Thread Mary Kindall
>
> I have two data frames and want to cbind them and then write the result
> to a file. There are more than a million entries in each frame, and R is
> taking a lot of time performing this operation.
>
> Is there an alternative way to perform cbind?
>
> x = table1[1:100,1:4]
> y = table2[1:100,3:6]
>
> z = cbind(x,y)   # hanging the machine
>
> write.table(z,'out.txt')
>
>
>
>
>



[R] cbind alternate

2012-01-06 Thread Mary Kindall
I have two one-dimensional lists of elements and want to cbind them and then
write the result to a file. There are more than a million entries in each
list, and R is taking a lot of time performing this operation.

Is there an alternative way to perform cbind?

x = table1[1:100,1]
y = table2[1:100,5]

z = cbind(x,y)   # hanging the machine

write.table(z,'out.txt')
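Binding two columns of a million entries is usually quick in R, so the hang may come from elsewhere, for example memory pressure from building and writing one very large object; that diagnosis is an assumption. A sketch that avoids the single combined object by binding and writing the rows in chunks, for either version of the question (vectors or data frames). table1 and table2 are simulated stand-ins, and the chunk size is arbitrary:

```r
# Simulated stand-ins for table1 / table2 from the post
set.seed(1)
n <- 50000
table1 <- data.frame(a = seq_len(n), b = runif(n),
                     c = letters[(seq_len(n) %% 26) + 1], d = rnorm(n))
table2 <- data.frame(p = seq_len(n), q = seq_len(n), r = runif(n),
                     s = rnorm(n), t = seq_len(n), u = seq_len(n))

out <- tempfile(fileext = ".txt")
chunk <- 10000
for (start in seq(1, n, by = chunk)) {
  rows <- start:min(start + chunk - 1, n)
  # data.frame(x, y) is what cbind() does for data frames, one chunk at a time
  piece <- data.frame(table1[rows, 1:4], table2[rows, 3:6])
  write.table(piece, out, sep = "\t", quote = FALSE, row.names = FALSE,
              col.names = (start == 1), append = (start > 1))
}

back <- read.table(out, header = TRUE, sep = "\t")
dim(back)   # n rows, 8 columns
```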





Re: [R] aggregate function

2011-12-21 Thread Mary Kindall
Hi Jim

Thanks for the reply, but this is not working. I think I am missing
something here.

1> x <- cbind(c(1,2,2,2,3,4), c('a','b', 'c','d','e','f'))
1> colnames(x) = c('param', 'case1')
1> x = as.data.frame(x)
1> x
  param case1
1 1 a
2 2 b
3 2 c
4 2 d
5 3 e
6 4 f
1> x[
1+  , list(case1 = paste(x$case1, collapse = ','))
1+  , by = x$param
1+   ]
Error in `[.data.frame`(x, , list(case1 = paste(x$case1, collapse = ",")),
 :
  unused argument(s) (by = x$param)


Hi David.
Thanks a lot for your help.

1> aggregate(x$case1, x['param'], FUN = paste, collapse=",")
  param x
1 1 a
2 2 b,c,d
3 3 e
4 4 f
1>


Thanks again
M
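For reference, aggregate() also has a formula interface, where arguments after FUN (here collapse) are passed on to paste(); tapply() is another base-R route that returns a named character vector instead of a data frame. A minimal sketch on the data from the thread:

```r
x <- read.table(text = "param case1
1 a
2 b
2 c
2 d
3 e
4 f", header = TRUE, stringsAsFactors = FALSE)

# One row per 'param', with the case1 values pasted together
agg <- aggregate(case1 ~ param, data = x, FUN = paste, collapse = ",")
agg

# Same result as a named character vector
tp <- tapply(x$case1, x$param, paste, collapse = ",")
tp
```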


On Wed, Dec 21, 2011 at 11:56 AM, David Winsemius wrote:

>
> On Dec 21, 2011, at 11:31 AM, jim holtman wrote:
>
>> Here is an example using 'data.table':
>>
>>  x <- read.table(text = "param   case1
>>>
>> + 1   a
>> + 2   b
>> + 2   c
>> + 2   d
>> + 3   e
>> + 4   f", header = TRUE, as.is = TRUE)
>>
>
> And the aggregate version:
>
> > aggregate(x$case1, x["param"], FUN=paste, collapse=",")
>  param x
> 1 1 a
> 2 2 b,c,d
> 3 3 e
> 4 4 f
>
> ( Generally one uses the "[[" function for extraction, but using  "["
> returns a list which is what aggregate is designed to process as its second
> argument, whereas you would get an error with either of these:
>
> aggregate(x$case1, x$param, FUN=paste, collapse=",")
> aggregate(x$case1, x[["param"]], FUN=paste, collapse=",")
>
>  )
>
>  require(data.table)
>>> x <- data.table(x)
>>> x[
>>>
>> + , list( case1 = paste(case1, collapse = ','))
>> + , by = param
>> +  ]
>>param case1
>> [1,] 1 a
>> [2,] 2 b,c,d
>> [3,] 3 e
>> [4,] 4 f
>>
>>>
>>>
>>
>> On Wed, Dec 21, 2011 at 11:26 AM, Mary Kindall 
>> wrote:
>>
>>> Hi
>>> I have a data frame with values in following format.
>>>
>>>
>>> param   case1
>>> 1   a
>>> 2   b
>>> 2   c
>>> 2   d
>>> 3   e
>>> 4   f
>>>
>>>
>>> How can I use aggregate so that I get only one row for each 'param' value?
>>>
>>> The output for the above input should be
>>>
>>> param case1
>>> 1  a
>>> 2  b,c,d
>>> 3  e
>>> 4  f
>>>
>>> Thanks
>>> M
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>




[R] aggregate function

2011-12-21 Thread Mary Kindall
Hi
I have a data frame with values in following format.


param   case1
1   a
2   b
2   c
2   d
3   e
4   f


How can I use aggregate so that I get only one row for each 'param' value?

The output for the above input should be

param case1
1  a
2  b,c,d
3  e
4  f

Thanks
M





[R] read.table : fill missing entry with "unAvailable" [edit]

2011-11-16 Thread Mary Kindall
I am very sorry for the multiple emails.

Hi R users,
I am trying to read a data file (tab-delimited format) in which some of the
entries in a particular field are missing. Is it possible to fill the
unavailable entries with the string 'unAvailable' while running read.table()?

Something like
df = read.table(DataFile, header=FALSE, fill_missing_entry = 'unAvailable')  # not a real read.table() argument


Or else remove the complete row in which the missing entry appears.


thanks
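read.table() has no argument that fills blanks with a chosen string directly, but one way to get either behaviour is to read blank fields as NA via na.strings = "" and then post-process. A minimal sketch on a small made-up tab-delimited text (the column names and header = TRUE are assumptions for illustration; the post uses header = FALSE). Filling is applied only to character columns, because putting "unAvailable" into a numeric column would turn the whole column into character.

```r
txt <- "id\tcount\tlabel\nx1\t1\tfoo\nx2\t\tbar\nx3\t3\t\n"

# Treat empty fields as NA in every column (by default, blank fields in
# character columns would be read as "")
df <- read.table(text = txt, header = TRUE, sep = "\t",
                 na.strings = "", stringsAsFactors = FALSE)

# Option 1: fill missing entries in character columns with "unAvailable"
filled <- df
chr_cols <- vapply(filled, is.character, logical(1))
filled[chr_cols] <- lapply(filled[chr_cols], function(v) {
  v[is.na(v)] <- "unAvailable"
  v
})
filled

# Option 2: drop every row that has at least one missing entry
complete <- df[complete.cases(df), ]
complete
```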





[R] read.table : fill missing entry with "unAvailable" [edit]

2011-11-16 Thread Mary Kindall
Hi R users,
I am trying to read a data file (tab-delimited format) in which some of the
entries in a particular field are missing. Is it possible to fill the
unavailable entries with the string 'unAvailable' while running read.table()?

Something like
df = read.table(DataFile, header=FALSE, fill_missing_entry = 'unAvailable')  # not a real read.table() argument


1



thanks





[R] read.table : fill missing entry with "unAvailable"

2011-11-16 Thread Mary Kindall
Hi R users,
I am trying to read a tab-delimited data file in which some of the
entries in a particular field are missing. Is it possible to fill the
missing entries with the string 'unAvailable' while calling read.table()?

Something like

df = read.table(DataFile, header=FALSE,  fill_missing_entry = 'unAvailable')



thanks



[R] list.dir() function

2011-11-11 Thread Mary Kindall
Hi,
I have an organism directory that contains two folders, galGal3 and hg19,
and many other files.

orgDir = '/home/mary/org'

When I use the list.dirs() function, it gives me the same answer no
matter what the value of the full.names argument is:


> list.dirs(path = orgDir, full.names = FALSE)
[1] "/home/mary/org"
[2] "/home/mary/org/galGal3"
[3] "/home/mary/org/hg19"
> list.dirs(path = orgDir, full.names = TRUE)
[1] "/home/mary/org"
[2] "/home/mary/org/galGal3"
[3] "/home/mary/org/hg19"


Also, it returns the directory itself, which I don't want.

Why is this so? Is there any workaround for this problem?
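One possible workaround, sketched below: keep the full paths, drop the entry that is orgDir itself, and use basename() for the short folder names. A temporary directory stands in for /home/mary/org. In R versions where list.dirs() has a recursive argument, recursive = FALSE should also leave out the top-level directory, but the filter below does not depend on that.

```r
# Build a throwaway stand-in for /home/mary/org with two folders and a file.
orgDir <- file.path(tempdir(), "org")
dir.create(file.path(orgDir, "galGal3"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(orgDir, "hg19"), showWarnings = FALSE)
invisible(file.create(file.path(orgDir, "notes.txt")))  # a file, not a folder

# With the default recursive = TRUE, the result includes orgDir itself.
dirs <- list.dirs(path = orgDir, full.names = TRUE)

# Drop the top-level directory; normalizePath() makes the comparison robust
# to differences such as trailing slashes or symbolic links.
dirs <- dirs[normalizePath(dirs) != normalizePath(orgDir)]

# basename() returns just the folder names, whatever full.names did.
subdirs <- basename(dirs)
print(subdirs)
```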



Thanks




[R] download.file

2011-11-08 Thread Mary Kindall
I am downloading about 100 files from the UCSC website and storing them in a
destination folder. The download.file() function creates a file in the
destination folder even if the remote file is not present, which is something
I don't want. So I wrote an if condition to remove the file if download.file()
returns a non-zero value.

Now the loop exits when there is an error or the file is not present. How can
I use try() and an if condition together so that the program does not exit on
error and instead deletes the file created in the destination folder?

for (i in 1: 100)
{
fileUrl = ucscfilenames[i]
if (download.file(fileUrl, destFile, 'wget' , quiet = TRUE) != 0)
{
file.remove(destFile)
}
}
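A sketch of one way to combine the two, assuming that any error or warning from download.file() should count as a failed download: tryCatch() turns the failure into a non-zero status, so the loop keeps going and the leftover file can be removed. safe_download() is a hypothetical helper name; ucscfilenames comes from the question and destDir is an assumed destination folder.

```r
# Hypothetical helper: download one URL; on an error, a warning (for example
# an HTTP 404) or a non-zero status, delete whatever file download.file()
# left behind and return FALSE. Returns TRUE on success.
safe_download <- function(url, destfile, ...) {
  status <- tryCatch(
    download.file(url, destfile, quiet = TRUE, ...),
    error = function(e) 1L,
    warning = function(w) 1L
  )
  if (status != 0) {
    if (file.exists(destfile)) file.remove(destfile)
    return(FALSE)
  }
  TRUE
}

# Usage in the loop from the question (destDir is an assumed folder):
# ok <- logical(length(ucscfilenames))
# for (i in seq_along(ucscfilenames)) {
#   destFile <- file.path(destDir, basename(ucscfilenames[i]))
#   ok[i] <- safe_download(ucscfilenames[i], destFile, method = "wget")
# }
```

Treating every warning as a failure is a judgment call; it is stricter than checking the return value alone, but it also catches servers that answer with an error page.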



thanks



Re: [R] download files using ftp: avoid error

2011-09-16 Thread Mary Kindall
wget worked for me.
Thanks William and Rainer.
-M


On Fri, Sep 16, 2011 at 4:18 PM, William Dunlap  wrote:

> Wrap the call that may abort with try() or tryCatch().
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rainer Schuermann
> > Sent: Friday, September 16, 2011 1:09 PM
> > To: r-help@r-project.org
> > Subject: Re: [R] download files using ftp: avoid error
> >
> > I haven't tested it thoroughly but what worked here is replacing
> > > download.file(url, destfile, quiet = FALSE)
> > with
> >
> > sys_call <- paste( "wget", "-O", destfile, url, sep=" " )
> > system( sys_call )
> >
> > Program execution continues, whether or not the download from url was
> > successful. However, wget is, I believe, not available on Windows.
> >
> > Rgds,
> > Rainer
> >
> >
> > On Friday 16 September 2011 15:07:15 Mary Kindall wrote:
> > > I am planning to download a large number of files from some website. I
> am
> > > using the following script.
> > >
> > > files2down = c('aaa', 'bbb', )
> > > for (i in 1: len)
> > > {
> > > print(paste('downloading file', i, ' of total ', len));
> > > url = paste(urlPrefix, files2down[i], sep='')
> > > destfile = paste (dest, 'inDir', files2down[i], sep='/' )
> > > download.file(url, destfile, quiet = FALSE)
> > > }
> > >
> > > It works fine as long as the file is present. When the file is not
> present,
> > > it exit from loop. Is there a way to continue looping if error occurs.
> > > Thanks





[R] download files using ftp: avoid error

2011-09-16 Thread Mary Kindall
I am planning to download a large number of files from some website. I am
using the following script.

files2down = c('aaa', 'bbb')   # ... and many more file names
len = length(files2down)
for (i in 1:len)
{
print(paste('downloading file', i, 'of total', len))
url = paste(urlPrefix, files2down[i], sep='')
destfile = paste(dest, 'inDir', files2down[i], sep='/')
download.file(url, destfile, quiet = FALSE)
}

It works fine as long as the file is present. When the file is not present,
it exits the loop. Is there a way to continue looping if an error occurs?
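A minimal sketch of the try() approach William Dunlap mentions in his reply: wrapping each download.file() call in try(..., silent = TRUE) lets the loop move on to the next file, and the names of failed files are collected so they can be retried later. download_all() is a hypothetical wrapper; urlPrefix, dest and files2down play the same roles as in the script above.

```r
# Hypothetical wrapper around the loop from the question.
download_all <- function(files2down, urlPrefix, dest) {
  len <- length(files2down)
  failed <- character(0)
  for (i in seq_len(len)) {
    message("downloading file ", i, " of total ", len)
    url <- paste0(urlPrefix, files2down[i])
    destfile <- file.path(dest, "inDir", files2down[i])
    # try() returns an object of class "try-error" instead of stopping.
    res <- try(download.file(url, destfile, quiet = TRUE), silent = TRUE)
    if (inherits(res, "try-error")) {
      failed <- c(failed, files2down[i])  # note the failure and keep looping
    }
  }
  failed
}
```

try() only intercepts errors; warnings (such as an HTTP status message) are still reported, which can be useful when checking which files were skipped.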
Thanks





[R] installing packages systemwide

2011-08-19 Thread Mary Kindall
I installed some downloaded packages in R. I always use
$ sudo R CMD INSTALL 


By default, it stores these packages in my personal library,
/home/mary/R/x86_64-pc-linux-gnu-library/2.13/.

However, I want them installed system-wide, in the
/usr/local/lib/R/site-library/ folder.

I tried
$sudo R
R> install.packages("anRpackage", dep=TRUE)

I did not succeed in getting them installed in the required folder.
Any idea?
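A sketch of how to check and choose the target library from within R, assuming R is started as root (for example with sudo R) and that /usr/local/lib/R/site-library already exists. When lib is not given, install.packages() installs into the first element of .libPaths(); one plausible cause of the behavior described (an assumption, not verified here) is that the personal library still comes first under sudo, for example because HOME or R_LIBS_USER is carried over. Passing lib explicitly avoids relying on that order; from the shell, R CMD INSTALL accepts -l (--library) for the same purpose.

```r
# Target library from the question; adjust if your site library differs.
site_lib <- "/usr/local/lib/R/site-library"

# Libraries in search order; install.packages() defaults to the first one.
print(.libPaths())

# file.access(mode = 2) returns 0 for directories the current user can
# write to, so this lists the libraries an install could actually go into.
lib_paths <- .libPaths()
writable <- lib_paths[file.access(lib_paths, mode = 2) == 0]
print(writable)

# Install into the site library explicitly ("anRpackage" is the placeholder
# name from the question), instead of relying on .libPaths()[1]:
# install.packages("anRpackage", lib = site_lib, dependencies = TRUE)
```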





Re: [R] save and load in R

2011-06-19 Thread Mary Kindall
Thanks Jeff and Duncan.
assign() worked for me. I did not try the other methods you suggested. I
am reproducing my code below:

##
filenames = list.files(path = ".", pattern = '\\.txt$', all.files = FALSE,
full.names = TRUE, ignore.case = FALSE)  #all input files
numFiles = length(filenames)

outfilenames <- paste("./file", 1:numFiles, '.Rdata', sep="") # output files


for ( i in 1:numFiles)
{
dataFile = read.table(filenames[i], header=TRUE, sep='\t');
save(dataFile, file = outfilenames[i])   #Saving into the output files
}


newnames <- paste("file", 1:numFiles, sep="")  #output variables to load into

for ( i in 1:numFiles)
{
load(file = outfilenames[i]);
assign(newnames[i], dataFile)   #assign into corresponding output variables;

}
#


Regards
-
M



On Sun, Jun 19, 2011 at 10:38 AM, Duncan Murdoch wrote:

> On 11-06-19 10:26 AM, Mary Kindall wrote:
>
>> I have a list of txt files that I want to convert into .rdata R data
>> object.
>>
>> filenames
>> 1. "./file1.txt"
>> 2. "./file2.txt"
>> 3. "./file3.txt"
>> 4. "./file4.txt"
>> 5. "./file5.txt"
>> 6. "./file6.txt"
>> 7. "./file7.txt"
>> 8. "./file8.txt"
>> 9. "./file9.txt"
>> 10. "./file10.txt"
>>
>> I saved these files as
>>
>> for ( i in 1:10)
>> {
>> dataFile = read.table(filenames[i], header=TRUE, sep='\t');
>> save (dataFile, file = outfilenames[i])
>> }
>>
>> The inpt files are saves as:
>> outfilenames
>> 1. "./file1.Rdata"
>> 2. "./file2.Rdata"
>> 3. "./file3.Rdata"
>> 4. "./file4.Rdata"
>> 5. "./file5.Rdata"
>> 6. "./file6.Rdata"
>> 7. "./file7.Rdata"
>> 8. "./file8.Rdata"
>> 9. "./file9.Rdata"
>> 10. "./file10.Rdata"
>>
>>
>> Now I want to load these out files in such a way that the data is loaded
>> into a variable that is same as the file name without extension.
>>
>> file1 = load (file = './file1.Rdata')
>> file2 = load (file = './file2.Rdata')
>> file3 = load (file = './file3.Rdata')
>> file4 = load (file = './file4.Rdata')
>>
>> How can I do that.
>>
>
> When you load() a file, the variables in it are restored with the same
> names that were saved.  So you would need something like
>
> newnames <- paste("file", 1:10, sep="") # file1, file2, etc.
>
>
> for (i in 1:10) {
>  load(file=outfilenames[i]) # assuming that's still around...
>  assign(newnames[i], dataFile)
> }
>
> It would be a little simpler to use saveRDS() and readRDS() to save and
> load your files.  They don't save the object names.
>
> A more R-like version of this would be to create a list of datasets, e.g.
>
> files <- list()
>
> for (i in 1:10) {
>  load(file=outfilenames[i])
>  files[[i]] <- dataFile
> }
>
> Then you don't end up creating 10 objects, but you can still access them
> separately.
>
> Duncan Murdoch
>
>
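Duncan's saveRDS()/readRDS() suggestion could look roughly like the sketch below. Because readRDS() returns the object rather than restoring it under its saved name, assign() is not needed and the results can go straight into a named list. The three small .txt files are generated in a temporary folder as stand-in data; the pattern is written as "\\.txt$" so the dot matches a literal dot.

```r
# Stand-in data: three small tab-delimited files in a temporary folder.
demo_dir <- file.path(tempdir(), "rds_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.table(data.frame(x = i, y = i^2),
              file.path(demo_dir, paste0("file", i, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}

# "\\.txt$" matches a literal ".txt" at the end of the name.
filenames <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
for (f in filenames) {
  dataFile <- read.table(f, header = TRUE, sep = "\t")
  saveRDS(dataFile, sub("\\.txt$", ".rds", f))  # no object name is stored
}

# readRDS() returns the object itself, so it can go straight into a list.
rdsfiles <- list.files(demo_dir, pattern = "\\.rds$", full.names = TRUE)
files <- lapply(rdsfiles, readRDS)
names(files) <- sub("\\.rds$", "", basename(rdsfiles))
print(names(files))
```

Each data set is then available as files$file1, files$file2 and so on, without creating separate top-level variables.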




[R] save and load in R

2011-06-19 Thread Mary Kindall
I have a list of .txt files that I want to convert into .Rdata R data
objects.

filenames
1. "./file1.txt"
2. "./file2.txt"
3. "./file3.txt"
4. "./file4.txt"
5. "./file5.txt"
6. "./file6.txt"
7. "./file7.txt"
8. "./file8.txt"
9. "./file9.txt"
10. "./file10.txt"

I saved these files as

for ( i in 1:10)
{
dataFile = read.table(filenames[i], header=TRUE, sep='\t');
save (dataFile, file = outfilenames[i])
}

The input files are saved as:
outfilenames
1. "./file1.Rdata"
2. "./file2.Rdata"
3. "./file3.Rdata"
4. "./file4.Rdata"
5. "./file5.Rdata"
6. "./file6.Rdata"
7. "./file7.Rdata"
8. "./file8.Rdata"
9. "./file9.Rdata"
10. "./file10.Rdata"


Now I want to load these output files in such a way that each file's data is
loaded into a variable with the same name as the file, without the extension.

file1 = load (file = './file1.Rdata')
file2 = load (file = './file2.Rdata')
file3 = load (file = './file3.Rdata')
file4 = load (file = './file4.Rdata')

How can I do that?
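Another option, sketched here as an alternative to assign(): load() accepts an envir argument and returns the names of the objects it restored, so each file can be loaded into a private environment and its single object returned to whatever variable you like. load_object() is a hypothetical helper, and the temporary file below stands in for ./file1.Rdata.

```r
# Hypothetical helper: load one .Rdata file into a fresh environment and
# return the single object it contains, whatever name it was saved under.
load_object <- function(path) {
  env <- new.env()
  nms <- load(path, envir = env)  # load() returns the names it restored
  if (length(nms) != 1) stop("expected exactly one object in ", path)
  get(nms, envir = env)
}

# Demo: a temporary file stands in for ./file1.Rdata.
dataFile <- data.frame(a = 1:3)
path <- file.path(tempdir(), "file1.Rdata")
save(dataFile, file = path)
rm(dataFile)

file1 <- load_object(path)
print(file1)
```

With the outfilenames vector from the question, lapply(outfilenames, load_object) would return a list of all ten data frames.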
Regards
-
M



[R] [Resolved] combine the data frames into comma separated list.

2011-06-14 Thread Mary Kindall
Hi,
Thanks, Gabor, for your suggestion. I am posting the code that worked for me.


dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 =
c('aaa','bbb','ccc','aaa','ddd')));  #must be data frame
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 =
c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 =
c('xx','yy','zz','tt','uu')));
dataframe4 = data.frame(cbind(Src = c(3,5,'y','z','z'), Target4 =
c('xx','yy','zz','tt','uu')));
L <- list(dataframe1, dataframe2, dataframe3, dataframe4)
merge.all <- function(...) merge(..., all = TRUE)
Reduce(merge.all, lapply(L, function(x) aggregate(x[2], x[1], toString)))
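Since the one-liner is compact, here is a step-by-step sketch of what it does, using only the first and third data frames. They are built with stringsAsFactors = FALSE so the columns stay character; that setting is an assumption made for the illustration.

```r
df1 <- data.frame(Src = c("1", "1", "1", "2", "3"),
                  Target1 = c("aaa", "bbb", "ccc", "aaa", "ddd"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(Src = c("1", "3", "5", "6", "6"),
                  Target3 = c("xx", "yy", "zz", "tt", "uu"),
                  stringsAsFactors = FALSE)

# 1. aggregate(x[2], x[1], toString) groups column 2 by column 1 and
#    collapses each group into one comma-separated string.
a1 <- aggregate(df1[2], df1[1], toString)   # Src 1 -> "aaa, bbb, ccc"
a3 <- aggregate(df3[2], df3[1], toString)   # Src 6 -> "tt, uu"

# 2. merge(..., all = TRUE) is a full outer join on the shared column Src;
#    keys missing from one side get NA.
m <- merge(a1, a3, all = TRUE)

# 3. Reduce() applies the same two-way merge across a list of any length:
#    merge(merge(a1, a2), a3), and so on.
merge.all <- function(...) merge(..., all = TRUE)
r <- Reduce(merge.all, list(a1, a3))
print(r)
```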

 Cheers !!!

-
M


On Tue, Jun 14, 2011 at 11:27 AM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Tue, Jun 14, 2011 at 11:21 AM, Mary Kindall 
> wrote:
> > I resolved it. There was a problem in type casting at some point in my
> > program.
> > Thanks again.
> > -
>
> Please post a corrected version of L for benefit of others who were
> following this.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>





Re: [R] combine the data frames into comma separated list.

2011-06-13 Thread Mary Kindall
Superb, Gabor!
Though I don't know what is happening, it is working, and without fail.

Thanks
-
M

On Mon, Jun 13, 2011 at 8:20 PM, Gabor Grothendieck  wrote:

> On Mon, Jun 13, 2011 at 5:17 PM, Mary Kindall 
> wrote:
> > Hi R users,
> > I am new to R and am trying to merge data frames in the following way.
> > Suppose I have n data frames each with two fields. Field 1 is common
> among
> > data frames but may have different entries. Field 2 is different.
> >
> >
> > Data frame 1:
> >
> > Src   Target1
> > 1     aaa
> > 1     bbb
> > 1     ccc
> > 2     aaa
> > 3     ddd
> >
> >
> > Data frame 2:
> >
> > Src   Target2
> > 2
> > 3
> > 4
> > 4
> > 4
> >
> >
> > Data frame 3:
> >
> > Src   Target3
> > 1     xx
> > 3     yy
> > 5     zz
> > 6     tt
> > 6     uu
> >
> > And so on...
> >
> > I want to convert this into a data frame something similar to:
> > Src   Target1       Target2   Target3
> > 1     aaa,bbb,ccc   -         xx
> > 2     aaa                     -
> > 3     ddd                     yy
> > 4     -             ,,        -
> > 5     -             -         zz
> > 6     -             -         tt,uu
> >
> >
>
> Try this where DF1, DF2 and DF3 are the data frames:
>
> L <- list(DF1, DF2, DF3)
> merge.all <- function(...) merge(..., all = TRUE)
> Reduce(merge.all, lapply(L, function(x) aggregate(x[2], x[1], toString)))
>
> The last line gives this:
>
>   Src       Target1 Target2 Target3
> 1   1 aaa, bbb, ccc    <NA>      xx
> 2   2           aaa            <NA>
> 3   3           ddd              yy
> 4   4          <NA>    , ,     <NA>
> 5   5          <NA>    <NA>      zz
> 6   6          <NA>    <NA>  tt, uu
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>





Re: [R] combine the data frames into comma separated list.

2011-06-13 Thread Mary Kindall
How?
I don't think there is any parameter that does this job.

I came up with the ddply() function in the plyr package, but I have tens of
data frames, and doing it in a for loop may not be a good idea.

ddply(test, ~ Src , colwise(paste, .(Target1)), collapse ="," );

Can you please show how it can be done with write.csv()?

Or is there an efficient method that can do this for me?

library(plyr)  # provides ddply() and colwise()
dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 =
c('aaa','bbb','ccc','aaa','ddd')));
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 =
c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 =
c('xx','yy','zz','tt','uu')));
test = merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE,
incomparables=''), by = 'Src', all=TRUE, incomparables='')
ddply(test, ~ Src , colwise(paste, .(Target1)), collapse ="," );


Thanks







On Mon, Jun 13, 2011 at 7:14 PM, Dr. D. P. Kreil (Boku) <
david.kr...@boku.ac.at> wrote:

> ?write.csv
>
> Cheers,
> David.
>
>
> On 14 June 2011 01:07, Mary Kindall  wrote:
> > Thanks for reply.
> > The following code is working but only patially. How to get the condensed
> > values separated by comma.
> >
> > dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 =
> > c('aaa','bbb','ccc','aaa','ddd')));
> > dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 =
> > c('','','','','')));
> > dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 =
> > c('xx','yy','zz','tt','uu')));
> > merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by =
> > 'Src', all=TRUE)
> >
> >
> > > merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)
> >    Src Target3 Target1 Target2
> > 1    1      xx     aaa    <NA>
> > 2    1      xx     bbb    <NA>
> > 3    1      xx     ccc    <NA>
> > 4    3      yy     ddd
> > 5    5      zz    <NA>    <NA>
> > 6    6      tt    <NA>    <NA>
> > 7    6      uu    <NA>    <NA>
> > 8    2    <NA>     aaa
> > 9    4    <NA>    <NA>
> > 10   4    <NA>    <NA>
> > 11   4    <NA>    <NA>
> >
> > Thanks
> >
> > --
> > M
> >
> >
> > On Mon, Jun 13, 2011 at 6:35 PM, Dr. D. P. Kreil (Boku)
> >  wrote:
> >>
> >> Hi, try
> >>
> >> ?merge
> >>
> >> Best,
> >> David.
> >>
> >>
> >> On 13 June 2011 23:48, Mary Kindall  wrote:
> >> > Hi R users,
> >> > I am new to R and am trying to merge data frames in the following way.
> >> > Suppose I have n data frames each with two fields. Field 1 is common
> >> > among
> >> > data frames but may have different entries. Field 2 is different.
> >> >
> >> >
> >> > Data frame 1:
> >> >
> >> > Src   Target1
> >> > 1     aaa
> >> > 1     bbb
> >> > 1     ccc
> >> > 2     aaa
> >> > 3     ddd
> >> >
> >> >
> >> > Data frame 2:
> >> >
> >> > Src   Target2
> >> > 2
> >> > 3
> >> > 4
> >> > 4
> >> > 4
> >> >
> >> >
> >> > Data frame 3:
> >> >
> >> > Src   Target3
> >> > 1     xx
> >> > 3     yy
> >> > 5     zz
> >> > 6     tt
> >> > 6     uu
> >> >
> >> > And so on...
> >> >
> >> > I want to convert this into a data frame something similar to:
> >> > Src   Target1       Target2   Target3
> >> > 1     aaa,bbb,ccc   -         xx
> >> > 2     aaa                     -
> >> > 3     ddd                     yy
> >> > 4     -             ,,        -
> >> > 5     -             -         zz
> >> > 6     -             -         tt,uu
> >> >
> >> >
> >> > Basically I am trying to make a consolidated table.
> >> >
> >> > Help appreciated.
> >> > Thanks
> >> > M
> >> >
> >> >
> >> > -
> >> > Mary Kindall
> >> > Yorktown Heights
> >> > USA
> >> >
> >> >
> >
> >
> >
> > --
> > -
> > Mary Kindall
> > Yorktown Heights, NY
> > USA
> >
> >
>





Re: [R] combine the data frames into comma separated list.

2011-06-13 Thread Mary Kindall
Thanks for the reply.
The following code works, but only partially. How do I get the condensed
values separated by commas?

dataframe1 = data.frame(cbind(Src = c(1,1,1,2,3), Target1 =
c('aaa','bbb','ccc','aaa','ddd')));
dataframe2 = data.frame(cbind(Src = c(2,3,4,4,4), Target2 =
c('','','','','')));
dataframe3 = data.frame(cbind(Src = c(1,3,5,6,6), Target3 =
c('xx','yy','zz','tt','uu')));
merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by =
'Src', all=TRUE)


> merge(dataframe3, merge(dataframe1,dataframe2, by = 'Src', all=TRUE), by = 'Src', all=TRUE)
   Src Target3 Target1 Target2
1    1      xx     aaa    <NA>
2    1      xx     bbb    <NA>
3    1      xx     ccc    <NA>
4    3      yy     ddd
5    5      zz    <NA>    <NA>
6    6      tt    <NA>    <NA>
7    6      uu    <NA>    <NA>
8    2    <NA>     aaa
9    4    <NA>    <NA>
10   4    <NA>    <NA>
11   4    <NA>    <NA>

Thanks

--
M


On Mon, Jun 13, 2011 at 6:35 PM, Dr. D. P. Kreil (Boku) <
david.kr...@boku.ac.at> wrote:

> Hi, try
>
> ?merge
>
> Best,
> David.
>
>
> On 13 June 2011 23:48, Mary Kindall  wrote:
> > Hi R users,
> > I am new to R and am trying to merge data frames in the following way.
> > Suppose I have n data frames each with two fields. Field 1 is common
> among
> > data frames but may have different entries. Field 2 is different.
> >
> >
> > Data frame 1:
> >
> > Src   Target1
> > 1     aaa
> > 1     bbb
> > 1     ccc
> > 2     aaa
> > 3     ddd
> >
> >
> > Data frame 2:
> >
> > Src   Target2
> > 2
> > 3
> > 4
> > 4
> > 4
> >
> >
> > Data frame 3:
> >
> > Src   Target3
> > 1     xx
> > 3     yy
> > 5     zz
> > 6     tt
> > 6     uu
> >
> > And so on...
> >
> > I want to convert this into a data frame something similar to:
> > Src   Target1       Target2   Target3
> > 1     aaa,bbb,ccc   -         xx
> > 2     aaa                     -
> > 3     ddd                     yy
> > 4     -             ,,        -
> > 5     -             -         zz
> > 6     -             -         tt,uu
> >
> >
> > Basically I am trying to make a consolidated table.
> >
> > Help appreciated.
> > Thanks
> > M
> >
> >
> > -
> > Mary Kindall
> > Yorktown Heights
> > USA
> >
> >
>





[R] combine the data frames into comma separated list.

2011-06-13 Thread Mary Kindall
Hi R users,
I am new to R and am trying to merge data frames in the following way.
Suppose I have n data frames, each with two fields. Field 1 is common to all
data frames but may have different entries. Field 2 is different.


Data frame 1:

Src   Target1
1     aaa
1     bbb
1     ccc
2     aaa
3     ddd


Data frame 2:

Src   Target2
2
3
4
4
4


Data frame 3:

Src   Target3
1     xx
3     yy
5     zz
6     tt
6     uu

And so on...

I want to convert this into a data frame similar to the following:
Src   Target1       Target2   Target3
1     aaa,bbb,ccc   -         xx
2     aaa                     -
3     ddd                     yy
4     -             ,,        -
5     -             -         zz
6     -             -         tt,uu


Basically I am trying to make a consolidated table.
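A sketch that generalizes Gabor Grothendieck's aggregate()/merge()/Reduce() approach from this thread to any number of data frames and fills keys that are absent from a data frame with "-", as in the desired table. consolidate() is a hypothetical helper; it joins values with "," (no space) to match the layout shown, whereas toString() would use ", ". It assumes every data frame has the key column Src first and its target column second.

```r
# Hypothetical helper: collapse each two-column data frame by its key,
# full-outer-join the results on "Src", and mark absent keys with "-".
consolidate <- function(dfs, missing = "-") {
  collapse <- function(v) paste(v, collapse = ",")
  collapsed <- lapply(dfs, function(x) aggregate(x[2], x[1], collapse))
  out <- Reduce(function(a, b) merge(a, b, by = "Src", all = TRUE), collapsed)
  out[is.na(out)] <- missing  # only columns that contain NA are modified
  out
}

DF1 <- data.frame(Src = c(1, 1, 1, 2, 3),
                  Target1 = c("aaa", "bbb", "ccc", "aaa", "ddd"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Src = c(2, 3, 4, 4, 4), Target2 = "",
                  stringsAsFactors = FALSE)
DF3 <- data.frame(Src = c(1, 3, 5, 6, 6),
                  Target3 = c("xx", "yy", "zz", "tt", "uu"),
                  stringsAsFactors = FALSE)

res <- consolidate(list(DF1, DF2, DF3))
print(res)
```

Keys that occur in a data frame with an empty target (Src 2 and 3 in DF2) stay blank rather than "-", which matches the distinction in the desired output.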

Help appreciated.
Thanks
M



