[R] Random Forest, Giving More Importance to Some Data

2013-03-24 Thread Lorenzo Isella

Dear All,
I am using randomForest to predict the final selling price of some items.
As it often happens, I have a lot of (noisy) historical data, but the  
question is not so much about data cleaning.
The data for which I need to carry out predictions are fairly recent
sales, or even sales that will take place in the near future.
As a consequence, historical data should be somehow weighted: the older  
they are, the less they should matter for the prediction.

Any idea about how this could be achieved?
Please find below a snippet showing how I use the randomForest library (on  
a multi-core machine).

Any suggestion is appreciated.
Cheers

Lorenzo

###
rf_model <- foreach(iteration = 1:cores,
                    ntree = rep(50, 4),
                    .combine = combine,
                    .packages = "randomForest") %dopar% {
  sink("log.txt", append = TRUE)
  cat(paste("Starting iteration", iteration, "\n"))
  randomForest(trainRF,
               prices_train,   ## mtry=20,
               nodesize = 5,
               ## maxnodes=140,
               importance = FALSE, do.trace = 10, ntree = ntree)
}
###

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Parallelizing GBM

2013-03-24 Thread Lorenzo Isella

Dear All,
I am far from being a guru about parallel programming.
Most of the time, I rely on randomForest for data mining large datasets.
I would like to give a try also to the gradient boosted methods in GBM,  
but I have a need for parallelization.
I normally rely on gbm.fit for speed reasons, and I usually call it this  
way




gbm_model <- gbm.fit(trainRF, prices_train,
offset = NULL,
misc = NULL,
distribution = "multinomial",
w = NULL,
var.monotone = NULL,
n.trees = 50,
interaction.depth = 5,
n.minobsinnode = 10,
shrinkage = 0.001,
bag.fraction = 0.5,
nTrain = (n_train/2),
keep.data = FALSE,
verbose = TRUE,
var.names = NULL,
response.name = NULL)


Does anybody know an easy way to parallelize the model (in this case it  
means simply having 4 cores on the same machine working on the problem)?

Any suggestion is welcome.
Cheers

Lorenzo



Re: [R] Parallelizing GBM

2013-03-24 Thread Max Kuhn
See this:

   https://code.google.com/p/gradientboostedmodels/issues/detail?id=3

and this:


https://code.google.com/p/gradientboostedmodels/source/browse/?name=parallel


Max




Re: [R] boxplot

2013-03-24 Thread John Kane

   Unless you have a really large number of wells, I'd just use the brute-force
   approach of reading in each data set with a simple read.table or
   read.csv call, like

   well1 <- read.csv("well1.csv")

   and repeat for each well.
   Here is a simple example that may give you an idea of how to do the boxplots.
   I have done them two ways, one using base graphics and the other using
   ggplot2.  You will probably have to install the ggplot2 package; just
   issue the command install.packages("ggplot2")
   The base approach is initially a lot simpler but in the longer term, if you
   expect to do a lot of graphing work in R, the grid packages like ggplot2 or
   lattice seem to offer a lot more control for less actual typing, especially
   if you need publication/report quality graphics.
   ##===start code=
   set.seed(345)  # reproducible sample
   # create three sample data sets
   well_1 <- data.frame(arsenic = rnorm(12))
   well_2 <- data.frame(arsenic = rnorm(10))
   well_3 <- data.frame(arsenic = rnorm(15))

   wells <- rbind(well_1, well_2, well_3)  # create single data.frame

   # create an id value for each well
   well_id <- c(rep(1, nrow(well_1)), rep(2, nrow(well_2)),
                rep(3, nrow(well_3)))

   # add the well identifier
   wells <- cbind(wells, well_id)
   str(wells) # check to see what we have

   boxplot(arsenic ~ well_id, data = wells) # plot vertical boxplot
   boxplot(arsenic ~ well_id, data = wells,
           horizontal = TRUE,
           col = c("red", "green", "blue")) # horizontal boxplot

   # vertical boxplot using ggplot2
   library(ggplot2)

   p <- ggplot(wells, aes(as.factor(well_id), arsenic)) + geom_boxplot()
   p
   # horizontal boxplot
   p1 <- p + coord_flip()
   p1

   p2 <- ggplot(wells, aes(as.factor(well_id), arsenic,
                           fill = as.factor(well_id))) +
     geom_boxplot() + coord_flip() +
     scale_fill_discrete(guide=FALSE)

   ##===end code==



   John Kane
   Kingston ON Canada
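
If the number of wells does grow, the read-and-stack step could be looped rather than repeated by hand. The following is only a minimal sketch, assuming each well lives in its own CSV file with an `arsenic` column; the file names and the temporary directory are made up for illustration and are not from the thread.

```r
# Hypothetical setup: write three small CSV files so the sketch is self-contained.
set.seed(345)
tmp_dir <- tempfile("wells_")
dir.create(tmp_dir)
sizes <- c(well1 = 12, well2 = 10, well3 = 15)
for (nm in names(sizes)) {
  write.csv(data.frame(arsenic = rnorm(sizes[[nm]])),
            file.path(tmp_dir, paste0(nm, ".csv")), row.names = FALSE)
}

# Read every CSV in the directory and tag each row with its well name.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
well_list <- lapply(files, function(f) {
  d <- read.csv(f)
  d$well_id <- sub("\\.csv$", "", basename(f))
  d
})
wells <- do.call(rbind, well_list)

# Side-by-side boxplots, one per well.
boxplot(arsenic ~ well_id, data = wells, horizontal = TRUE)
```

Deriving `well_id` from the file name avoids building the identifier vector by hand, so wells with different numbers of measurements need no special handling.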

   -Original Message-
   From: annij...@gmail.com
   Sent: Sat, 23 Mar 2013 10:22:02 -0400
   To: jrkrid...@inbox.com
   Subject: Re: [R] boxplot

   Hello John,

   I apologize for the delayed response.  Yes I am referring to the same type
   of  data in the data sets.  For example, the arsenic concentrations in
   individual groundwater monitoring wells at a groundwater contaminated site,
   where one well may have 12 concentration measurements, another well has 10,
   etc.

   Thanks
   Janh
   On Fri, Mar 22, 2013 at 5:31 PM, John Kane jrkrid...@inbox.com wrote:

 Hi Janh,
 When you say that you have multiple data sets of unequal sample sizes,
 are you speaking of the same kind of data? For example, are you speaking
 of data from a set of experiments where the variables measured are all the
 same and where, when you graph them, you expect the same x and y scales?
 Or are you talking about essentially independent data sets that it makes
 sense to graph in a grid?
 John Kane
 Kingston ON Canada

-Original Message-
From: annij...@gmail.com
Sent: Fri, 22 Mar 2013 10:46:21 -0400
To: dcarl...@tamu.edu
Subject: Re: [R] boxplot

Hello All,

On the subject of boxplots, I have multiple data sets of unequal sample
sizes and was wondering what would be the most efficient way to read in
the data and plot side-by-side boxplots, with options for controlling the
orientation of the plots (i.e. vertical or horizontal) and the spacing?
Your assistance is greatly appreciated, but please try to be explicit as
I am no R expert.  Thanks
   
Janh
   
   
   
On Thu, Mar 21, 2013 at 9:19 AM, David L Carlson dcarl...@tamu.edu
wrote:

Your variable loc_type combines information from two variables (loc and
type). Since you are subsetting on loc, why not just plot by type?

boxplot(var1 ~ type, data[data$loc == "nice", ])

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
On Behalf Of Jim Lemon
Sent: Thursday, March 21, 2013 4:05 AM
To: carol white
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] boxplot

On 03/21/2013 07:40 PM, carol white wrote:
Hi,
It must be an easy question but how to boxplot a subset of data:

data = read.table("my_data.txt", header = T)
boxplot(data$var1[data$loc == "nice"] ~ data$loc_type[data$loc == "nice"])
# in this case, I want to display only the boxplot for loc == "nice"
# it doesn't display the boxplot of only loc == "nice"; it also displays
# loc == "mice"

Hi Carol,
It's them old factors sneakin' up on you.
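
The reply points at factors. One common form of that problem is that subsetting a data frame keeps the unused factor levels, so plotting functions may still reserve a slot for them. A minimal sketch under that assumption, with made-up data and column values (`nice_a`, `mice_a`, and so on are hypothetical):

```r
# Hypothetical data: two locations, each with its own loc_type labels.
dat <- data.frame(
  var1     = c(1, 2, 3, 4, 5, 6, 7, 8),
  loc      = factor(rep(c("nice", "mice"), each = 4)),
  loc_type = factor(c("nice_a", "nice_a", "nice_b", "nice_b",
                      "mice_a", "mice_a", "mice_b", "mice_b"))
)

sub <- dat[dat$loc == "nice", ]
levels(sub$loc_type)   # the subset still carries all four levels

# Dropping the unused levels leaves only the groups that are actually present.
sub$loc_type <- droplevels(sub$loc_type)
levels(sub$loc_type)

boxplot(var1 ~ loc_type, data = sub)
```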

Re: [R] Parallelizing GBM

2013-03-24 Thread Lorenzo Isella

Thanks a lot for the quick answer.
However, from what I see, the parallelization affects only the  
cross-validation part in the gbm interface (but it changes nothing when  
you call gbm.fit).

Am I missing anything here?
Is there any fundamental reason why gbm.fit cannot be parallelized?

Lorenzo





Re: [R] Parallelizing GBM

2013-03-24 Thread Mxkuhn
Yes, I think the second link is a test build of a parallelized cv loop within 
gbm(). 
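
To illustrate why the cross-validation loop is the natural place to parallelize: the folds are independent of each other, whereas within a single boosted fit each new tree depends on the trees before it. The sketch below is not gbm code; it uses `lm()` as a stand-in model and the base `parallel` package, and the data, fold count, and core count are all assumptions made for illustration.

```r
library(parallel)

set.seed(1)
n <- 200
dat <- data.frame(x = runif(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.1)

k <- 4
fold <- sample(rep(seq_len(k), length.out = n))

# Stand-in for a model fit: lm() is used here only so the sketch runs
# without extra packages; it is not gbm.
cv_one_fold <- function(i) {
  fit  <- lm(y ~ x, data = dat[fold != i, ])
  pred <- predict(fit, newdata = dat[fold == i, ])
  mean((dat$y[fold == i] - pred)^2)
}

# mclapply() forks on Unix-alikes; on Windows it only supports one core.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
fold_mse <- unlist(mclapply(seq_len(k), cv_one_fold, mc.cores = n_cores))
cv_mse <- mean(fold_mse)
```

Each call to `cv_one_fold` touches only its own training and held-out rows, which is what makes the loop safe to spread over cores.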




[R] Creating a boxplot from a data summary

2013-03-24 Thread Josh Hall
Hi,
I'm trying to create a boxplot from the summary of a large data set and I'm
having trouble finding any way to do this.  I'm familiar with, but by no
means good at, using R, so the only two websites I've found pertaining to
this issue have been way over my head.  I was hoping for a simple set of
instructions that I could follow to produce a boxplot, in R, for three
groups of data, with 8 weighted responses.  For example, the groups are
three different professions, each asked to fill out and rank 8 statements
in order from 1-8.  Ideally these would be on one graphic output, but if
that can't be done, one output per group would be suitable.



Re: [R] LOOCV over SVM,KNN

2013-03-24 Thread Nicolás Sánchez
Thank you very much! Your help has been very useful!

Regards!



2013/3/23 mxkuhn mxk...@gmail.com

 train() in caret. See

http://caret.r-forge.r-project.org/

 Also, the C5.0 function in the C50 package is much more effective than J48.

 Max

 On Mar 23, 2013, at 2:57 PM, Nicolás Sánchez eni...@gmail.com wrote:

 Good afternoon.

 I would like to know if there is any function in R to do LOOCV with these
 classifiers:

 1)SVM
 2)Neural Networks
 3)C4.5 ( J48)
 4)KNN

 Thanks a lot!



Re: [R] Creating a boxplot from a data summary

2013-03-24 Thread Robert Baer


Look at the help for bxp:
?bxp

# The following give you insight into a boxplot structure
bp = boxplot(list(a=rnorm(10),b = rnorm(10), c = rnorm(10)))
bp






--

Robert W. Baer, Ph.D.
Professor of Physiology
Kirksille College of Osteopathic Medicine
A. T. Still University of Health Sciences
Kirksville, MO 63501 USA
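
As a rough illustration of the `bxp` suggestion: `bxp()` draws from a list shaped like the one `boxplot()` returns, so summary values can be filled in by hand when the raw data are not available. The numbers, group sizes, and profession names below are invented for the sketch and are not from the thread.

```r
# Hypothetical five-number summaries (lower whisker, Q1, median, Q3, upper
# whisker), one column per group.
summ <- list(
  stats = cbind(c(1, 2, 4, 6, 8),
                c(1, 3, 5, 6, 8),
                c(2, 3, 3, 5, 7)),
  n     = c(40, 35, 50),      # group sizes (made up)
  out   = numeric(0),         # no outliers recorded
  group = numeric(0),
  names = c("Profession A", "Profession B", "Profession C")
)

# bxp() draws the boxes directly from the list, without the raw data.
centers <- bxp(summ, ylab = "Rank (1-8)")
```

For eight statements, one such list per statement (or a wider `stats` matrix with one column per group-and-statement combination) would follow the same pattern.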



[R] Rscript does not load/capture all shell arguments

2013-03-24 Thread Paulo van Breugel
Hi,

I am working on a GRASS script (bash script), which should run an R script.
I am working on Ubuntu 12.10, with R 2.15.3 and GRASS GIS 7.0 (I am not
sure the latter is really relevant, as the GRASS script is just a bash
script). The R script is invoked with a call to Rscript ($RGRASSSCRIPT is a
shell variable with the file name of the R script, the rest are variables I
want to read into R):

Rscript --no-save --no-restore $RGRASSSCRIPT $GIS_OPT_INMAP $GIS_OPT_PRES
$GIS_OPT_ENV $GIS_OPT_OUTMAP_GLM $GIS_OPT_FSTATS $GIS_FLAG_M $GIS_OPT_SPP
$GIS_OPT_SPA $GIS_OPT_FAM $GIS_OPT_TERMS $GIS_FLAG_Q $GIS_OPT_OUTMAP_PROJ
$GIS_OPT_ENV_PROJ $GIS_OPT_KFOLD $GIS_OPT_SFOLD

In the R script, I use the commandArgs function to capture the arguments
supplied to Rscript:

args <- commandArgs(trailingOnly=TRUE)

The problem is that args only contains the first 12 arguments (up to and
including $GIS_OPT_OUTMAP_PROJ).

I also tried using a call to R instead of Rscript:

R --no-save --no-restore --no-site-file --no-init-file --args
${GIS_OPT_INMAP} ${GIS_OPT_PRES} ${GIS_OPT_ENV}
${GIS_OPT_OUTMAP_GLM} ${GIS_OPT_FSTATS} ${GIS_FLAG_M}
${GIS_OPT_SPP} ${GIS_OPT_SPA} ${GIS_OPT_FAM} ${GIS_OPT_TERMS}
${GIS_FLAG_Q} ${GIS_OPT_OUTMAP_PROJ} ${GIS_OPT_ENV_PROJ}
${GIS_OPT_KFOLD} ${GIS_OPT_SFOLD} < $RGRASSSCRIPT > $LOGFILE 2>&1

This gives me all 15 arguments when using commandArgs. (The reason I am
not using that option is that invoking the script that way gives me
another problem: R jumps back to the top at a seemingly random place
about halfway through the script and starts to run the first lines of code
again, but that is probably better left for another email.)

So, my question is, I guess, are there limits on the number of arguments
that can be supplied to Rscript? Or, perhaps more likely, am I doing
something wrong?

Best wishes,

Paulo
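
One thing that can be checked from the R side is how many arguments actually arrive and which expected ones are absent. The sketch below assumes the 15 argument names from the call above; `check_args` is a hypothetical helper, not part of R. A possible cause worth ruling out (an assumption, not confirmed in this thread) is that an unquoted shell variable that is empty, or that contains spaces, changes the number of words the shell passes on.

```r
# Names of the arguments the script expects, in the order used in the call above.
expected <- c("INMAP", "PRES", "ENV", "OUTMAP_GLM", "FSTATS", "FLAG_M",
              "SPP", "SPA", "FAM", "TERMS", "FLAG_Q", "OUTMAP_PROJ",
              "ENV_PROJ", "KFOLD", "SFOLD")

# Hypothetical helper: pair each received value with its expected name and
# report which expected arguments never arrived.
check_args <- function(args, expected) {
  n <- min(length(args), length(expected))
  received <- setNames(args[seq_len(n)], expected[seq_len(n)])
  list(received = received,
       missing  = expected[seq_along(expected) > n])
}

# In the real script this would be: args <- commandArgs(trailingOnly = TRUE)
args <- as.character(1:12)   # stand-in for a call that delivered only 12 values
res <- check_args(args, expected)
print(res$missing)
```

Note that this only detects a shortfall at the end of the list; if an argument in the middle is dropped by the shell, the later values shift left, so printing `res$received` and comparing against the shell variables is still needed.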



Re: [R] Random Forest, Giving More Importance to Some Data

2013-03-24 Thread Wensui Liu
Your question doesn't seem to be specifically related to either R or random
forest. Instead, it is about how to assign weights to training
observations.





-- 
==
WenSui Liu
Credit Risk Manager, 53 Bancorp
wensui@53.com
513-295-4370
==
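
One way to turn "older observations should matter less" into weights is an exponential decay in the age of each sale. The sketch below is an illustration under stated assumptions, not a recommendation from the thread: the column names, the half-life, and the simulated data are all hypothetical. Because it does not assume that the learner accepts case weights, it applies the weights through a weighted bootstrap; a model such as randomForest could then be fitted to the resampled rows.

```r
set.seed(42)

# Hypothetical training data: sale age in days (0 = today) and a price.
n <- 1000
train <- data.frame(age_days = sample(0:1000, n, replace = TRUE))
train$price <- 100 + rnorm(n, sd = 10)

# Exponential decay: a sale loses half its weight every `half_life` days.
half_life <- 180                       # assumed value, to be tuned
w <- 0.5^(train$age_days / half_life)

# Weighted bootstrap: recent rows are drawn more often than old ones.
idx <- sample(seq_len(n), size = n, replace = TRUE, prob = w)
train_weighted <- train[idx, ]

# Compare the average age before and after resampling.
mean(train$age_days)
mean(train_weighted$age_days)
```

The half-life controls how quickly history is discounted and would normally be chosen by checking predictive error on the most recent sales.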



Re: [R] boxplot

2013-03-24 Thread Janh Anni
Hello John,

Thank you so much for your kind assistance and the detailed descriptions.
I will play with the scripts and see which is the easiest one that serves
the purpose.

Best regards,
Janh



Re: [R] Ordering a matrix by row value in R2.15

2013-03-24 Thread Pete Brecknock
fitz_ra wrote
 I know this is posted a lot, I've been through about 40 messages reading
 how to do this so let me apologize in advance because I can't get this
 operation to work unlike the many examples shown.
 
 I have a 2 row matrix 
 temp
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
 [,9][,10]
 [1,] 17.000 9.00 26.0  5.0 23.0 21.0 19.0 17.0
 10.0  63.
 [2,] 15.554 7.793718 33.29079 15.53094 20.44825 14.34443 11.83552 11.62997
 10.16019 115.2602
 
 I want to order the matrix using the second row in ascending order.  From
 the many examples (usually applied to columns) the typical solution
 appears to be: 
 temp[order(temp[2,]),]
 Error: subscript out of bounds
 
 However as you can see I get an error here.
 
 When I run this one line command:
 sort(temp[2,])
  [1]   7.793718  10.160190  11.629973  11.835520  14.344426  15.530939 
 15.553999  20.448249  33.290789
 [10] 115.260192
 
 This works but I want the matrix to update and the corresponding values of
 row 1 to switch with the sort.

Maybe consider the order function 

orig <- matrix(c(10,20,30,3,1,2), nrow=2, byrow=TRUE)

new <- t(apply(orig,1,function(x) x[order(orig[2,])]))

> orig
     [,1] [,2] [,3]
[1,]   10   20   30
[2,]    3    1    2
> new
     [,1] [,2] [,3]
[1,]   20   30   10
[2,]    1    2    3

HTH 

Pete




--
View this message in context: 
http://r.789695.n4.nabble.com/Ordering-a-matrix-by-row-value-in-R2-15-tp4662337p4662340.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Integrate with vectors and varying upper limit

2013-03-24 Thread Pete Brecknock
sunny0 wrote
 I'd like to integrate vectors 't' and 'w' for log(w)/(1-t)^2 where i can
 vary the upper limit of the integral to change with each value of 't' and
 'w', and then put the output into another vector. 
 
 So, something like this...
 
 w=c(.33,.34,.56)
 t=c(.2,.5,.1)
 k <- c(.3,.4,.5)
 
 integrand <- function(t) {log(w)/(1-t)^2}
 integrate(integrand, lower = 0, upper = k)
 
 or maybe...
 
 integrand <- function(tt) {tt}
 integrate(integrand, lower = 0, upper = k)
 
 ... with sapply or something similar to create the output vector. How can
 this be done?

Something like this?

w=c(.33,.34,.56) 
t=c(.2,.5,.1) 
k=c(.3,.4,.5) 

integrand <- function(t) {log(w)/(1-t)^2} 

out <- sapply(k,function(x) integrate(integrand, lower = 0, upper = x ,
subdivisions=1000))

# Output
             [,1]         [,2]        [,3]
value        -0.4017303   -0.6249136  -0.9373696
abs.error    2.798235e-05 9.17413e-05 9.209191e-05
subdivisions 32           208         91
message      "OK"         "OK"        "OK"
call         Expression   Expression  Expression

HTH

Pete
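
In the reply above, `log(w)` is a length-3 vector inside the integrand, so it appears to be recycled against the points at which `integrate()` evaluates the function rather than matched to each upper limit. If the intent is to pair the i-th `w` with the i-th `k` (an assumption about the question), a sketch along these lines may be closer; `integrate_one` is a hypothetical helper name.

```r
w <- c(.33, .34, .56)
k <- c(.3, .4, .5)

# Integrate log(w_i) / (1 - t)^2 over t from 0 to k_i, one pair at a time.
integrate_one <- function(w_i, k_i) {
  integrand <- function(t) log(w_i) / (1 - t)^2   # w_i is a single number here
  integrate(integrand, lower = 0, upper = k_i)$value
}

out <- mapply(integrate_one, w, k)

# Closed form for comparison: log(w) * k / (1 - k)
exact <- log(w) * k / (1 - k)
all.equal(out, exact)
```

Since `log(w_i)` is constant in `t`, the integral has the closed form shown, which gives a direct check on the numerical result.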



--
View this message in context: 
http://r.789695.n4.nabble.com/Integrate-with-vectors-and-varying-upper-limit-tp4662338p4662341.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Ordering a matrix by row value in R2.15

2013-03-24 Thread soon yi
or this with Pete's example

orig[,order(orig[2,])]










--
View this message in context: 
http://r.789695.n4.nabble.com/Ordering-a-matrix-by-row-value-in-R2-15-tp4662337p4662342.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Ordering a matrix by row value in R2.15

2013-03-24 Thread William Dunlap
fitz_ra no address
  I want to order the matrix using the second row in ascending order.  From
  the many examples (usually applied to columns) the typical solution
  appears to be:
  temp[order(temp[2,]),]
  Error: subscript out of bounds

That tries to reorder the rows of temp according the values in its second
row, which causes the error (unless you are unlucky and have more rows than
columns, in which case you silently get a wrong answer).  You want to reorder
the columns of temp, so make the output of order() the column (second)
argument to [,]:

   temp[, order(temp[2,])]
   [,1] [,2] [,3] [,4] [,5] [,6]   [,7] [,8]
  [1,] 9.00 10.0 17.0 19.0 21.0  5.0 17.000 23.0
  [2,] 7.793718 10.16019 11.62997 11.83552 14.34443 15.53094 15.554 20.44825
   [,9][,10]
  [1,] 26.0  63.
  [2,] 33.29079 115.2602

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
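
The difference between the two index positions can be seen on a small made-up matrix (the values below are illustrative, not the poster's data):

```r
temp <- rbind(c(17, 9, 26, 5),
              c(15.5, 7.8, 33.3, 15.4))

# Reorder columns by the values in row 2; the row-1 values travel with them.
sorted <- temp[, order(temp[2, ])]

# The row-indexing form fails here because order() returns indices up to
# ncol(temp) = 4, but there are only 2 rows.
res <- tryCatch(temp[order(temp[2, ]), ],
                error = function(e) conditionMessage(e))
```

Wrapping the failing call in `tryCatch()` is only for demonstration, so that the sketch runs to the end and the error message can be inspected in `res`.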


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Pete Brecknock
 Sent: Sunday, March 24, 2013 5:12 PM
 To: r-help@r-project.org
 Subject: Re: [R] Ordering a matrix by row value in R2.15
 
 fitz_ra wrote
  I know this is posted a lot, I've been through about 40 messages reading
  how to do this so let me apologize in advance because I can't get this
  operation to work unlike the many examples shown.
 
  I have a 2 row matrix
  > temp
         [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
  [1,] 17.000 9.000000 26.00000  5.00000 23.00000 21.00000 19.00000 17.00000
  [2,] 15.554 7.793718 33.29079 15.53094 20.44825 14.34443 11.83552 11.62997
           [,9]    [,10]
  [1,] 10.00000  63.0000
  [2,] 10.16019 115.2602
 
  I want to order the matrix using the second row in ascending order.  From
  the many examples (usually applied to columns) the typical solution
  appears to be:
  temp[order(temp[2,]),]
  Error: subscript out of bounds
 
  However as you can see I get an error here.
 
  When I run this one line command:
  sort(temp[2,])
   [1]   7.793718  10.160190  11.629973  11.835520  14.344426  15.530939
  15.553999  20.448249  33.290789
  [10] 115.260192
 
  This works but I want the matrix to update and the corresponding values of
  row 1 to switch with the sort.
 



[R] a contrast question

2013-03-24 Thread Erin Hodgess
Dear R People:

I have the following in a file:

resp factA factB
39.5 low B-
38.6 high B-
27.2 low B+
24.6 high B+
43.1 low B-
39.5 high B-
23.2 low B+
24.2 high B+
45.2 low B-
33.0 high B-
24.8 low B+
22.2 high B+

and I construct the data frame:

> collard.df <- read.table("collard.txt",header=TRUE)
> collard.aov <- aov(resp~factA*factB,data=collard.df)
> summary(collard.aov)
            Df Sum Sq Mean Sq F value   Pr(>F)
factA        1   36.4    36.4   5.511   0.0469 *
factB        1  716.1   716.1 108.419 6.27e-06 ***
factA:factB  1   13.0    13.0   1.971   0.1979
Residuals    8   52.8     6.6
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> tapply(collard.df$resp,list(collard.df$factA,collard.df$factB),mean)
           B-       B+
high 37.03333 23.66667
low  42.60000 25.06667


Fair enough.  Let's pretend for a second that interaction existed.  Then I
want to set up contrasts such that mean(high) - mean(low) = 0 and mean(B+)
- mean(B-)  = 0.

I know that this is really simple, but I've tried all kinds of things with
glht and am not sure that I'm on the right track.  Sorry for the trouble
for the simple question.

Thanks,
erin




-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] a contrast question

2013-03-24 Thread Erin Hodgess
I found the solution:

http://stats.stackexchange.com/questions/12993/how-to-setup-and-interpret-anova-contrasts-with-the-car-package-in-r

Sorry for the trouble.
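
For readers who want a base-R route to the same kind of contrast, here is a minimal sketch. It is not the method from the linked answer; it assumes a cell-means parameterisation, and the names `cell`, `K`, `est`, `se`, `tstat` and `pval` are introduced only for illustration. The data are retyped from the original post.

```r
# Data as listed in the original post; factor levels are set explicitly
# so the ordering of the cells does not depend on the locale.
collard.df <- data.frame(
  resp  = c(39.5, 38.6, 27.2, 24.6, 43.1, 39.5,
            23.2, 24.2, 45.2, 33.0, 24.8, 22.2),
  factA = factor(rep(c("low", "high"), times = 6),
                 levels = c("low", "high")),
  factB = factor(rep(c("B-", "B-", "B+", "B+"), times = 3),
                 levels = c("B-", "B+"))
)

# One coefficient per cell, in the order low.B-, high.B-, low.B+, high.B+
collard.df$cell <- interaction(collard.df$factA, collard.df$factB)
fit <- lm(resp ~ 0 + cell, data = collard.df)

# Each row is a contrast over the four cell means
K <- rbind(
  "high - low" = c(-0.5,  0.5, -0.5, 0.5),
  "B+ - B-"    = c(-0.5, -0.5,  0.5, 0.5)
)

est   <- drop(K %*% coef(fit))                    # contrast estimates
se    <- sqrt(diag(K %*% vcov(fit) %*% t(K)))     # their standard errors
tstat <- est / se
pval  <- 2 * pt(-abs(tstat), df.residual(fit))    # two-sided p-values
round(cbind(estimate = est, se = se, t = tstat, p = pval), 4)
```

Because the design is balanced, the squared t statistics should agree with the F values for factA and factB in the ANOVA table above. If the multcomp package is installed, passing the same matrix as `glht(fit, linfct = K)` is one way to get these tests from glht.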






-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



[R] Error with paired t-test

2013-03-24 Thread Charlotte Rayner
This error keeps appearing when I perform a paired t-test in R:

Error in t.test.default(payoff, paired = T) :   'y' is missing for paired test

This is the method I have used:

> read.table("MeanPayoff.txt",header=T)
       Open  Closed
1  47.50000 42.3750
2  49.25000 50.0000
3  50.00000 49.8000
4  33.50000 20.0000
5  34.75000 33.8800
6  35.50000 20.5000
7  33.35000 12.8750
8  50.00000 22.5000
9  47.15625 34.9375
10 44.38000 43.2500
11 50.00000 47.5000
12 42.12500 26.7500
13 27.35000 26.6250
14 31.75000 36.5000
> attach(payoff)
> names(payoff)
> t.test(payoff,paired=T)

Then the error keeps coming up.
Please help.
Charlotte
  


Re: [R] Clip a contour with shapefile while using contourplot

2013-03-24 Thread Paul Murrell

Hi

Below is some code that does what I think you want by drawing a path 
based on the map data.  This does some grubby low-level work with the 
'sp' objects that someone else may be able to tidy up.



library(grid)  # provides grid.path() and gpar()

# The 21st polygon in 'hello' is the big outer boundary
# PLUS the 20 other inner holes
map <- as(hello, "SpatialPolygons")[21]
# Convert map outline to path information
polygonsToPath <- function(ps) {
    # Turn the list of polygons into a single set of x/y
    # (lapply() always returns a list, which do.call() needs)
    x <- do.call(c,
                 lapply(ps,
                        function(p) { p@coords[,1] }))
    y <- do.call(c,
                 lapply(ps,
                        function(p) { p@coords[,2] }))
    # Generate vertex set lengths
    id.lengths <- sapply(ps, function(p) { nrow(p@coords) })
    list(x=x, y=y, id.lengths=id.lengths)
}
path <- polygonsToPath(map@polygons[[1]]@Polygons)
# Generate rect surrounding the path
xrange <- range(path$x)
yrange <- range(path$y)
xbox <- xrange + c(-5, 5)
ybox <- yrange + c(-5, 5)
# Invert the path
invertpath <- list(x=c(xbox, rev(xbox), path$x),
                   y=c(rep(ybox, each=2), path$y),
                   id.lengths=c(4, path$id.lengths))
# Draw inverted map over contourplot
contourplot(Salinity ~ Eastings+Northings | Time, mydata,
            cuts=30, pretty=TRUE,
            panel=function(...) {
                panel.contourplot(...)
                grid.path(invertpath$x, invertpath$y,
                          id.lengths=invertpath$id.lengths,
                          default.units="native",
                          gp=gpar(col="green", fill="white"))
            })


The final result is far from perfect, but I hope it might be of some help.

One issue is that most of the contour labels are obscured, though that 
might be ameliorated by filling the inverted map with a semi-transparent 
colour like rgb(1,1,1,.8).
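
The "inverted path" idea can be tried without the shapefile. The following is a toy sketch, not the code from this thread: `invert_path` is a hypothetical helper, the triangle stands in for the map outline, and `rule = "evenodd"` is used so the hole does not depend on the direction in which the outline was digitised.

```r
library(grid)

# Hypothetical helper: wrap one or more closed outlines in a padded
# bounding rectangle, giving a path whose "holes" are the outlines.
invert_path <- function(x, y, id.lengths = length(x), pad = 0.1) {
  xbox <- range(x) + c(-pad, pad)
  ybox <- range(y) + c(-pad, pad)
  list(x = c(xbox, rev(xbox), x),          # rectangle corners, then outline
       y = c(rep(ybox, each = 2), y),
       id.lengths = c(4, id.lengths))      # 4 vertices for the rectangle
}

# Toy outline in npc units: a triangle standing in for the map boundary
tri <- list(x = c(0.3, 0.7, 0.5), y = c(0.3, 0.3, 0.7))
inv <- invert_path(tri$x, tri$y)

grid.newpage()
grid.rect(gp = gpar(fill = "steelblue"))   # stands in for the contour plot
grid.path(inv$x, inv$y, id.lengths = inv$id.lengths,
          rule = "evenodd",
          gp = gpar(col = "green", fill = rgb(1, 1, 1, 0.8)))
```

The semi-transparent fill lets whatever is drawn underneath show faintly outside the outline, which is the effect suggested above for keeping contour labels visible.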


Paul

On 15/02/13 08:58, Janesh Devkota wrote:

Hi, I have done the interpolation for my data and I was able to create the
contours in multipanel with the help of Pascal. Now, I want to clip the
contour with the shapefile. I want only the portion of contour to be
displayed which falls inside the boundary of the shapefile.

The data mydata.csv can be found on
https://www.dropbox.com/s/khi7nv0160hi68p/mydata.csv

The data for shapefile can be found on
https://www.dropbox.com/sh/ztvmibsslr9ocmc/YOtiwB8p9p

The code I have used so far is as follows:

# Load Libraries
library(latticeExtra)
library(sp)
library(rgdal)
library(lattice)
library(gridExtra)

#Read Shapefile
hello <- readOGR("shape",
                 layer="Export_Output_4")
## Project the shapefile to the UTM 16 zone
proj4string(hello) <- CRS("+proj=utm +zone=16 +ellps=WGS84")

## Read Contour data
mydata <- read.csv("mydata.csv")
head(mydata)
summary(mydata)

#Create a contourplot
contourplot(Salinity ~ Eastings+Northings | Time, mydata,
            cuts=30,pretty=TRUE)

Thank you so much. I would welcome any other ways to do this aside from
contourplot and lattice.

Best Regards,
Janesh




--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/



Re: [R] Error with paired t-test

2013-03-24 Thread Pascal Oettli

Hi,

The error message is explicit enough. You need 'y' for the paired test.

with(payoff, t.test(Open, Closed, paired=TRUE))
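
A self-contained sketch of the same call, assuming the data frame is stored as `payoff` with columns `Open` and `Closed`. The values are retyped from the garbled listing in the question, so a few entries are inferred; treat them as illustrative:

```r
# Values retyped from the listing in the question (some entries inferred)
payoff <- data.frame(
  Open   = c(47.5, 49.25, 50, 33.5, 34.75, 35.5, 33.35,
             50, 47.15625, 44.38, 50, 42.125, 27.35, 31.75),
  Closed = c(42.375, 50, 49.8, 20, 33.88, 20.5, 12.875,
             22.5, 34.9375, 43.25, 47.5, 26.75, 26.625, 36.5)
)

# A paired test needs both samples: x and y, matched row by row
res <- t.test(payoff$Open, payoff$Closed, paired = TRUE)
res

# The same test written as a one-sample test on the within-pair differences
t.test(payoff$Open - payoff$Closed)
```

Note that `read.table()` on its own only prints the data; the result has to be assigned (for example to `payoff`) before it can be passed to `t.test()`.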

HTH,
Pascal

