Re: [R] Producing a table with mean values

2012-09-09 Thread John Kane
My stupidity. I made a late edit for clarity and forgot to run it to be sure I 
had changed everything.
It should read:
 tabx <- ddply(meltx, .(Seamount, variable), summarize, mean = mean(value),
 sd = sd(value))

My apologies.

John Kane
Kingston ON Canada
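
For readers without plyr and reshape installed, roughly the same table can be sketched in base R. This is only an illustration under stated assumptions: it swaps ddply() for aggregate(), uses a cut-down copy of the posted data, and the object names (long, vars, tabx) are placeholders rather than anything from the thread.

```r
# Toy data: a few rows copied from the posted table
x <- "Seamount Pico Nano Micro Total_Ch
Off_Mount1 0.0691 0.24200 0.00100 0.31210
Off_Mount1 0.0938 0.00521 0.02060 0.11961
Off_Mount2 0.0314 0.17400 0.12800 0.33340
Off_Mount2 0.0414 0.17400 0.02800 0.23340"
xx <- read.table(text = x, header = TRUE, as.is = TRUE)

# Long format by hand: one row per Seamount/variable/value
vars <- c("Pico", "Nano", "Micro", "Total_Ch")
long <- data.frame(Seamount = rep(xx$Seamount, times = length(vars)),
                   variable = rep(vars, each = nrow(xx)),
                   value    = unlist(xx[vars], use.names = FALSE))

# Mean and sd per Seamount and variable (base-R stand-in for ddply)
tabx <- merge(aggregate(value ~ Seamount + variable, data = long, FUN = mean),
              aggregate(value ~ Seamount + variable, data = long, FUN = sd),
              by = c("Seamount", "variable"), suffixes = c(".mean", ".sd"))
names(tabx)[3:4] <- c("mean", "sd")
tabx
```

The result has one row per Seamount and variable, which is the same shape the ddply() call produces.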


 -Original Message-
 From: smartpink...@yahoo.com
 Sent: Sat, 8 Sep 2012 11:14:11 -0700 (PDT)
 To: jrkrid...@inbox.com
 Subject: Re: [R] Producing a table with mean values
 
 Hi John,
 
 I am getting error messages with your solution. tabx <- ddply(nn,
 .(Seamount, variable), summarize, mean = mean(value), sd = sd(value))
 #Error in empty(.data) : object 'nn' not found
 
 A.K.
 
 
 
 - Original Message -
 From: John Kane jrkrid...@inbox.com
 To: Tinus Sonnekus tsonne...@gmail.com; r-help@r-project.org
 Cc:
 Sent: Saturday, September 8, 2012 1:19 PM
 Subject: Re: [R] Producing a table with mean values
 
 x <- "Seamount     Pico    Nano   Micro    Total_Ch
 Off_Mount1 0.0691 0.24200 0.00100  0.31210
 Off_Mount1 0.0938 0.00521 0.02060  0.11961
 Off_Mount1 0.1130 0.2 0.06620  0.37920
 Off_Mount1 0.0864 0.15900 0.22300  0.46840
 Off_Mount1 0.0262 0.04570 0.00261  0.07451
 Off_Mount2 0.0314 0.17400 0.12800  0.33340
 Off_Mount2 0.0314 0.17400 0.12800  0.23340
 Off_Mount2 0.0414 0.17400 0.02800  0.23340"
 
 xx <- read.table(textConnection(x), header=TRUE, as.is=TRUE)
 
 library(reshape)
 meltx <- melt(xx)
 
 tabx <- ddply(nn, .(Seamount, variable), summarize, mean = mean(value),
 sd = sd(value))
 
 tabx
 
 John Kane
 Kingston ON Canada
 
 
 -Original Message-
 From: tsonne...@gmail.com
 Sent: Fri, 7 Sep 2012 22:49:55 +0200
 To: r-help@r-project.org
 Subject: [R] Producing a table with mean values
 
 Hi All,
 
 I have a data set with three size classes (pico, nano and micro) and 12
 different sites (Seamounts). I want to produce a table with the mean and
 standard deviation values for each site.
 
       Seamount     Pico    Nano   Micro    Total_Ch
 1 Off_Mount 1 0.0691 0.24200 0.00100  0.31210
 2 Off_Mount 1 0.0938 0.00521 0.02060  0.11961
 3 Off_Mount 1 0.1130 0.2 0.06620  0.37920
 4 Off_Mount 1 0.0864 0.15900 0.22300  0.46840
 5 Off_Mount 1 0.0262 0.04570 0.00261  0.07451
 6 Off_Mount 2 0.0314 0.17400 0.12800  0.33340
 
 I tried the following script but get an error message
 
 Error in results[i, "u.Pico", "u.Nano", "u.Micro"] <- sapply(z, mean) :
   incorrect number of subscripts
 
 The code I used:
 
 SChla <- read.csv("SM_Chla_data.csv")
 sm <- as.character(unique(SChla$Seamount))
 
 results <- matrix(NA, nrow=length(sm), ncol=6,
   dimnames=list(sm, c("u.Pico","u.Nano","u.Micro","sd.Pico","sd.Nano","sd.Micro")))
 
 for (i in sm){
 z <- subset(SChla, Seamount==i, select=c(Pico, Nano, Micro))
 results[i,"u.Pico","u.Nano","u.Micro"] <- sapply(z, mean)
 results[i,"sd.Pico","sd.Nano","sd.Micro"] <- sapply(z, sd)
 }
 
 print(results)
 
 Please can someone advise me how to fix the error, or suggest an
 alternative solution? I would appreciate it.
 
 Thank you.
 Tinus
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 


[R] Error msg in rpanel

2012-09-09 Thread Dr Subramanian S
I am working with the rpanel package. If I have a function that uses a
radiogroup button and I run that function from inside the rpanel menu,
I get this error:


Error in panel$intname : $ operator is invalid for atomic vectors

However if i run the function per se i.e. not from inside the rpanel
menu, but by calling it independently, the above error doesn't appear.
Here is a simple example. Try running the whole code versus just
running the add() function. The former results in the above error and
the latter doesn't.

install.packages(c("rpanel","tkrplot"))
my.menu <- function(panel) {
library(rpanel,tkrplot)
if (panel$menu=="Add"){
add()
  }
else
panel
}

main.panel <- rp.control(title = "Main Menu",size=c(200,150))
rp.menu(panel = main.panel, var = menu,
labels = list(list("Addition", "Add")),
action = my.menu)

#  function to do addition

add <- function(){
  my.draw <- function(panel) {
 if(panel$vals=="numbers"){
  val<-as.numeric(panel$nmbr1)+as.numeric(panel$nmbr2)
}
else if(panel$vals=="strings"){
  val <- paste(as.character(panel$nmbr1), "and" ,as.character(panel$nmbr2))
}
plot(1:10, 1:10, type="n", xlab="", ylab="",
   axes=FALSE, frame = TRUE)
text(5, 5, paste("Result: ", val),cex=1.4)

panel
  }

  my.redraw <- function(panel) {
rp.tkrreplot(panel, my.tkrplot)
panel
  }

  my.panel <- rp.control(title = "Addition")
  rp.textentry(panel = my.panel, var = nmbr1,
   labels = "First: ", action = my.redraw, initval=100)
  rp.textentry(panel = my.panel, var = nmbr2,
   labels = "Second:", action = my.redraw, initval=200)
  rp.radiogroup(panel = my.panel, var = vals,
values = c("numbers", "strings"),
action = my.redraw, title = "Type")
  rp.tkrplot(panel = my.panel, name = my.tkrplot, plotfun = my.draw)

}



Re: [R] method or package to make special boxplot

2012-09-09 Thread Jim Lemon

On 09/09/2012 12:14 AM, Zhang Qintao wrote:

Hi, All,

I am trying to use R to make the following type of boxplot while I couldn't
find a way to do it.

My dataset looks like X1 Y1 X2 Y2 SPLIT. The split highlights my
experiment details and both x and y are continuous numerical values.  I
need to plot y vs. x with split as legend and boxplot has to be used for
all splits. May I ask how to get it? Currently available boxplot only
applies on the case that X axis is character.


Hi Qintao,
Do you want a sort of 2D boxplot? The example below gives a rough idea 
as to what it would look like, with boxplots for your Xs and Ys centered 
at their medians and an abscissa with the labels for your splits. Needs a 
bit of work to turn this into a function, so let me know if it does what 
you want.


Jim

x1<-rnorm(10)
y1<-rnorm(10)
y2<-rnorm(10)
x2<-rnorm(10)
x1sum<-boxplot(x1)
y1sum<-boxplot(y1)
offset=4
x2sum<-boxplot(x2,at=median(y2)+offset,add=TRUE)
y2sum<-boxplot(y2+offset)
bxp(x1sum,at=median(y1),xlim=c(y1sum$stats[1],y2sum$stats[5]),
 ylim=c(min(c(x1sum$stats[1],x2sum$stats[1])),
 max(c(x1sum$stats[5],x2sum$stats[5]))),axes=FALSE)
bxp(y1sum,at=median(x1),add=TRUE,horizontal=TRUE,axes=FALSE)
bxp(x2sum,at=median(y2+offset),add=TRUE,axes=FALSE)
bxp(y2sum,at=median(x2),horizontal=TRUE,add=TRUE,axes=FALSE)
box()
axis(2)
axis(1,at=c(median(y1),median(y2)+offset),labels=c("Split1","Split2"))



[R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua
Hello,

I bumped into the following funny use-case. I have too much data for a given 
plot. I have the following data frame df: 

> str(df)
'data.frame':   5015 obs. of  5 variables:
 $ n  : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ iter   : int  10 20 30 40 50 60 70 80 90 100 ...
 $ Error  : num  1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
 $ Duality_Gap: num  20080 3789 855 443 321 ...
 $ Runtime: num  0.00536 0.01353 0.01462 0.01571 0.01681 ...

But if I plot e.g. Runtime vs log(Duality Gap) I have too many observations due 
to taking a snapshot every 10 iterations rather than say 500 and the plot looks 
very cluttered. So I would like to trim the data frame including only those 
records for which iter is multiple of 500 and so I do this:

df <- subset(df, iter %% 500 == 0)

This gives me almost exactly what I need except that the last and most 
important Duality Gap observations are of course gone due to the filtering ... 
I would like to change the subset clause to be iter %% 500 _or_ the record is 
the last per n (n is my problem size and category in this case) ... how can I 
do that?

I thought of adding a new column that flags whether a given row is the last 
element per category as last Boolean but this is a bit too complicated .. is 
there a simpler condition construct that can be used with the subset command?

TIA,
Best regards,
Giovanni


Re: [R] Problem with duplicates in row.names

2012-09-09 Thread Fred
Thanks Arun, 
I can manage something with that; I then just need to delete the first row
with Photoshop!

Thanks

Fred



--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-with-duplicates-in-row-names-tp4642518p4642604.html
Sent from the R help mailing list archive at Nabble.com.



[R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread STADLER Frederic
Hey, I am still working on my heat map (for those who have read my previous post 
about row.names).
Now I would like to save my heatmap.2 output as .png or .tiff so that I can 
work on the picture in Photoshop, but it doesn't work.
I am using (as I found on some forum)
 png("heatmap.2.png)  # and it just doesn't work. When I try doing it with:
jpeg("heatmap.2.jpeg") # it works once every 10 times, but it's a 22kb file, 
completely useless!

I really need a high-quality image, as I will have to work on it in Photoshop 
and will also have to cut out and zoom in on just some lines of my heatmap.

#here is the code I use for my heatmap.2 :
heatmap.2(a_matrix, Rowv=NA, Colv=NA, col=greenred(60), scale="column", 
margins=c(7,10), trace="none", density.info=c("none"))

Does someone know what I have to do in order to get my heatmap.2.png? Do I 
need some other package (I only use gplots, to allow heatmap.2)?

THANKS for your help
Fred



Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Sarah Goslee
Normally the workflow is:

png("heatmap.png") # don't forget the second quote, as you did below
my.plot.code # whatever you need to draw the heatmap
dev.off() # people often forget this step - did you?

You'll probably want to adjust the size and resolution settings for
png() to get the desired high-resolution output; see ?png for details.

Sarah


-- 
Sarah Goslee
http://www.functionaldiversity.org
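
The open/draw/close workflow above can be sketched end to end as follows. This is a hedged illustration, not the poster's code: a random matrix stands in for a_matrix, stats::heatmap() stands in for gplots::heatmap.2(), the size and resolution values are arbitrary assumptions, and the file is written to tempdir() so its location is known (a bare file name would land in getwd()).

```r
# Stand-in matrix; in the thread this would be a_matrix
set.seed(1)
m <- matrix(rnorm(200), nrow = 20)

out_file <- file.path(tempdir(), "heatmap.png")

# 1. Open the device: size in inches, res in pixels per inch
png(out_file, width = 8, height = 8, units = "in", res = 300)
# 2. Draw (heatmap() stands in for gplots::heatmap.2())
heatmap(m, Rowv = NA, Colv = NA, scale = "column")
# 3. Close the device so the file is actually written
invisible(dev.off())

file.exists(out_file)
file.info(out_file)$size
```

Raising res or the width and height gives a larger file that tolerates cropping and zooming better; see ?png for the other options.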



Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Jeff Newmiller
"It just doesn't work" could mean anything... and for those of us for whom it 
does work, that leaves a lot of possible differences between your case and 
ours. This is your cue to read the Posting Guide.

Some issues I have encountered:

If you are using Windows, and you have opened a graphics file in an editor, and 
you try  to write a new version out with R, the editor will prevent this change 
in most cases. You have to remember to close the graphics file first.

Also, you need to remember to close the file in R using dev.off() when you are 
done writing to it for similar reasons.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.



Re: [R] use subset to trim data but include last per category

2012-09-09 Thread Jeff Newmiller
dfthin <- df[ c(which(df$iter %% 500 == 0), nrow(df)), ]

or

 dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)==nrow(df))))

N.B. You should avoid using the name df for your variables, because it is the 
name of a built-in function that you are hiding by doing so. Others may be 
confused, and eventually you may want to use that function yourself. One 
solution is to use DF for your variables... another is to use more descriptive 
names.
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.



Re: [R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua
Hi Jeff,

Thanks for your help, but this doesn't work, there are two problems. First and 
most important I need to keep the last _per category_ where my category is n 
and not the last globally. Second, there seems to be an issue with the subset 
variation that ends up not filtering anything ... but this is a minor thing.

Best.
Giovanni



Re: [R] use subset to trim data but include last per category

2012-09-09 Thread William Dunlap
 I would like to change the
 subset clause to be iter %% 500 _or_ the record is the last per n 

If your data.frame df is sorted by n you can define the function
   isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
and use it as
   subset(df, iter %% 500 == 0 | isLastInRun(n)) 

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
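
As a small worked example of the function above, here is a sketch on made-up data. The data frame DF and its values are assumptions chosen so that each run of n ends on an iteration that is not a multiple of 500; the data must already be sorted by n.

```r
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

# Toy data, sorted by n, with runs that do not end on a multiple of 500
DF <- data.frame(n    = factor(rep(c(1000, 2000), times = c(4, 3))),
                 iter = c(250, 500, 750, 760, 500, 1000, 1010))

# Keep every 500th iteration plus the final row of each run of n
thin <- subset(DF, iter %% 500 == 0 | isLastInRun(n))
thin
```

Rows with iter 760 and 1010 survive only because they close their run, which is the behaviour asked for.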




Re: [R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua
Hello,

This solves my problem in a horribly inelegant way that works:

df <- data.frame(n=newInput$n, iter=newInput$iter, Error=newInput$Error, 
Duality_Gap=newInput$Duality, Runtime=newInput$Acc)
df_last <- aggregate(x=df$iter, by=list(df$n), FUN=max)
names(df_last)[names(df_last)=="Group.1"] <- "n"
names(df_last)[names(df_last)=="x"] <- "iter"
# n  iter
#1 1000  2518
#2 2000  5700
#3 3000 10026
#4 4000 13916
#5 5000 17962

df$last <- FALSE
df$last[df$n == 1000 & df$iter == 2518] <- TRUE
df$last[df$n == 2000 & df$iter == 5700] <- TRUE
df$last[df$n == 3000 & df$iter == 10026] <- TRUE
df$last[df$n == 4000 & df$iter == 13916] <- TRUE
df$last[df$n == 5000 & df$iter == 17962] <- TRUE

df <- subset(df, (iter %% 500 == 0) | (df$last == TRUE))

How can I do the same without hardwiring these numbers?

TIA,
Best regards,
Giovanni
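
One way to avoid hardwiring the numbers, sketched here on made-up data, is to let ave() compute the largest iter within each level of n and compare against it. This assumes "last" means the maximum iter per n; DF, last_iter and thin are placeholder names, not objects from the thread.

```r
# Toy data standing in for the real data frame
DF <- data.frame(n    = factor(rep(c(1000, 2000), times = c(4, 3))),
                 iter = c(250, 500, 750, 760, 500, 1000, 1010))

# Largest iter within each level of n, repeated for every row of that group
last_iter <- ave(DF$iter, DF$n, FUN = max)

# Keep every 500th iteration plus the last iteration of each n
thin <- subset(DF, iter %% 500 == 0 | iter == last_iter)
thin
```

Unlike a run-based test, this does not need the rows to be sorted by n.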


Re: [R] [Rscript] difficulty passing named arguments from commandline

2012-09-09 Thread Tom Roche

https://github.com/TomRoche/GEIA_to_netCDF/commit/62ad6325d339c61ac4e7de5e7d4d26fa21ed918c
 # - Rscript ./netCDF.stats.to.stdout.r netcdf.fp=./GEIA_N2O_oceanic.nc 
 var.name=emi_n2o
 # fails

 # + Rscript ./netCDF.stats.to.stdout.r 'netcdf.fp=./GEIA_N2O_oceanic.nc' 
 'var.name=emi_n2o'
 # succeeds

https://stat.ethz.ch/pipermail/r-help/2012-September/323287.html
 The trailing arguments to Rscript, generally read by
 commandArgs(TRUE), come into R as a vector of character strings.
 Your script can interpret those character strings in many ways.
 The [script linked above] processed them all with

eval(parse(text=arg[i]))

 so all the arguments had to be valid R expressions: strings must be
 quoted, unquoted things are treated as names of R objects, slash
 means division, = and <- mean assignment, etc.

That explains the need for strict quoting--thanks.

 If that is a problem, don't use parse() to interpret the strings;
 use sub() or strsplit() to extract substrings and do what you want
 with them. (This is somewhat safer than using eval(parse(text=))
 because it can do less.)

Assigning arguments via strsplit() does seem to be more of a PITA, but it works 
now @

https://github.com/TomRoche/GEIA_to_netCDF/blob/master/netCDF.stats.to.stdout.r

your assistance is appreciated, Tom Roche tom_ro...@pobox.com
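
For reference, the strsplit() approach can be sketched like this. It is a minimal illustration, not the code in the linked script: parse_named_args is a hypothetical helper, and the argument vector is typed in by hand where a script would call commandArgs(trailingOnly = TRUE).

```r
# Parse "name=value" strings without eval(parse()).
# parse_named_args is a hypothetical helper, not part of the linked script.
parse_named_args <- function(args) {
  parts <- strsplit(args, "=", fixed = TRUE)
  keys  <- vapply(parts, function(p) p[1], character(1))
  # Re-join the remainder so values may themselves contain "="
  vals  <- vapply(parts, function(p) paste(p[-1], collapse = "="), character(1))
  as.list(setNames(vals, keys))
}

# In a script this would be commandArgs(trailingOnly = TRUE)
args <- c("netcdf.fp=./GEIA_N2O_oceanic.nc", "var.name=emi_n2o")
opts <- parse_named_args(args)
opts$netcdf.fp
opts$var.name
```

Because the values stay plain character strings, nothing on the command line is ever evaluated as R code, so shell quoting only has to satisfy the shell.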



[R] PCA legend outside of PCA plot

2012-09-09 Thread Tinus Sonnekus
Hi All,

I have been trying to plot my PCA legend outside of the PCA plot,
but success still eludes me.

Can you guys please advise how I can achieve this? I used locator() to
obtain coordinates for below the Comp.1 axis. Using these coordinates the
legend disappears.

Below is the code for the PCA and legend.

Thanks in advance for the help.

Regards
Tinus


r.cols <- rainbow(length(unique(SEData$Seamount)))
pca1 <- princomp(SEData3, scores=TRUE, cor=TRUE)
biplot(pca1, var.axes= TRUE, xlabs=rep("",nrow(SEData3)),main="Seamounts PCA")
rrr <- apply(pca1$scores[,1:2],2, range)
par(usr=as.vector(rrr))
points(pca1$scores[,1:2], col=r.cols , pch=20)
legend(-8, 2.95, sm, col = r.cols, text.col = "black", lty = NULL,
pch = 20, horiz = FALSE)


-- 
M.J. Sonnekus
PhD Candidate (The Phytoplankton of the southern Agulhas Current Large
Marine Ecosystem (ACLME))
Department of Botany
South Campus
Nelson Mandela Metropolitan University
PO Box 77000
Port Elizabeth
South Africa
6031

Cell: 082 080 9638
E-mail: tsonne...@gmail.com



Re: [R] Error msg in rpanel

2012-09-09 Thread Subramanian
If I run the whole code and click on the addition menu and then click Add,
the error comes. But not when I just call add(). So I guess the problem is
not with the rpanel package. Also tried panel[vars] instead of panel$vars.
no luck. Same error when I call the add function from the rpanel GUI.



--
View this message in context: 
http://r.789695.n4.nabble.com/Error-msg-in-rpanel-tp4642603p4642616.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Fred
hey Sarah,
thanks for your help !!

Of course I put the second quote also (I forgot to put it in the last post).
Sorry, I don't get the my.plot.code part... I'm new to R and use it only to
draw heatmaps right now.
Well, I did forget the dev.off(), but when I run it I get 
null device (1) when Quartz is turned off, and when it's on I get: 
quartz 2.

But I don't have any file called heatmap.2.png on my computer.
I really don't understand why I don't get anything!

And when I do:
 jpeg("heatmap.jpg") # it works, but I get only a 20kb picture, which is
 useless in my case (I need to edit and work on it in Photoshop)

Fred



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-save-a-heatmap-2-in-png-jpeg-tiff-tp4642607p4642615.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Fred
Hey Jeff, sorry for the "it just doesn't work", but it's really what it
does... 
I can't find any file called heatmap.2.png on my computer after creating my
heatmap.2 (which I can see in Quartz) and typing:
 png("heatmap.2.png")
Maybe I should add that I am working on a Mac with Mac OS X (v.10.8.1).

It is not a problem of having a picture editor open when I try to create
this file.

Any other idea ??
Thanks

Fred



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-save-a-heatmap-2-in-png-jpeg-tiff-tp4642607p4642617.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Hasan Diwan
Mr Stadler,

On 9 September 2012 10:36, Fred frederic.stad...@unifr.ch wrote:

 But I don't have any files called heatmap.2.png  on my computer.
 I really don't understand why I don't get anything !


What does getwd() print out as a path? Check there for your file. -- H
-- 
Sent from my mobile device



Re: [R] PCA legend outside of PCA plot

2012-09-09 Thread David L Carlson
Try adding the parameter xpd=TRUE to your legend() statement. Without
reproducible code it is pretty hard to be sure what the problem is.

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
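
To illustrate the xpd=TRUE suggestion, here is a self-contained sketch on made-up data rather than the PCA from the question. The margin size, the grouping factor grp and the legend position are all assumptions; pdf(NULL) is used only so the example runs without opening a window.

```r
pdf(NULL)  # throwaway device so the sketch runs without opening a window

# Extra right margin to make room for the legend
op <- par(mar = c(5, 4, 4, 8))
grp  <- factor(rep(c("A", "B", "C"), each = 5))
cols <- rainbow(nlevels(grp))
plot(1:15, (1:15)^0.5, col = cols[grp], pch = 20)

usr <- par("usr")  # plot-region limits: x1, x2, y1, y2
# Anchor the legend's top-left corner on the right edge of the plot region;
# xpd = TRUE lets it be drawn in the margin instead of being clipped
leg <- legend(usr[2], usr[4], legend = levels(grp), col = cols,
              pch = 20, xpd = TRUE)

par(op)
invisible(dev.off())
```

Without xpd = TRUE the legend is clipped to the plot region, which would explain a legend that seems to disappear when given coordinates outside the axes.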



 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Tinus Sonnekus
 Sent: Sunday, September 09, 2012 1:37 PM
 To: r-help@r-project.org
 Subject: [R] PCA legend outside of PCA plot
 
 Hi All,
 
 I have been trying to plot my PCA legend outside of the PCA plot,
 but success still eludes me.
 
 Can you guys please advise how I can achieve this? I used locator() to
 obtain coordinates below the Comp.1 axis. Using these coordinates,
 the legend disappears.
 
 Below is the code for the PCA and legend.
 
 Thanks in advance for the help.
 
 Regards
 Tinus
 
 
 r.cols <- rainbow(length(unique(SEData$Seamount)))
 pca1 <- princomp(SEData3, scores=TRUE, cor=TRUE)
 biplot(pca1, var.axes=TRUE,
        xlabs=rep("", nrow(SEData3)), main="Seamounts PCA")
 rrr <- apply(pca1$scores[,1:2], 2, range)
 par(usr=as.vector(rrr))
 points(pca1$scores[,1:2], col=r.cols, pch=20)
 legend(-8, 2.95, sm, col = r.cols, text.col = "black", lty = NULL,
        pch = 20, horiz = FALSE)
 
 
 --
 M.J. Sonnekus
 PhD Candidate (The Phytoplankton of the southern Agulhas Current Large
 Marine Ecosystem (ACLME))
 Department of Botany
 South Campus
 Nelson Mandela Metropolitan University
 PO Box 77000
 Port Elizabeth
 South Africa
 6031
 
 Cell: 082 080 9638
 E-mail: tsonne...@gmail.com
 
   [[alternative HTML version deleted]]
 


[R] Sum of column from another df based of row values of df1

2012-09-09 Thread Shivam
Dear All,

I need to sum a column from another dataframe based on the row values
of one dataframe. I am stuck in a loop trying to accomplish it and at
current speed it will take more than 80 hours to complete. Needless to
say I am looking for a more elegant/quicker solution. Really need some
help here. Here is the issue:

I have a dataframe CALL (the dput of head is given below) which has
close to a million rows. There are 2 date columns which are of
importance, DATE and EXPDATE. There is another dataframe, VOL (dput of
head given), which has 2 columns, DATE and VOL. It has the volatility
corresponding to each day and it has a total of 124 records
(corresponding to 6 months). I want to add another column in the CALL
dataframe which would contain the sum of all the volatilities from the
VOL df for the period specified by the interval of DATE and EXPDATE in
each row of CALL df.

For ex: In the first row, DATE is '03-01-2011' and EXPDATE is
'27-01-2011'. So I want the SUM column (A new column in CALL df) to
contain the sum of volatilities of 03-01, 04-01, 05-01  till 27-01
from the VOL dataframe.

I have to repeat this process for all the rows in the dataframe. Here
is the for-loop version of the solution:

for (k in 1:nrow(CALL)){
  CALL$SUM[k] = sum(subset(VOL$VOL, VOL$DATE >= CALL$DATE[k] &
                           VOL$DATE <= CALL$EXPDATE[k]))
}

The loop will run close to a million times; it has been running
for more than 10 hours and it's just 12% complete. It would take more
than 80 hours to complete, not to mention the toll it would take on
my laptop. So is there a better way that I can accomplish this task?
Any input would be greatly appreciated. Below are the dput of the two
dataframes.

One point of note is that there are only 124 DISTINCT values of DATE
and 6 DISTINCT values of EXPDATE, in case it can be used in some way.
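
One way the small number of distinct dates might be used, sketched on
made-up miniature data frames (the values are invented; only the column
names DATE, EXPDATE and VOL come from the post): compute the sum once per
distinct (DATE, EXPDATE) pair and then look the result up for every row.

```r
# Made-up miniature versions of VOL and CALL (hypothetical values).
VOL <- data.frame(DATE = as.Date("2011-01-03") + 0:4,
                  VOL  = c(1, 2, 3, 4, 5))
CALL <- data.frame(
  DATE    = as.Date(c("2011-01-03", "2011-01-03", "2011-01-05", "2011-01-03")),
  EXPDATE = as.Date(c("2011-01-04", "2011-01-04", "2011-01-07", "2011-01-07"))
)

key   <- paste(CALL$DATE, CALL$EXPDATE)  # one string per (DATE, EXPDATE) pair
ukey  <- unique(key)                     # distinct pairs only
first <- match(ukey, key)                # a representative row for each pair
vd    <- as.numeric(VOL$DATE)

# One sum per distinct pair instead of one per row.
usum <- vapply(first, function(i) {
  sum(VOL$VOL[vd >= as.numeric(CALL$DATE[i]) & vd <= as.numeric(CALL$EXPDATE[i])])
}, numeric(1))

CALL$SUM <- usum[match(key, ukey)]       # spread the sums back over all rows
print(CALL)
```

With 124 distinct DATE values and 6 distinct EXPDATE values there can be at
most 744 distinct pairs, so the expensive part runs at most 744 times
however many rows CALL has.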

> dput(CALL)
structure(list(NAME = c("STK", "STK", "STK", "STK", "STK",
"STK"), EXPDATE = structure(c(15029, 15029, 15029, 15029, 15029,
15029), class = "Date"), STRIKE = c(6300L, 6300L, 6300L, 6300L,
6300L, 6300L), TMSTMP = c("14:18:36", "15:23:42", "15:22:30",
"15:24:13", "15:22:07", "15:22:27"), PRICE = c(107, 102.05, 101.3,
101.5, 101.2, 101.2), QUANT = c(1850L, 2000L, 2000L, 1700L, 2000L,
2000L), DATE = structure(c(14977, 14977, 14977, 14977, 14977,
14977), class = "Date"), DTTM = structure(c(1294044516, 1294048422,
1294048350, 1294048453, 1294048327, 1294048347), class = c("POSIXct",
"POSIXt"), tzone = ""), TTE = c(38, 38, 38, 38, 38, 38)), .Names = c("NAME",
"EXPDATE", "STRIKE", "TMSTMP", "PRICE", "QUANT", "DATE", "DTTM",
"TTE"), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")


> dput(VOL)
structure(list(DATE = structure(c(1293993000, 1294079400, 1294165800,
1294252200, 1294338600, 1294597800), class = c("POSIXct", "POSIXt"
), tzone = ""), VOL = c(2.32666706461792e-05, 6.79164443640051e-05,
5.66390788200039e-05, 7.25422438459608e-05, 0.000121727951296865,
0.000216076713994619)), .Names = c("DATE", "VOL"), row.names = c(NA,
6L), class = "data.frame")

Please do let me know if any more information from my side would help
or if I need to explain the issue more clearly.

Any minor improvement will be great help.

Thanks in advance.

-Shivam


-- 
*Victoria Concordia Crescit*



Re: [R] Sum of column from another df based of row values of df1

2012-09-09 Thread jim holtman
How about an improvement to 16 seconds?  The first thing to do is to
convert your data to a matrix, because accessing data in a dataframe is
very expensive.  If you run Rprof on your code you will see that all
the time is spent in retrieving the information.  Converting to a
matrix and using matrix accessing is considerably faster.  I did
convert the POSIXct to Date.  You were also paying a lot in the
constant conversion of POSIXct to Date for your comparisons.  I just
replicated your CALL to 1 million rows for testing.


> CALL <-
+ structure(list(NAME = c("STK", "STK", "STK", "STK", "STK",
+ "STK"), EXPDATE = structure(c(15029, 15029, 15029, 15029, 15029,
+ 15029), class = "Date"), STRIKE = c(6300L, 6300L, 6300L, 6300L,
+ 6300L, 6300L), TMSTMP = c("14:18:36", "15:23:42", "15:22:30",
+ "15:24:13", "15:22:07", "15:22:27"), PRICE = c(107, 102.05, 101.3,
+ 101.5, 101.2, 101.2), QUANT = c(1850L, 2000L, 2000L, 1700L, 2000L,
+ 2000L), DATE = structure(c(14977, 14977, 14977, 14977, 14977,
+ 14977), class = "Date"), DTTM = structure(c(1294044516, 1294048422,
+ 1294048350, 1294048453, 1294048327, 1294048347), class = c("POSIXct",
+ "POSIXt"), tzone = ""), TTE = c(38, 38, 38, 38, 38, 38)), .Names = c("NAME",
+ "EXPDATE", "STRIKE", "TMSTMP", "PRICE", "QUANT", "DATE", "DTTM",
+ "TTE"), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

> VOL <-
+ structure(list(DATE = structure(c(1293993000, 1294079400, 1294165800,
+ 1294252200, 1294338600, 1294597800), class = c("POSIXct", "POSIXt"
+ ), tzone = ""), VOL = c(2.32666706461792e-05, 6.79164443640051e-05,
+ 5.66390788200039e-05, 7.25422438459608e-05, 0.000121727951296865,
+ 0.000216076713994619)), .Names = c("DATE", "VOL"), row.names = c(NA,
+ 6L), class = "data.frame")

> # convert to matrices for faster testing
> mCALL <- cbind(CALL$DATE, CALL$EXPDATE)
> mVOL <- cbind(as.Date(VOL$DATE), VOL$VOL)  # convert POSIXct to Date
>
> # create 1M rows in mCALL
> mCALL <- rbind(mCALL, mCALL[rep(1L, 1e6),])
>
> result <- numeric(nrow(mCALL))
> system.time({
+ for (i in 1:nrow(mCALL)){
+ result[i] <- sum(mVOL[(mVOL[, 1L] >= mCALL[i,1L])
+  & (mVOL[, 1L] <= mCALL[i, 2L]), 2L])
+ }
+ })
   user  system elapsed
  15.94    0.00   16.07
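
A possible loop-free variant, offered only as a sketch on made-up values and
not as part of the timing above: sort the volatility dates, take a running
total, and use findInterval() to find where each interval starts and ends.
It assumes an R version in which findInterval() has the left.open argument
(3.3.0 or later).

```r
# Made-up volatility series and date intervals (hypothetical values).
vol_dates <- as.Date(c("2011-01-03", "2011-01-04", "2011-01-05",
                       "2011-01-06", "2011-01-07", "2011-01-10"))
vol   <- c(1, 2, 3, 4, 5, 6)
start <- as.Date(c("2011-01-03", "2011-01-05", "2011-01-08"))
end   <- as.Date(c("2011-01-27", "2011-01-06", "2011-01-09"))

ord <- order(vol_dates)
d   <- as.numeric(vol_dates[ord])
cs  <- c(0, cumsum(vol[ord]))     # cs[k + 1] is the sum of the first k values

hi <- findInterval(as.numeric(end), d)                      # dates <= end
lo <- findInterval(as.numeric(start), d, left.open = TRUE)  # dates <  start
range_sum <- cs[hi + 1] - cs[lo + 1]

# Brute-force check against the definition used in the loop.
brute <- sapply(seq_along(start), function(i)
  sum(vol[vol_dates >= start[i] & vol_dates <= end[i]]))
print(cbind(range_sum, brute))
```

The work per row is then two binary searches and one subtraction, done for
all rows in a single vectorised call.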







Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread Rui Barradas

Hello,


On 09-09-2012 18:36, Fred wrote:

 Hey Sarah,
 thanks for your help!!

 Of course I put the second quote also (I forgot to put it in the last post).
 Sorry, I don't get the my.plot.code...


What Sarah meant is that you must put your.plot.code between the 
instructions that open and close the graphics device. This is example 1 
from the 'gplots::heatmap.2' help page, adapted.


# From ?gplots::heatmap.2
library(gplots)
data(mtcars)
x <- as.matrix(mtcars)


# Plot nothing, but like Jeff said (suggested) it does something:
# it opens the device and closes it
png(file = "myplot.png", bg = "transparent")
dev.off()  # 318 bytes file in current directory

# Plot a heatmap.2, example 1 in ?gplots::heatmap.2
png(file = "heatmap2.png")
heatmap.2(x)  ## default - dendrogram plotted and reordering done.
dev.off()  ## 10Kb file in current dir

# The same but to a jpeg graphics device
jpeg(file = "heatmap2.jpeg")
heatmap.2(x)  ## same as above
dev.off()  ## 46Kb file
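
Since the goal is a picture that can be cropped and enlarged in an image
editor, here is a sketch that asks png() for a larger, higher-resolution
canvas through its width, height, units and res arguments. The 8 x 8 inch
size, the 300 ppi value and the file name are illustrative assumptions, and
base heatmap() is used as a stand-in so the sketch runs without gplots:

```r
x <- as.matrix(mtcars)
out <- file.path(tempdir(), "heatmap_hires.png")   # hypothetical output file

# 8 x 8 inches at 300 ppi gives a 2400 x 2400 pixel image.
png(out, width = 8, height = 8, units = "in", res = 300)
heatmap(x, scale = "column")   # stand-in; with gplots loaded, heatmap.2(x) goes here
dev.off()

print(file.info(out)$size)     # file size in bytes
```

The tiff() device accepts the same size and resolution arguments, and its
compression argument (for example "lzw") can keep the file size down.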


Hope this helps,

Rui Barradas

 I'm new to R and use it only to draw heatmaps right now.
 Well, I did forget the dev.off(), but I got
 "null device 1" when Quartz is turned off, and when it's on I get
 "quartz 2".

 But I don't have any files called heatmap.2.png on my computer.
 I really don't understand why I don't get anything!

 And when I do:

 jpeg("heatmap.jpg")  # it works, but I get only a 20 kB picture, which is
 useless in my case (I need to edit and work on it in Photoshop)

Fred



--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-save-a-heatmap-2-in-png-jpeg-tiff-tp4642607p4642615.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Sum of column from another df based of row values of df1

2012-09-09 Thread Shivam
Thanks a lot Jim, it works a treat. Just had to change the date format
in the mCALL as well. But you saved me 80 hours of fretting and
frustration. Really thankful for it.

Regards,
Shivam


Re: [R] Sum of column from another df based of row values of df1

2012-09-09 Thread Shivam
Just to add, I did not know that the speed of data access differs so
much between matrices and data frames. This is one for the future.
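
A rough way to see the difference for oneself, as a sketch on made-up data.
The object names and the size n are arbitrary, and the timings will vary by
machine, so none are quoted here:

```r
n  <- 1e4
df <- data.frame(a = seq_len(n), b = seq_len(n) * 2)  # made-up data frame
m  <- as.matrix(df)                                   # same values as a matrix

# Read one element per iteration from each structure.
t_df <- system.time(for (i in seq_len(n)) x <- df[i, 2])
t_m  <- system.time(for (i in seq_len(n)) y <- m[i, 2])

print(rbind(data.frame = t_df[1:3], matrix = t_m[1:3]))
```

Each df[i, 2] goes through the data-frame indexing method, while m[i, 2] is
a plain matrix lookup, which is where the gap comes from.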

Thanks again Jim :)

-Shivam


Re: [R] how to save a heatmap.2 in png /jpeg /tiff

2012-09-09 Thread David Winsemius

On Sep 9, 2012, at 7:04 AM, STADLER Frederic wrote:

 Hey, I am still working on my heat map (for those who read my previous 
 post about row.names)...
 Now, I would like to save my heatmap.2 as .png or .tiff in order to be able 
 to work on the picture in Photoshop, but it doesn't work.
 I am using (as I found on some forum)
 png("heatmap.2.png)  # and it just doesn't work. When I try doing it with:
 jpeg("heatmap.2.jpeg) # it works once every 10 times, but it's a 22 kB file, 
 completely useless!!!

Neither of those should have _ever_ worked, since they both are missing 
closing quotes.

Furthermore, just emitting the command jpeg("filename.jpg") even with proper 
closing quotes will be completely useless, as you say, unless you follow the 
plot() command with dev.off().

?Devices
?jpeg  # and please DO the examples
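
One way to make the dev.off() step hard to forget is a small wrapper, shown
here only as a sketch. save_plot is a made-up helper name, not a function
from R or gplots, and plot(1:10) stands in for the heatmap call:

```r
# Hypothetical helper: open a device, draw, and always close the device,
# even if the plotting code stops with an error.
save_plot <- function(file, plot_fun, device = png, ...) {
  device(file, ...)
  on.exit(dev.off())
  plot_fun()
  invisible(file)
}

f <- save_plot(file.path(tempdir(), "demo.png"), function() plot(1:10))
print(file.exists(f))
```

Passing device = jpeg or device = tiff switches the output format without
changing the rest of the call.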

 
 I really need a high-quality image, as I will have to work on it in Photoshop 
 and I will also have to cut out and zoom in on just some lines of my heatmap.
 
 # here is the code I use for my heatmap.2:
 heatmap.2(a_matrix, Rowv=NA, Colv=NA, col=greenred(60), scale="column", 
 margins=c(7,10), trace="none", density.info=c("none"))
 
 Does someone know what I have to do in order to get my heatmap.2.png? Do I 
 need some other package? (I only use gplots, to get heatmap.2.)
 
Please include complete code. What you have provided so far should, as you 
say, be completely useless!!!

David Winsemius, MD
Alameda, CA, USA
