[R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
I need to run a bash command, but when you call system() the default shell
is sh (see my sessionInfo below).
I found the shell command (
but it seems to have disappeared in current versions of R?
I am running all this from R CMD BATCH  with system calls to other R

For a little more info, I'm generating sphinx documents (a python
documentation library) through R and need to use a python virtual
So I need to call system('source bin/activate'), but source isn't a
recognized command in the sh shell...

Any help is appreciated,


R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)


attached base packages:
[1] graphics  grDevices utils datasets  stats grid  methods

other attached packages:
[1] ggplot2_0.9.0  reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.1       MASS_7.3-16
 [7] proto_0.3-9.2  RColorBrewer_1.0-5 scales_0.2.0   stringr_0.6

Re: [R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
Thanks Jeff, but I'm running a python program that expects certain
functionality that bash provides and sh doesn't...  I can just stop using
github checkouts and use system packages though and fix this.

I'm mostly wondering where the shell command went in base R... it sounds
like it completely solves this issue but doesn't exist in my R
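As far as I know, shell() ships only with the Windows build of R, which would explain why it isn't in a Linux build. The sketch below gets the same effect on Linux by calling bash explicitly through system2(). It assumes bash is on the PATH. run_in_bash() is a hypothetical helper, not something in base R.

```r
# Run a command line through bash rather than R's default /bin/sh.
# Each call starts a fresh shell, so "source" only affects the commands
# chained after it inside the same command string.
run_in_bash <- function(cmd) {
  # shQuote() wraps cmd in single quotes so the outer sh passes it through intact
  system2("bash", args = c("-c", shQuote(cmd)), stdout = TRUE)
}

out <- run_in_bash("greeting=hello; echo $greeting")
print(out)
```

For the virtualenv case, the call would be something like run_in_bash("source bin/activate && python --version"). The activation only lasts for that one command string.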

Re: [R] A little exercise in R!

2012-04-14 Thread Justin Haynes
Since I thought this was a cool question, I posted it to StackOverflow.
 Vincent Zoonekynd's answer is amazing and really exercises the power of R.


Re: [R] A little exercise in R!

2012-04-13 Thread Justin Haynes
I thought this was kinda cool!  Here's my solution; it's not robust and
probably not efficient.

I'd love to hear improvements or other solutions!


sq.test <- function(a, b) {
  ## test for number pairs that sum to squares.
  sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}

ok.pairs <- function(n, vec) {
  ## given n as a member of vec,
  ## which other members of vec satisfy sq.test
  vec <- vec[vec != n]
  vec[sapply(vec, sq.test, b = n)]
}

grow.seq <- function(y) {
  ## given a starting point (y) and a pairs list (pl, a global)
  ## grow the squaring sequence.
  ly <- length(y)
  ## done once the sequence is as long as its starting value (max(x))
  if (ly == y[1]) return(y)

  ## this line is the one that breaks down on other number sets...
  y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
  grow.seq(y)
}

## start vector
x <- 1:17

## get list of possible pairs
pl <- lapply(x, ok.pairs, vec = x)

## pick start at max since few combinations there.
y <- max(x)
seq17 <- grow.seq(y)
seq17
## [1] 17  8  1 15 10  6  3 13 12  4  5 11 14  2  7  9 16
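The greedy max() step is what breaks on other number sets. The sketch below swaps in a different technique: plain depth-first backtracking that tries every neighbour instead of only the largest. square_sum_path() and is_square() are hypothetical names.

```r
# TRUE where s is a perfect square (exact for the small integers used here)
is_square <- function(s) {
  r <- round(sqrt(s))
  r * r == s
}

# Backtracking search for an ordering of 1:n in which every adjacent pair
# sums to a perfect square. Returns NULL if no such ordering exists.
square_sum_path <- function(n) {
  x <- seq_len(n)
  # neighbours[[k]]: the other members of 1:n that sum with k to a square
  neighbours <- lapply(x, function(k) x[x != k & is_square(x + k)])
  extend <- function(path) {
    if (length(path) == n) return(path)
    last <- path[length(path)]
    for (nb in neighbours[[last]]) {
      if (!(nb %in% path)) {
        res <- extend(c(path, nb))
        if (!is.null(res)) return(res)  # first complete path wins
      }
    }
    NULL  # dead end: let the caller try its next neighbour
  }
  # try every starting point, largest first as in the greedy version
  for (start in rev(x)) {
    res <- extend(start)
    if (!is.null(res)) return(res)
  }
  NULL
}

square_sum_path(17)
```

Because it backtracks out of dead ends, it does not depend on max() happening to pick the right neighbour.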

Re: [R] Remove carriage return in writing tab-delimited file.

2012-04-04 Thread Justin Haynes
take a look at ?paste

paste(yourmatrix, sep='\t', collapse='')
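Note that sep only matters when paste() is given several vectors. With a single matrix, only collapse does any joining. The sketch below writes one tab-delimited line per row with no trailing newline, using a made-up matrix m and a temporary file:

```r
m <- matrix(1:6, nrow = 2)                     # made-up example data
# one tab-delimited string per row
rows <- apply(m, 1, paste, collapse = "\t")
# join the rows with newlines, but add none after the last row
txt <- paste(rows, collapse = "\n")
tmp <- tempfile(fileext = ".txt")
cat(txt, file = tmp)                           # cat() adds no newline of its own
```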

Re: [R] sampling rows from a list

2012-04-02 Thread Justin Haynes
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=TRUE),
  matrix(10:15, nrow=2, byrow=TRUE),
  matrix(16:30, nrow=5, byrow=TRUE))

## get the number of rows of the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## subset each matrix to n randomly sampled rows (drop=FALSE keeps a matrix even if n is 1)
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n), , drop=FALSE], n=n)
## this  structure is still a list though...

## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

I'm not totally sure what structure you want in the last step,
so if I missed, I apologize...

Hope that helps,


Re: [R] list assignment syntax?

2012-03-30 Thread Justin Haynes
You can also take a look at


which has some additional solutions.

Re: [R] scanning data into r

2012-03-28 Thread Justin Haynes
What have you tried?

What type of file are you trying to import from?

What do you want your data to look like in R?

take a look at ?read.table and ?readLines

Re: [R] Why does this work? plyr within-subset normalization

2012-03-28 Thread Justin Haynes
To those without access to nabble, the code in reference is:

relative <- ddply(ranktable, .(Timestamp), function(x)
data.frame(relative = x[,5]/max(x[,5])))

I may be misunderstanding your question, but:

ddply splits your data.frame, ranktable, by the column Timestamp into
many smaller data.frames, one for each unique Timestamp value.

Those new small data.frames are sent one at a time to the function you supply.
So, when you call max(x[,5]) you're taking the max of the small data.frame
sent to the function rather than the max of the larger ranktable.
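The same within-group normalization can be done in base R with ave(). It splits a vector by a grouping variable, applies a function within each group, and returns the results in the original row order. The sketch below uses a made-up ranktable, with a hypothetical column called score standing in for column 5:

```r
# made-up stand-in for ranktable
ranktable <- data.frame(
  Timestamp = c("t1", "t1", "t2", "t2", "t2"),
  score = c(2, 4, 1, 5, 10)
)
# divide each score by the max of its own Timestamp group
ranktable$relative <- ave(ranktable$score, ranktable$Timestamp,
                          FUN = function(v) v / max(v))
ranktable$relative
```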

Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
In most regex flavours the caret (^) signifies the start of a line and the
dollar sign ($) signifies the end.

gsub('^S S', 'S', a)

gsub('^S S', 'S', '3421 BIGS St')

You can use a logical OR (|) inside your pattern too:

gsub('^S S|S S$| S S ', 'S', a)

The ' S S ' condition (the pair in the middle of the string) is difficult.

gsub('^S S|S S$| S S ', 'S', 'foo S S bar')

gives the wrong output, as does:

gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
gsub('^S S | S S$| S S ', ' S ', a)

so you might have to catch that with a second gsub.

gsub(' S S ', ' S ', 'foo S S bar')
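A word boundary (\b) can cover the start, middle and end cases in a single gsub(). This is a swapped-in technique, not one of the patterns above. It is shown here with perl = TRUE on a few made-up addresses, and fix_ss() is a hypothetical name:

```r
# \b marks a word boundary, so "S S" only matches as two standalone words;
# the S at the end of "BIGS" is glued to "G", so it is left alone
fix_ss <- function(x) gsub("\\bS S\\b", "S", x, perl = TRUE)

fix_ss(c("S S Main St", "foo S S bar", "Elm St S S", "3421 BIGS St"))
```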

Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
Wow! And here I thought I was starting to know most things about regexes...

Re: [R] Convert day of year back into a date format.

2012-03-27 Thread Justin Haynes
There may very well be a better solution, but this works.

format(strptime(dayofyear, format="%j"), format="%m-%d")
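One caveat: when the year is not given, strptime() fills in the current year. As a result, days after Feb 28 shift by one depending on whether the current year is a leap year. The sketch below pins the year down explicitly. doy_to_date() is a hypothetical helper.

```r
# day-of-year (1 = Jan 1) to Date for an explicit year
doy_to_date <- function(doy, year) {
  as.Date(doy - 1, origin = sprintf("%d-01-01", year))
}

format(doy_to_date(60, 2012), "%m-%d")   # leap year: "02-29"
format(doy_to_date(60, 2011), "%m-%d")   # non-leap year: "03-01"
```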

Re: [R] Remove a word from a character vector value XXXX

2012-03-07 Thread Justin Haynes
Hadley's package stringr is wonderful for all things string.




?str_trim and ?str_replace are what you want.  (The base R equivalents of these two
would be ?gsub and some regular expressions.)

str_trim(str_replace(d5.Region, 'Average', ''))

should do the trick.
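For comparison, a base R sketch of the same two steps is below. The d5.Region values are made up. sub() replaces the first match, like str_replace(); use gsub() to replace every match.

```r
d5.Region <- c("North Average", "Average South", "East")   # made-up values
# drop the word, then strip leading/trailing white space
cleaned <- gsub("^[[:space:]]+|[[:space:]]+$", "", sub("Average", "", d5.Region))
cleaned
```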

hope that helps,

Re: [R] logical to vector?

2012-03-07 Thread Justin Haynes

 as.numeric(c(TRUE, FALSE))
[1] 1 0

Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Take a look at:

But I've always just parsed the string...

This is from the last time I did this; it's not quite the same, but you
can see the similarities.

library(stringr)

## if data is presented as 43°02'46.60059" N, split on the °, ' and " symbols.
to.decimal <- function(vec){
  # convert all symbols to _
  vec <- gsub('°','_',vec)
  vec <- gsub('\'','_',vec)
  vec <- gsub('"','_',vec)

  split <- str_split(vec,'_')
  deg <- as.numeric(sapply(split,'[',1))
  min <- as.numeric(sapply(split,'[',2))
  sec <- as.numeric(sapply(split,'[',3))

  deg + min/60 + sec/3600
}
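The function above ignores the hemisphere letter. The sketch below is a variant that pulls the three numbers out with a regular expression and makes S and W negative. dms_to_decimal() is a hypothetical name, and it assumes every string carries degrees, minutes and seconds in that order.

```r
dms_to_decimal <- function(vec) {
  # every number (integer or decimal) in each string
  parts <- regmatches(vec, gregexpr("[0-9]+(\\.[0-9]+)?", vec))
  # 3 x n matrix: degrees, minutes, seconds
  nums <- vapply(parts, function(p) as.numeric(p[1:3]), numeric(3))
  dec <- nums[1, ] + nums[2, ] / 60 + nums[3, ] / 3600
  # negate southern and western hemispheres
  ifelse(grepl("[SW][[:space:]]*$", vec), -dec, dec)
}

dms_to_decimal(c("43\u00b002'46.60059\" N", "122\u00b025'10\" W"))
```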

Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Wow... that is WAY better!

Thanks Gabor!

Re: [R] regular expression

2012-02-29 Thread Justin Haynes
gsub('.+; (.+);.+','\\1',x)

or if you just want the value out:

gsub('.+; Surv\\(months\\): ([0-9]+);.+','\\1',x)

You can also look at strsplit:
[1] "99-625: Cell type: S"        " Surv(months): 21"
[3] " STATUS(0=alive, 1=dead): 1"

[1] " Surv(months): 21"

But I would follow David's second suggestion and just read them in with
sep=';' instead.
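The sketch below reads the records with sep=";" through read.table()'s text argument. The first line mirrors the record shown above; the second is made up. After that, the number is taken from the survival field.

```r
x <- c("99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1",
       "99-626: Cell type: L; Surv(months): 7; STATUS(0=alive, 1=dead): 0")
# one column per ";"-separated field; strip.white drops the leading spaces
fields <- read.table(text = x, sep = ";", strip.white = TRUE,
                     stringsAsFactors = FALSE)
# keep only what follows the last ": " in the second field
surv <- as.numeric(sub(".*: ", "", fields$V2))
surv
```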


Re: [R] Problem building up ggplot graph in a loop.

2012-02-16 Thread Justin Haynes
ggplot is looking for thisData as a column of coffs.  The most
'ggplot-esque' way of doing this would be:

# melt your data to a long format:
coffs.melt <- melt(coffs, id.vars = 'levels')

# plot using colour aes parameter:
ggplot(coffs.melt, aes(x=levels, y=value, colour=variable)) + geom_line() +
ylab('Total Chargeoffs')

This is untested, since there is no sample data!


Re: [R] Change dataframe-structure

2012-02-13 Thread Justin Haynes
There is probably a more elegant way, but:

 df <- as.data.frame(t(apply(df, 1, function(x) names(x)[match(1:6, x)])))
  V1 V2 V3 V4 V5 V6
1 p1 p3 p2 p5 p4 p6
2 p3 p1 p2 p5 p6 p4
3 p1 p2 p3 p4 p6 p5
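The sketch below shows how the one-liner works on a small data set. The data is made up so that it reproduces the first two output rows. Each row holds the rank of columns p1 to p6. match(1:6, x) finds where ranks 1 to 6 sit in the row, and names(x) turns those positions into column names. apply() keeps the column names on each row vector it passes in.

```r
ranks <- data.frame(p1 = c(1, 2), p2 = c(3, 3), p3 = c(2, 1),
                    p4 = c(5, 6), p5 = c(4, 4), p6 = c(6, 5))
# for each row: which column holds rank 1, rank 2, ..., rank 6
ordered <- t(apply(ranks, 1, function(x) names(x)[match(1:6, x)]))
ordered
```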

Re: [R] debug in a loop

2012-02-10 Thread Justin Haynes
You can add

if(is.na(tab[i])) browser()

to drop into the debugger at the problem iteration, or

if(is.na(tab[i])) break

to stop the loop there.  See inline.
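A non-interactive variant of the same idea is sketched below. It records the index of the first bad value and breaks out of the loop. tab is made-up data here. Interactively, browser() at the marked spot lets you inspect i and everything else at that point.

```r
tab <- c(3, 8, NA, 5)          # made-up data with a problem value at position 3
first_na <- NA_integer_
for (i in seq_along(tab)) {
  if (is.na(tab[i])) {
    first_na <- i              # interactively, browser() here pauses the loop
    break
  }
}
first_na
```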

Re: [R] Memory allocation problem (again!)

2012-02-08 Thread Justin Haynes
32-bit Windows limits a single process to 2GB of memory by default.  Upgrading to
a computer that's less than 10 years old is the best path.

But short of that, if you're just generating random data, why not do it in
two or more pieces and combine them later?

mat.1 <- matrix(rnorm(5*2000),nrow=5)
mat.2 <- matrix(rnorm(5*2000),nrow=5)
mat.3 <- matrix(rnorm(5*2000),nrow=5)

mat.1.sums <- rowSums(mat.1)
mat.2.sums <- rowSums(mat.2)
mat.3.sums <- rowSums(mat.3)

mat.sums <- c(mat.1.sums,mat.2.sums,mat.3.sums)
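The same idea can be written as a loop, sketched below under the assumption that each chunk is a block of rows. Concatenating the per-chunk row sums is correct only in that case. If the chunks split columns instead, add the per-chunk row sums together rather than concatenating them. chunked_row_sums() and its sizes are hypothetical.

```r
chunked_row_sums <- function(n_rows, n_cols, chunk_rows) {
  sums <- numeric(n_rows)                        # preallocate the result once
  done <- 0
  while (done < n_rows) {
    k <- min(chunk_rows, n_rows - done)          # size of this chunk
    block <- matrix(rnorm(k * n_cols), nrow = k) # only k rows in memory at once
    sums[(done + 1):(done + k)] <- rowSums(block)
    done <- done + k
  }
  sums
}

set.seed(1)
s <- chunked_row_sums(n_rows = 6000, n_cols = 5, chunk_rows = 2000)
```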

Re: [R] Help need

2012-02-07 Thread Justin Haynes
Instead of a for loop, why not use the vectorization inherent in R?

sigmasquared <- 1
i <- complex(real = 0, imaginary = 1)
f <- seq(0,0.5,0.1)

[1] 9.632720e+00 1.411130e+03 2.947753e+00 6.479994e-02 1.295175e-02
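The formula line itself didn't survive in this archive. The sketch below compares a loop with the vectorised form using a made-up expression, an AR(1)-style spectrum with a hypothetical coefficient phi. The point is only that exp(), Mod() and arithmetic all work elementwise on the whole vector f.

```r
sigmasquared <- 1
i <- complex(real = 0, imaginary = 1)
f <- seq(0, 0.5, 0.1)
phi <- 0.5  # hypothetical coefficient, NOT from the original question

# loop version: one frequency at a time
loop_val <- numeric(length(f))
for (k in seq_along(f)) {
  loop_val[k] <- sigmasquared / Mod(1 - phi * exp(-2 * pi * i * f[k]))^2
}

# vectorised version: the whole f vector in one expression
vec_val <- sigmasquared / Mod(1 - phi * exp(-2 * pi * i * f))^2

all.equal(loop_val, vec_val)
```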

Re: [R] I bet apply has a solution

2012-02-06 Thread Justin Haynes
How bout:

 apply(Data..,1, function(vec) !all(vec==vec[1]))
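A vectorised alternative is sketched below on a made-up matrix; for a data frame like Data.., convert it with as.matrix() first. In m != m[, 1], the first column is recycled down every column, so each element is compared with the first element of its own row.

```r
m <- rbind(c(1, 1, 1), c(2, 3, 2), c(4, 4, 5))    # made-up data
# TRUE for rows that are not all equal to their first value
differs <- rowSums(m != m[, 1]) > 0
differs
```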

Re: [R] Select elements from text

2012-01-24 Thread Justin Haynes
how bout using read.table(... , sep=" ")?

That would give you a vector of single words.  Then


will return a boolean vector

[1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all, in case of
([citation]) instances, for example.
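The sketch below shows both versions on a made-up word vector. The first is a whole-word test; the second extracts the bracketed part, which also catches the wrapped ([citation]) case.

```r
words <- c("see", "[bracket]", "and", "[bar]", "([citation])")
# whole-word test: the element starts with "[" and ends with "]"
is_bracketed <- grepl("^\\[.*\\]$", words)
words[is_bracketed]            # misses "([citation])"
# extracting the bracketed part catches it too
found <- regmatches(words, regexpr("\\[[^]]+\\]", words, perl = TRUE))
found
```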


Re: [R] drop columns whose rows are all 0

2012-01-24 Thread Justin Haynes
 apply(dataset,2,function(x) all(x==0))
a b c

 dataset[,!apply(dataset,2,function(x) all(x==0))]
a b
1   1 0
2   2 0
3   3 0
4   4 1
5   5 0
6   6 0
7   7 0
8   8 0
9   9 1
10 10 0
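A sketch of the same filter without apply() is below. The data is made up to match the output above, with an all-zero column c. drop = FALSE keeps a data frame even when only one column survives; without it, single-bracket indexing would return a plain vector.

```r
dataset <- data.frame(a = 1:10, b = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), c = 0)
keep <- colSums(dataset != 0) > 0     # TRUE for columns with any non-zero value
dataset[, keep, drop = FALSE]
```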

Re: [R] How can I access information stored after I run a command in R?

2012-01-23 Thread Justin Haynes
?str tells you about the object.


from that you can see the names of the various parts including p.value.

foo <- MAX3(a,'asy',1)$p.value
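The same inspect-then-extract pattern works for any list-like test result. It is shown below with t.test() on random data, since MAX3 comes from an add-on package not shown here. The sketch assumes MAX3's result is a list with a p.value component, as the line above suggests.

```r
set.seed(42)
res <- t.test(rnorm(20))     # any "htest" result works the same way
str(res, max.level = 1)      # lists the components, including p.value
names(res)
p <- res$p.value
```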

Re: [R] colored outliers

2012-01-20 Thread Justin Haynes
sep=";", dec=",", encoding="UTF-8")
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
## this line circles all points, because it still uses TOC_NI:
points(NI~TOC,data=TOC_NI,col='red',pch=1,cex=3)

## now we're only plotting the four points in circ:
points(NI~TOC,data=circ,col='red',pch=1,cex=3)

Sorry for the confusion.  However, in the future please provide a
reproducible data set along with your question so we can more easily help.


On Fri, Jan 20, 2012 at 5:49 AM, Geophagus

 Dear Petr and Justin,
 my problem is that I only want to have the 4 highest values for Ni as a
 red point or with a red circle. The other points should not be modified.
 In your proposals, all points always get a red circle or a red point, not only
 the 4 highest Ni values!
 I hope you could understand me!
 Thanks  for your help!

 View this message in context:
 Sent from the R help mailing list archive at Nabble.com.

Re: [R] Stacked barchart in ggplot (or other library)

2012-01-20 Thread Justin Haynes
to use ggplot:


Re: [R] Establishing groups using something other than ifelse()

2012-01-19 Thread Justin Haynes
how bout

levels(df$z)[grep('A',levels(df$z))] <- 'A'
levels(df$z)[grep('B',levels(df$z))] <- 'B'
levels(df$z)[grep('C',levels(df$z))] <- 'C'

does that do what you're wanting?

On Thu, Jan 19, 2012 at 3:05 PM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello all,

 This is one of those Is there a better way to do this questions. Say
 I have a dataframe (df) with a grouping variable (z). This is my base
 data. Now I know that there is a higher order level of grouping that
 exist for my group variable. So what I want to do is create a new
 column that express that higher order level of grouping based on
 values in the sub-group (z  in this case). In the past I have used
 ifelse() but this tends to get fairly redundant and messy with a large
 amount of sub-groupings (z). I've created a sample dataset below. Can
 anyone recommend a better way of achieving what I am currently
 achieving with ifelse()? A long series of ifelse statements makes me
 think that there is something better for this.

 ## Dataframe creation
 df <- data.frame(x=runif(36, 0, 120),
   y=runif(36, 0, 120),


 ## Current method is grouping
 df$Big.Group <- with(df, ifelse(df$z=="A1","A", ifelse(df$z=="A2","A",
 ifelse(df$z=="B1", "B", ifelse(df$z=="B2", "B", "C"))))))
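Another option to the nested ifelse() is a named lookup vector that maps each sub-group to its higher-order group, sketched below on made-up labels. If df$z is a factor, index with as.character(df$z). When the group is simply the first letter, substr(z, 1, 1) would also do.

```r
z <- c("A1", "A2", "B1", "B2", "C1", "A1")       # made-up sub-group labels
# named lookup vector: sub-group -> higher-order group
lookup <- c(A1 = "A", A2 = "A", B1 = "B", B2 = "B")
big <- unname(lookup[z])                         # unknown names give NA
big[is.na(big)] <- "C"                           # anything unlisted falls into C
big
```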

[R] png output on a server?

2012-01-18 Thread Justin Haynes
I've got R running on a gentoo server that doesn't have X11 installed.  It's
a custom build to keep those dependencies at bay!  However, some of my
scripts use the base png() function and ggplot2, and png() uses X11.

A google search suggests using the Cairo package, which works... but
changes the fonts (specifically the size of the font).  Adjusting the
pointsize doesn't seem to have much effect.

Aside from tuning the CairoPNG function to make my graphs look right, has
anyone found a good way to avoid the X11 dependency but still use the base
png function?

If anyone has experience with CairoPNG and making it look like the base png
function, I'd love to hear what you've learned!
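If this R build was compiled with cairo support, base png() can render without X11 through its type argument. The sketch below checks capabilities() first; whether the fonts then match the X11 output is untested here.

```r
# which bitmap back ends does this R build support?
print(capabilities(c("X11", "cairo", "png")))

if (capabilities("cairo")) {
  out <- tempfile(fileext = ".png")
  png(out, width = 480, height = 480, type = "cairo")  # no X11 needed
  plot(1:10, main = "rendered without X11")
  invisible(dev.off())
}
```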



Re: [R] Points inside a polygon

2012-01-12 Thread Justin Haynes

On Wed 11 Jan 2012 08:28:03 PM PST, Hasan Diwan wrote:

I have a list of bounds for a series of polygons. I do understand the
formula to determine whether point i is within polygon X (X[x1] < i[x] &&
  X[x2] > i[x] && X[y1] < i[y] && X[y2] > i[y]), and I can apply this
throughout the dataset. However, this naive algorithm doesn't scale
very well. The data set contains 10,000 points consisting of (n,e)
pairs where I'm interested in which are inside polygons denoted by
vertices (V[x1]/V[y1],V[x2],V[y2]). Is there a shortcut to accomplish
this goal? Many thanks!  -- H

Check out the splancs package, particularly the inout() function.
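If you'd rather avoid a dependency, the sketch below is a plain ray-casting test, vectorised over the points, as a swapped-in base R technique. It is not splancs's implementation. in_polygon() is a hypothetical name, and points exactly on an edge may land either way.

```r
# px, py: point coordinates; vx, vy: polygon vertices in order (not closed)
in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- logical(length(px))
  j <- n
  for (i in seq_len(n)) {
    # does edge j -> i straddle the horizontal line through each point?
    crosses <- (vy[i] > py) != (vy[j] > py)
    # x where the edge meets that line (NaN/Inf only when crosses is FALSE)
    x_int <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
    # each crossing to the right of the point flips inside/outside
    inside <- xor(inside, crosses & px < x_int)
    j <- i
  }
  inside
}

in_polygon(c(5, 15), c(5, 5), vx = c(0, 10, 10, 0), vy = c(0, 0, 10, 10))
```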


and provide commented, minimal, self-contained, reproducible code.

Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes

On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

I have a data frame in the following form. There are two groups and for
each 'width' relative frequency for group1 and group2 is given. How to plot
this in R using ggplot or other package.

  Width   relativeFrequency1   relativeFrequency2
1   100 0.0006388783 0.02265428
2   200 0.0022677303 0.02948625
3   300 0.0061182673 0.01739936
4   400 0.0152237225 0.02569902
5   500 0.0300215262 0.03639880
6   600 0.0597610250 0.07717765


not sure exactly what you're looking for but...

dat <- data.frame(width=1:6*100, rel1=runif(6), rel2=runif(6))
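The ggplot code above didn't survive in this archive. As a base graphics alternative, the sketch below draws a grouped barplot from the question's own numbers. barplot(beside = TRUE) wants one row per group and one column per width. pdf(NULL) just draws to a null device; drop it to plot on screen.

```r
freq <- data.frame(
  Width = c(100, 200, 300, 400, 500, 600),
  relativeFrequency1 = c(0.0006388783, 0.0022677303, 0.0061182673,
                         0.0152237225, 0.0300215262, 0.0597610250),
  relativeFrequency2 = c(0.02265428, 0.02948625, 0.01739936,
                         0.02569902, 0.03639880, 0.07717765)
)
# rows = groups, columns = widths
heights <- t(as.matrix(freq[, c("relativeFrequency1", "relativeFrequency2")]))
colnames(heights) <- freq$Width

pdf(NULL)
barplot(heights, beside = TRUE, legend.text = c("group 1", "group 2"),
        xlab = "Width", ylab = "Relative frequency")
invisible(dev.off())
```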

Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes

the fill and colour variables can be removed if you want.



same with this version.

Re: [R] Add color to Boxplot by value

2012-01-12 Thread Justin Haynes
how bout:


Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
# find top 4 points

# add them to your plot!
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
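The "find top 4 points" step didn't survive in this archive. The sketch below is one way to do it on made-up TOC_NI data: order by NI, keep the first four rows as circ, and circle only those. Point size is set with cex.

```r
set.seed(1)
TOC_NI <- data.frame(TOC = runif(30, 0, 450), NI = rnorm(30, 50, 15))  # made up

# the four rows with the highest NI
circ <- head(TOC_NI[order(TOC_NI$NI, decreasing = TRUE), ], 4)

pdf(NULL)  # null device; drop this line to plot on screen
plot(NI ~ TOC, data = TOC_NI, col = "blue", pch = 16, xlim = c(0, 450))
abline(lm(NI ~ TOC, data = TOC_NI), col = "red", lwd = 3)
points(NI ~ TOC, data = circ, col = "red", pch = 1, cex = 3)
invisible(dev.off())
```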


Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
woops! see inline.

Hope that helps, and enjoy R.


Re: [R] match matrices of different lengths

2012-01-05 Thread Justin Haynes
see ?merge

x   y   b
1 2.00112e+11 1.0 1.2
2 2.00112e+11 1.1 1.9

Making the two matrices time series does not mean that R knows the
first column is a datetime, and depending on your desired result,
that may not be important.
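The sketch below shows merge() on made-up data, with yearmonthdayhourminute keys like the ones in the question. The first call is an inner join, and the second keeps every row of a. The last line shows one way to turn those numbers into real datetimes if you do need them.

```r
a <- data.frame(x = c(200112010000, 200112010100, 200112010200),
                y = c(1.0, 1.1, 1.3))
b <- data.frame(x = c(200112010000, 200112010100), b = c(1.2, 1.9))

both  <- merge(a, b, by = "x")                  # only times present in both
all_a <- merge(a, b, by = "x", all.x = TRUE)    # every row of a, NA where b is missing

# parse the numeric stamps into POSIXct (sprintf avoids scientific notation)
a$time <- as.POSIXct(sprintf("%.0f", a$x), format = "%Y%m%d%H%M", tz = "UTC")
```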

hope that helps,


On Thu, Jan 5, 2012 at 5:51 AM, Thijs vanden Bergh 
bergh.thijsvan...@gmail.com wrote:

 was trying to match different matrices of different lengths with in
 the first collumn date and time info (yearmonthdayhourminute). the
Re: [R] ggplot2 - tricky problem

2012-01-05 Thread Justin Haynes
how bout:


  aes(x=factor(city),y=value)) +
geom_point() +

The line drawing is a bit more tricky...  Since the x values are factors
rather than continuous, fitting a line to them is kind of nonsense.  It
matters which order they are in, for example.  If instead you want to plot
something like:


You could draw fit lines that make a bit more sense.  Forgive me if I'm
Re: [R] [newbie] stack operations, or functions with side effects (or both)

2012-01-04 Thread Justin Haynes
Do s[1] and s[-1] do what you're looking for?
Those are just to display... if you want to change s, you need to reassign
it or fiddle with environments.  However, I'd say it is better to write R
code as though data structures are immutable until you explicitly re-assign
them, rather than trying to deal with side effects and state...

 pop <- function(vec){
   print(vec[1])
   print(vec[-1])
   return(vec[-1])
 }
 s <- 1:5
 s <- pop(s)
[1] 1
[1] 2 3 4 5
 s
[1] 2 3 4 5
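If you really do want a pop that changes the stack in place, environments are the R object with reference semantics. The sketch below uses hypothetical helpers new_stack() and stack_pop(); they are not a standard API.

```r
new_stack <- function(items) {
  env <- new.env()
  env$items <- items
  env
}

stack_pop <- function(stack) {
  top <- stack$items[1]
  stack$items <- stack$items[-1]   # modifies the shared environment in place
  top
}

s <- new_stack(1:5)
stack_pop(s)   # returns 1
s$items        # 2 3 4 5, without any "s <- ..." reassignment
```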

Re: [R] Combining characters

2012-01-04 Thread Justin Haynes
apply(expand.grid(x, y, z, stringsAsFactors=F), 1, paste, collapse=' ')
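One thing to watch: apply() goes through as.matrix(), which formats numeric columns to a common width when character columns are present, so numbers can pick up padding spaces. do.call(paste, ...) pastes the columns directly. The sketch below uses made-up x, y and z.

```r
x <- c("a", "b")
y <- c(1, 10)
z <- "z"
grid <- expand.grid(x, y, z, stringsAsFactors = FALSE)

via_apply <- apply(grid, 1, paste, collapse = " ")  # "a  1 z": 1 padded to " 1"
via_paste <- do.call(paste, grid)                   # "a 1 z": no padding
```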

Re: [R] a quick question about rbinom

2012-01-04 Thread Justin Haynes
homework or not,


should be plenty.

Re: [R] Applyiing mode() or class() to each column of a data.frame XXXX

2011-12-30 Thread Justin Haynes
there is also colwise in the plyr package.

  v13 v14   v15 f4 v16
1 integer numeric character factor logical
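For comparison, base R gives the same summary with sapply(). The data frame below is made up to have the same column types as the output above.

```r
df <- data.frame(v13 = 1L, v14 = 1.5, v15 = "a", f4 = factor("x"), v16 = TRUE,
                 stringsAsFactors = FALSE)
sapply(df, class)   # one class per column, named by column
```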


Re: [R] Help with code

2011-12-20 Thread Justin Haynes
The short answer... which is a guess, because you didn't provide a
reproducible example... is:

your column (I think it's called t1d_ptype[1:25]) is a factor, and using
factors is dangerous at best.

You can check with ?str.

See ?factor for how to convert back to strings, and see if your code works.

To answer your second question: yes, I'm sure there is a better, simpler way
to do this, but I can't follow what you're doing... for example, I don't
know what c1 is...

But the place I would look is the plyr package.  It's excellent at
splitting and reordering data.

And one final note: you should avoid naming things with pre-existing R
function names (e.g. data).
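The sketch below reproduces the error on made-up data. Assigning a value that is not already a level of a factor gives NA plus the "invalid factor level" warning. Converting to character first lets any value in.

```r
ptype <- factor(c("T1D", "Ctrl1", "T1D"))   # made-up data

bad <- ptype
bad[2] <- "T1D_noc"        # not an existing level: warning, and bad[2] becomes NA

good <- as.character(ptype)
good[2] <- "T1D_noc"       # plain character vector: any value is allowed
```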


Re: [R] Help with code

2011-12-20 Thread Justin Haynes
Fair enough and good point.  How about, dangerous when used unknowingly!

On Tue, Dec 20, 2011 at 1:01 PM, William Dunlap wdun...@tibco.com wrote:

  your column (i think its called t1d_ptype[1:25]) is a factor and using
  factors is dangerous at best.

 This depends on how you want to define dangerous.  If t1d_ptype ought
 take values from a certain set of strings then making it a factor gives
 you some safety, since it warns you when you go outside of that set and
 try to give it an illegal value.  E.g.,
 sex <- factor(c("M","F","F"), levels=c("F", "M"))
 sex[2] <- "no"
Warning message:
In `[<-.factor`(`*tmp*`, 2, value = "no") :
   invalid factor level, NAs generated

 It does take more work to set up, since you need to enumerate the set
 of good strings.  That is tedium, not danger.

 If t1d_ptype might take any value, then make it a character vector.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

   for( i in 1:dim(c1)[1])
for (j in 1:dim(c1)[2])
   if (c1[i,j]==2) num_comp=num_comp+1  #Y=2
for (j in 1:dim(c1)[2])
if (data$t1d_ptype[i] == "T1D" && c1[i ,j] == 2)
  if (data$t1d_ptype[i] == "T1D" && c1[i, j] == 1)
  if(substr(data$t1d_ptype[i],1,4) == "Ctrl" && c1[i,j] == 2)
  if (substr(data$t1d_ptype[i],1,4) == "Ctrl" && c1[i,j] == 1)
  if(data$t1d_ptype[i] == "T1D") c2[i,j] <- "T1D_noc"
  if(substr(data$t1d_ptype[i],1,4) == "Ctrl")
   it is giving me error
   In `[<-.factor`(`*tmp*`, iseq, value = structure(c(NA,  ... :
invalid factor level, NAs generated
   Also is there a simple way to do this.
Re: [R] how to manually enter an double quote as data feed?

2011-12-13 Thread Justin Haynes
"\"" is how it's displayed on the screen.  However, if you write your object
to a csv it will be correct.  R can't display " as-is inside a quoted
string, so it escapes the double quote for you.

However, "'" (double quote, single quote, double quote) does display
correctly as well as save correctly.

If that doesn't answer your question, some more back story on what you're
trying to do would help.
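The sketch below shows that the backslash is only part of how print() displays the string. It is not stored in the string. cat() and a round trip through a file show the raw character.

```r
q <- '"'                  # a one-character string holding a double quote
print(q)                  # shown escaped: "\""
cat(q, "\n")              # shows the raw character
nchar(q)                  # 1: no backslash is stored

tmp <- tempfile(fileext = ".txt")
writeLines(q, tmp)        # the file contains just the double quote
readLines(tmp)
```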


 I'm doing a text mining project where I have to manually enter a double
 quote as an element inside a vector.

 I tried

 char[10]='"' #where i enclosed the double quote in a pair of single quotes.

 But the result is [1] "\"". Somehow a back slash is added automatically.

 I also tried to enclose the double quote in a pair of double quotes. That
 didn't work either.

 I'm using Mac and latest release of R.

 Thank you!

 Bonnie Yuan


Re: [R] using sample

2011-12-07 Thread Justin Haynes

If you haven't spent much time on the r-help forums, please do read the
posting guide.

You need to provide reproducible examples for us to help you.

We don't know anything about your data...

What is event.details?  (If you can't provide the data, the output of ?str will often do.)

Since I don't know what event.details is, I can't figure out what the line:

obs = (1:133429)[event.details[,2] == i]

is supposed to do.

But if I had to guess... ?sample says it expects the first argument as a
vector.  I assume obs is not a vector but a larger structure?

Feel free to post more info about your data (see ?str and ?dput) or if you
can generate made up data that replicates your problem that works too.
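One related trap when calling sample() in a loop: if the vector happens to have length 1, sample() treats that single number n as 1:n. The sketch below shows the resample() wrapper suggested in ?sample, which avoids this.

```r
set.seed(1)
sample(c(7), 3, replace = TRUE)     # draws from 1:7, not from the value 7

# from ?sample: always samples from the elements of x
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(c(7), 3, replace = TRUE)   # always 7
```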


 Can anyone help sort out the problem with the following script - I am a R
 newbie and I am self taught.

 obs.all = c()
 for(i in 1:386){
  if (n.sim[i] > 0){
obs = (1:133429)[event.details[,2] == i]
obs.all = c(obs.all, sample(obs[obs < n.sim[i]], size = n.sim[i],

 Basically, in the sample bit, I only want to get obs.all if the value of
 obs is less than the value of n.sim[i]. I get the error message

 Error in sample(obs[obs < n.sim[i]], size = n.sim[i], replace = T) :
  invalid first argument

 length(n.sim)  is 386

 Thanks in advance for your suggestions



Re: [R] hour in x-axis

2011-11-29 Thread Justin Haynes
without knowing much about your data or the base plotting...

I'd use the ggplot2 package.

First, you'll need to convert your times to POSIXct:

AggData$time <- as.POSIXct(AggData$time, format='%H:%M:%S')

Then plotting is trivial.

or +geom_line() if you'd rather.
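For the base-graphics route from the question, the sketch below suppresses the default axis and labels every 60th point in H:M format. The minute-by-minute series is made up to start at 17:19:35 like the original data. Plotting value against a POSIXct vector directly, plot(time, value), also gives a time axis automatically.

```r
start <- as.POSIXct("2011-11-28 17:19:35", tz = "UTC")
time  <- seq(start, by = "min", length.out = 300)     # made-up series
set.seed(1)
value <- 80 + cumsum(rnorm(300, sd = 0.2))

at <- seq(1, length(time), by = 60)                   # every 60th point
pdf(NULL)  # null device; drop this line to plot on screen
plot(value, type = "l", xaxt = "n", xlab = "time", ylab = "value")
axis(1, at = at, labels = format(time[at], "%H:%M"))
invisible(dev.off())
```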

Hope that helps,


On Tue, Nov 29, 2011 at 10:07 AM, threshold r.kozar...@gmail.com wrote:

 Dear R useres, got the following problem. Given the AggData (listed below)
 I need to plot AggData[,2] vs time (AggData[,1]) for chosen 'rows'. Ive

 plot(AggData[rows,2], xaxt='n')
 axis(1,at=seq(1,length(rows),1),sub(,, AggData[rows,1]))

 which works, but I need to list only chosen data points, say full hours or
 every 60th point, something like:

 axis(1,at=seq(1,seq(1,length(rows),60)),sub(, ,

 but does not work. Could be nice if time on the x-axis is in H:m format (no

 In the original data time bout is 1 minute, e.g. 17:19:35, 17:20:35,
 17:21:35 . Taken every 100th for brevity yields


          time    value
 101  18:59:35 80.97230
 201  20:39:35 78.30810
 301  22:19:35 80.41558
 401  23:59:35 77.01051
 501  01:39:35 77.19687
 601  03:19:35 78.20762
 701  04:59:35 77.13315
 801  06:39:35 76.29110
 901  08:19:35 75.32090
 1001 09:59:35 85.32890
 1101 11:39:35 79.86978
 1201 13:19:35 83.32418
 1301 14:59:35 78.26018
 1401 16:39:35 79.06434

 Thanks in advance.
 Best, robert


Re: [R] aggregate syntax for grouped column means

2011-11-29 Thread Justin Haynes
look at just your data that is in that first id category and I bet you can
figure it out!

var1  var2   id
10 30.79 32.15 0m11
11 30.79 32.39 0m11
12 30.94NA 0m11

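With the formula interface, aggregate() first drops every row that has an NA in
any column (its default na.action is na.omit), so var1 for 0m11 is the mean of
30.79 and 30.79, i.e. 30.79.
 data.table and plyr perform the na.rm on each column separately.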
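To get per-column behaviour from aggregate() itself, pass na.action = na.pass so that no rows are dropped, and let na.rm = TRUE handle the NAs inside mean(). The sketch below uses the three 0m11 rows from the data in this thread:

```r
m11 <- data.frame(var1 = c(30.79, 30.79, 30.94),
                  var2 = c(32.15, 32.39, NA),
                  id = "0m11")

# default: rows with any NA are dropped first, so var1 uses only two rows
row_wise <- aggregate(. ~ id, data = m11, FUN = mean, na.rm = TRUE)
# na.pass keeps every row; mean(na.rm = TRUE) then skips NAs column by column
col_wise <- aggregate(. ~ id, data = m11, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)
```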


On Tue, Nov 29, 2011 at 12:21 PM, Juliet Hannah juliet.han...@gmail.com wrote:

 I am calculating the mean of each column grouped by the variable 'id'.
 I do this using aggregate, data.table, and plyr. My aggregate results
 do not match the other two, and I am trying to figure out what is
 incorrect with my syntax. Any suggestions? Thanks.

 Here is the data.

 myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
 30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11,
 30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75,
 30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37,
 29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25,
 30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59,
 NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66,
 NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44,
 31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56,
 NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c("0m4", "0m4", "0m4",
 "0m5", "0m5", "0m5", "0m6", "0m6", "0m6", "0m11", "0m11", "0m11",
 "0m12", "0m12", "0m12", "205m1", "205m1", "205m1", "205m4", "205m4",
 "205m4", "205m5", "205m5", "205m5", "205m6", "205m6", "205m6",
 "205m7", "205m7", "205m7", "600m1", "600m1", "600m1", "600m3",
 "600m3", "600m3", "600m4", "600m4", "600m4", "600m5", "600m5",
 "600m5", "600m7", "600m7", "600m7")), .Names = c("var1", "var2",
 "id"), row.names = c(NA, -45L), class = "data.frame")

   var1  var2  id
 1 31.59 33.78 0m4
 2 32.21 33.25 0m4
 3 31.78    NA 0m4
 4 31.34 32.05 0m5
 5 31.61 32.59 0m5
 6 31.61    NA 0m5

 results1 <- aggregate(. ~  id ,data=myData,FUN=mean,na.rm=T)
 #id  var1  var2
 # 1 0m11 30.79 32.27

 mydt <- data.table(myData)
 results2 <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=id]
 #   id  var1  var2
 # [1,] 0m11 30.84 32.27

 results3 <- ddply(myData,.(id),colwise(mean),na.rm=TRUE)
 #id  var1  var2
 # 1 0m11 30.84 32.27

 R version 2.14.0 (2011-10-31)
 Platform: i386-pc-mingw32/i386 (32-bit)

 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] plyr_1.6 data.table_1.7.3

Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
Very cool.  Sadly, as far as I can tell, it doesn't work with ggplot though

   user  system elapsed
  0.824   0.012   0.845
   user  system elapsed
 33.422   0.016  33.545
   user  system elapsed
 45.142   0.228  45.687
   user  system elapsed
 47.483   1.060  49.040
   user  system elapsed
 44.807   0.689  45.710

On Fri, Nov 18, 2011 at 11:03 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi all,

 I'm working with a bunch of large graphs, and stumbled across
 something useful. Probably many of you know this, but I didn't and so
 others might benefit.

 Using pch="." speeds up plotting considerably over using symbols.

  x <- runif(100)
  y <- runif(100)
  system.time(plot(x, y, pch="."))
   user  system elapsed
  1.042   0.030   1.077
  system.time(plot(x, y))
   user  system elapsed
  37.865   0.033  38.122

 If you have enough points, the result is also more legible.

 Choice of which pch symbol makes a difference too, the default pch=1 being
 the slowest of what I tried, but "." is by far the speediest.

  system.time(plot(x, y, pch=0))
   user  system elapsed
  11.191   0.011  11.270
  system.time(plot(x, y, pch=1))
   user  system elapsed
  38.024   0.008  38.245
  system.time(plot(x, y, pch=2))
   user  system elapsed
  14.140   0.027  14.270
  system.time(plot(x, y, pch=3))
   user  system elapsed
  15.696   0.011  15.799
  system.time(plot(x, y, pch=4))
   user  system elapsed
  18.770   0.007  18.888

 This is a vanilla R session, 2.13.1 for x86_64-redhat-linux-gnu. I
 haven't tried it on any other OS, but it's making my life a lot
 smoother right now.


 Sarah Goslee

Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
That is a function I did not know about, thanks Hadley!

I still don't see the speed increase that you do with the base plot
package, but I'm sticking with ggplot anyway!

   user  system elapsed
 42.234   0.520  43.061
   user  system elapsed
 32.370   0.204  33.868

On Fri, Nov 18, 2011 at 12:39 PM, Hadley Wickham had...@rice.edu wrote:

 You need: system.time(print(qplot(x,y,pch=I('.'))))


 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University

Re: [R] apply on rows and columns?

2011-11-16 Thread Justin Haynes
To expand on what Sarah and Michael said:

if you have a 3d array:

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 3

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 4

     [,1] [,2]
[1,]    1    3
[2,]    2    4

     [,1] [,2]
[1,]    4   12
[2,]    8   16

a margin of c(1,2) makes more sense.  Hope that clarifies things.
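The sketch below is a runnable version of that example, stacking four copies of the 2x2 matrix from the question into a 2x2x4 array. MARGIN = c(1, 2) keeps rows and columns and sums along the third dimension; MARGIN = 3 gives one total per slice.

```r
m <- matrix(1:4, ncol = 2)
a <- array(m, dim = c(2, 2, 4))   # four stacked copies of m

apply(a, c(1, 2), sum)            # 4 * m: each cell summed over the 4 slices
apply(a, 3, sum)                  # one total per slice: 10 10 10 10
```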


On Wed, Nov 16, 2011 at 12:18 PM, Sarah Goslee sarah.gos...@gmail.com wrote:

 On Wed, Nov 16, 2011 at 3:13 PM,  rkevinbur...@charter.net wrote:

 I have the following scenario:

 m - matrix(1:4, ncol=2)
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4
 apply(m, 2, sum)
 [1] 3 7
 apply(m, 1, sum)
 [1] 4 6

 So I can apply to rows *or* columns. According to the documentation

 MARGIN a vector giving the subscripts which the function will be applied
 over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2)
 indicates rows and columns. Where X has named dimnames, it can be a
 character vector selecting dimension names.

 But I get the following results:

 apply(m, c(1,2), sum)
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4

 How am I to interpret this result?

 I'm pretty sure R is taking the sum of m[1,1] and putting it in [1,1],
 and the sum of m[1,2] and putting it in [1,2] and so on. You
 instructed apply() to work on rows and columns *simultaneously*,
 rather than sequentially.

 apply() on c(1,2) is useful if you have a matrix that's three-dimensional,
 but not so much if it's two dimensional.

 What are you trying to accomplish?


 Sarah Goslee

Re: [R] Extract pattern from string

2011-11-15 Thread Justin Haynes
take a look at the structure of what Sys.time returns.


and now at ?strptime!

[1] 15-09-55-55

[1] 2011
[1] 11
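The sketch below gets each piece asked for with format() and strptime-style codes. It uses a fixed time in place of Sys.time() so the output is predictable.

```r
now <- as.POSIXct("2011-11-15 16:25:55", tz = "GMT")   # stand-in for Sys.time()

year     <- format(now, "%Y")            # "2011"
month    <- format(now, "%m")            # "11"
day_time <- format(now, "%d_%H_%M_%S")   # "15_16_25_55"
```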

Hope that helps,


On Tue, Nov 15, 2011 at 9:48 AM, syrvn ment...@gmx.net wrote:

 with Sys.time() you get the following string:

 2011-11-15 16:25:55 GMT

 How can I extract the following substrings:

 year - 2011

 month - 11

 day_time - 15_16_25_55




 R-help@r-project.org mailing list
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


Re: [R] Create design matrix

2011-11-03 Thread Justin Haynes

  Var1 Var2
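A base R sketch of one way to get every combination of levels. expand.grid() returns one row per combination, with default column names Var1, Var2, and so on. model.matrix() can then turn those factors into dummy-coded design columns. The M/F and Y/O levels come from the question. Note that expand.grid() varies the first factor fastest, so the rows are reordered here to match the M Y, M O, F Y, F O layout that was asked for.

```r
# One row per combination of levels (first factor varies fastest)
g <- expand.grid(c("M", "F"), c("Y", "O"))

# Reorder rows by level order to match the requested layout: M Y, M O, F Y, F O
g <- g[order(g$Var1, g$Var2), ]

# Dummy-coded design matrix (treatment contrasts, intercept included)
X <- model.matrix(~ Var1 + Var2, data = g)
```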


On Thu, Nov 3, 2011 at 10:56 AM, Bond, Stephen stephen.b...@cibc.com wrote:
 Greetings useRs,

 What is the easiest way to create a design matrix of several factor 
 variables? Function gendata in Design seems to do that for a fitted model, 
 but how to do that only on several factor vectors??

 The result should be a df with one row for each distinct combination of 
 levels of factors eg for (M,F) (Y,O)
 We get
 M Y
 M O
 F Y
 F O

 In reality I will have more than 1000 rows so doing by hand not good.
 Maybe there is a way with outer, but I couldn't see it.
 All the best to everybody.


[R] mysterious warning message regarding bytecode...

2011-11-02 Thread Justin Haynes
While running a long script which source()s other scripts I get the
following warning:

Warning message:
In t(object$S[[1]]) : bytecode version mismatch; using eval

I cannot replicate it if I run the sourced files line by line though...

What is that error?  And do I care about it?  It doesn't seem to
affect my output as far as I can tell.


R version 2.13.2 (2011-09-30)
Platform: x86_64-pc-linux-gnu (64-bit)


attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] mgcv_1.7-9        stringr_0.5       RPostgreSQL_0.2-0 biglm_0.8         DBI_0.2-5         doMC_1.2.3        multicore_0.1-7
 [8] foreach_1.3.2     codetools_0.2-8   iterators_1.0.5   cairoDevice_2.19  pixmap_0.4-11     gridExtra_0.8.5   splancs_2.01-29
[15] sp_0.9-91         ellipse_0.3-5     ggplot2_0.8.9     proto_0.3-9.2     reshape_0.8.4     plyr_1.6          MASS_7.3-14

loaded via a namespace (and not attached):
[1] compiler_2.13.2 digest_0.5.1    lattice_0.19-33 Matrix_1.0-1    nlme_3.1-102

Re: [R] factor level issue after subsetting

2011-11-01 Thread Justin Haynes
First of all, the subsetting line is overly complicated.


will work just fine.  R does exactly what you're describing: it knows
the levels of the factor, and removing the 'cont' rows from the data
does not remove that level from the factor:

'data.frame':   100 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...

'data.frame':   82 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...

[1] e d c b
Levels: a b c d e

[1] e d c b
Levels: e d c b

 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...

By redefining your factor you can eliminate the problem.  The other
option, if you don't want factors to begin with, is:

options(stringsAsFactors=FALSE)  # to set the global option


dat <- read.csv("~/MyFiles/data.csv", stringsAsFactors=FALSE)  # to set the option locally for this single read.csv call
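A small reproduction using the treat/yield values from the question. Subsetting keeps the unused 'cont' level. Either re-creating the factor with factor() or calling droplevels() (available in base R since around 2.12) removes it.

```r
dat <- data.frame(treat = factor(rep(c("cont", "10", "30", "60"), each = 4)),
                  yield = c(98.7, 97.2, 96.1, 98.1, 103.0, 101.3, 102.1, 101.9,
                            121.1, 123.1, 119.7, 118.9, 109.9, 110.1, 113.1, 112.3))

sub <- dat[dat$treat != "cont", ]
kept_levels <- levels(sub$treat)     # still lists "cont" although no row uses it

refactored <- factor(sub$treat)      # rebuild the factor from the values present
dropped <- droplevels(sub)           # drop unused levels in every factor column
```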

On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan
stefan.schrei...@ales.ualberta.ca wrote:
 Dear list,

 I cannot figure out why, after sub-setting my data, that particular item
 which I don't want to plot is still in the newly created subset (please
 see example below). R somehow remembers what was in the original data
 set. A work around is exporting and importing the new subset. Then it's
 all fine; but I don't like this idea and was wondering what am I missing


 P.S. I am using R 2.13.2 for Mac.

 [1] factor
   treat yield
 1   cont  98.7
 2   cont  97.2
 3   cont  96.1
 4   cont  98.1
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3
 [1] factor
   treat yield
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3

Re: [R] reshape2: Lost Values Between melt() and dcast()

2011-10-31 Thread Justin Haynes
The reason dcast gives that message (a warning, not a failure) is that
the formula you gave does not identify unique rows.  dcast then needs an
aggregating function, and it defaults to length.

However, the output of the dcast calls that "failed" can help you find
the source of your problem.  I'd look at the outputs of those two dcast
calls and find cells where the length is greater than 1.  Those are
duplicated entries in your initial data.frames (when I've run into this it
was usually due to NA values somewhere unexpected).

Hope that clarifies things.
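A base R sketch of that duplicate check. It assumes long data with site, sampdate, param and value columns; the toy rows are invented. Any key combination that appears more than once is exactly what makes dcast() fall back to length. Running duplicated() in both directions flags every copy, not just the later ones.

```r
# Toy long-format data (hypothetical); rows 1 and 3 share the same key
long <- data.frame(site     = c("WC", "WC", "WC", "WC-1"),
                   sampdate = as.Date(c("1994-06-15", "1994-06-15",
                                        "1994-06-15", "1995-07-27")),
                   param    = c("pH", "As", "pH", "pH"),
                   value    = c(7.1, 0.02, 7.3, 6.9),
                   stringsAsFactors = FALSE)

# The columns that should identify one cell of the wide table
key <- long[c("site", "sampdate", "param")]

# Flag every row whose key occurs more than once (first and later copies)
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
dup_rows <- long[is_dup, ]
```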


On Mon, Oct 31, 2011 at 9:32 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  Working with 5 subset streams from my source data frame, three of them
 successfully call dcast(), but two fail:

 jerritt.cast <- dcast(jerritt.melt, site + sampdate ~ param)
 Aggregation function missing: defaulting to length


 winters.cast <- dcast(winters.melt, site + sampdate ~ param)
 Aggregation function missing: defaulting to length

  Yet both data frames have the values in their .melt data frames:

      site         sampdate              param       variable
  JCM-1  :2178   Min.   :1978-03-28   pH     : 292   quant:7519
  JCM-20A:2149   1st Qu.:1996-05-24   As     : 286
  JC-E   : 476   Median :2000-05-31   SO4    : 271
  JC     : 400   Mean   :2001-02-04   TDS    : 271
  GD-1   : 395   3rd Qu.:2006-05-31   Cl     : 253
  JC-2   : 349   Max.   :2009-12-30   Zn     : 250
  (Other):1572                        (Other):5896
  Min.   :    0.000
  1st Qu.:    0.005
  Median :    0.650
  Mean   :  317.588
  3rd Qu.:   27.000
  Max.   :20450.000
  NA's   : 2134.000


      site        sampdate              param      variable
  WC     :601   Min.   :1987-07-23   As     : 96   quant:1189
  WC-2   :327   1st Qu.:1994-06-15   TDS    : 79
  WC-1   :261   Median :1995-07-27   NO3-N  : 74
  BC-0.5 :  0   Mean   :1997-05-15   pH     : 72
  BC-1   :  0   3rd Qu.:1996-07-29   SO4    : 69
  BC-1.5 :  0   Max.   :2011-06-06   Cl     : 64
  (Other):  0                        (Other):735
  Min.   :   0.00
  1st Qu.:   0.05
  Median :   7.59
  Mean   :  79.20
  3rd Qu.:  75.00
  Max.   :2587.00
  NA's   : 252.00

  What might be causing dcast() to fail with these two data frames while it
 succeeds with three others processed using the same syntax? If additional
 information would help, let me know and I'll provide it.



Re: [R] Replacing matching values by related values

2011-09-18 Thread Justin Haynes
In your assignment for t3 you use nt, which is undefined, so
t.n$treatment ends up as NAs.




That should get you started.
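A minimal sketch of the match() approach suggested in this thread. It uses the v vector and the letters/1:17 table from the question; the names lookup and v_num are illustrative. match() also works when the lookup column is a factor, because it compares factors as character.

```r
# Vector of treatment codes from the question
v <- c("f","a","e","d","m","o","e","f","i","n","e","i","b","a","o")

# Lookup table: treatment code in one column, its number in the other
lookup <- data.frame(treatment = letters[1:17], numbers = 1:17,
                     stringsAsFactors = FALSE)

# match() gives, for each element of v, its row in lookup (NA if absent)
v_num <- lookup$numbers[match(v, lookup$treatment)]
```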

On Sun, Sep 18, 2011 at 7:56 AM, Janssen, K.J.M. 
k.j.m.jans...@umcutrecht.nl wrote:

 Apologies, I wanted to make life easier by shortly describing my problem.
 Indeed, it is better to post the full code.
 I am not familiar with the dput, but I have pasted the code that I have
 used below.

 d <- matrix(NA,15,5)
 d <- as.data.frame(d)

 colnames(d) <- c("studynumber","t1","t2","t[,1]","t[,2]")

 d$studynumber <- c(1:15)    # add study numbers to select studies in scenarios

 # Link treatment to relating treatment number: make vector of all unique treatment options
 t1 <- duplicated(c(d$t1,d$t2)) # returns TRUE and FALSE, implying that we can use it to select
 t2 <- c(d$t1,d$t2) # combine both vectors, as treatments can be both reference and index treatment
 t3 <- na.omit(ifelse(t1==FALSE,c(d$t1,d$t2),NA))[1:nt] # omit doubles

 # make dataset with first column all possible treatments, and second column their respective numbers
 t.n <- matrix(NA,17,2)  # list possible treatments (here 17), and link them to numbers
 t.n <- as.data.frame(t.n)
 colnames(t.n) <- c("treatment","numbers")
 t.n$treatment <- t3
 t.n$numbers <- 1:17

 # link treatments in d with treatment numbers in dataset t.n

 Here is where I aim to fill d$t[,1] and d$t[,2] with the corresponding
 numbers from t.n



 -Original message-
 From: David Winsemius [mailto:dwinsem...@comcast.net]
 Sent: Sun 18-9-2011 15:20
 To: Janssen, K.J.M.
 CC: michael.weyla...@gmail.com; r-help@r-project.org
 Subject: Re: [R] Replacing matching values by related values

 On Sep 18, 2011, at 3:56 AM, Janssen, K.J.M. wrote:

  Thanks Michael.
  I tested it and it works for numeric values, but not for the 'text'
  values that I am comparing, thus comparing a with a,b, etc.
  Any advice how I can solve it?

 Solve what? You never posted full working code and an explicit
 example. Unless there were actually objects named a, b, c, etc.
 that d[,2] was actually letters[1:17] rather than what you wrote. It's
 especially important to indicate whether you have attached any objects.

 Post dput(head(d)) and dput(v) for the example part and include any
 code use to construct them.


  -Original message-
  From: R. Michael Weylandt michael.weyla...@gmail.com [mailto:
  Sent: Sun 18-9-2011 2:27
  To: Janssen, K.J.M.
  CC: r-help@r-project.org
  Subject: Re: [R] Replacing matching values by related values
  Try playing with match(). Something like
  Should work (untested bc I'm writing from my phone though)
  Michael Weylandt
  On Sep 17, 2011, at 4:33 PM, Janssen, K.J.M. 
  I am trying to replace values of a vector (consisting of 15 values)
  by a value that is related to a matching value in a dataset
  (consisting of 17 rows).
  Here's an example
  The vector:
  v <- c("f","a","e","d","m","o","e","f","i","n","e","i","b","a","o")
  The dataset's columns consist of the following values
  d[,1] <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q")
  d[,2] <- 1:17
  So I want to end up with a vector that consists of the values of
  the second column, when the value of the vector matches the value
  of the first column.
  Thus, I aim to end up with a vector with the following values
  Help is appreciated!

2011-09-16 Thread Justin Haynes
You want

options(width= )

You can set it in the .First function of your .Rprofile file so it takes
effect whenever you start R, or set it interactively in the console.
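A short sketch of both routes. The width of 200 characters is an arbitrary example value, and ~/.Rprofile is the usual per-user start-up file.

```r
# Widen printed output to 200 characters for this session
old <- options(width = 200)
current_width <- getOption("width")

# options() returned the previous setting, so it can be restored later
options(old)

# For every session, a .First function in ~/.Rprofile could contain:
.First <- function() options(width = 200)
```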

On Fri, Sep 16, 2011 at 12:48 PM, Mike P mike.polya...@gmail.com wrote:


 I want to apologize in advance if this has already been asked. I
 wasn't able to find any information, either on google or from local
 list search.

 I'm running an R shell from a linux command line, in an xterm window.
 Whenever I print a data frame, only the first couple of columns are
 printed side-by-side, the others are being repositioned below them. It
 seems something is limiting the line width of the output, even though
 there is enough horizontal space to fit each row on a single line.

 For example, this command:


 prints columns 1-21 on the first line, and the rest 22-30 on the second.

 Is there a way I can configure R to increase the width of my output?


Re: [R] map

2011-09-13 Thread Justin Haynes
I responded offline the first time, but:

Google is your friend: search for "R maps" and you'll find what I mention.

In the future, make sure to do a thorough search of Google and the help
forums before you post.

That said... you're looking for the maps package.


The ggplot2 package has a function called map_data that extracts the lines
if you want the actual data; see the example Hadley provided in ?ggplot2::map_data

Hope that helps,


On Tue, Sep 13, 2011 at 8:48 AM, Batur swordligh...@gmail.com wrote:

 Adding to the previous question, I would like to map central Asia along
 those five countries (Kazakhstan, Kyrgyzstan, Uzbekstan, Tajikstan and
 Turkmenstan). Please tell us the right data base!!! Thanks a lot!!!


Re: [R] reshaping data

2011-09-07 Thread Justin Haynes
Look at the melt function in reshape, specifically ?melt.data.frame


There is an additional feature in the melt function for handling NA values.

  Year Month    CO2
1 1958     J     NA
2 1959     J 315.58
3 1960     J 316.43
4 1961     J 316.89
5 1962     J 317.94
6 1963     J 318.74

You can order your data.frame if you'd like:


    Year Month    CO2
1   1958     J     NA
48  1958     F     NA
95  1958     M 315.71
142 1958     A 317.45
189 1958   M.1 317.50
236 1958   J.1     NA
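For readers without reshape installed, here is a base R sketch of the same wide-to-long step. It uses only the first two years and three month columns of RawData, with values copied from the dput below. The Year, Month and CO2 column names follow the question.

```r
# First two years and three month columns of RawData
wide <- data.frame(Year = c(1958, 1959),
                   J = c(NA, 315.58),
                   F = c(NA, 316.47),
                   M = c(315.71, 316.65))

month_cols <- setdiff(names(wide), "Year")

# Stack the month columns on top of each other, repeating Year alongside
long <- data.frame(Year  = rep(wide$Year, times = length(month_cols)),
                   Month = rep(month_cols, each = nrow(wide)),
                   CO2   = unlist(wide[month_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Sort by year (ties keep the month order) and optionally drop missing values
long <- long[order(long$Year), ]
long_no_na <- long[!is.na(long$CO2), ]
```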

On Wed, Sep 7, 2011 at 7:35 AM, B77S bps0...@auburn.edu wrote:

 I have the following data (see RawData using dput below)

 How do I get it in the following 3 column format (CO2 measurements are the
 elements of the original data frame).  I'm sure the package reshape is
 I should look, but I haven't figured out how.

 Thanks ahead of time

  Month Year    CO2
  J     1958
  F     1958
  M     1958 315.71
  A     1958 317.45
  M.1   1958 317.5
  J.1   1958
  J.2   1958 315.86
  A.1   1958 314.93
  S     1958 313.19
  O     1958
  N     1958 313.34
  D     1958 314.67
  J     1959 315.58
  F     1959 316.47

 # here is the data

 RawData <- structure(list(Year = c(1958, 1959, 1960, 1961, 1962, 1963,
 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975,
 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
 1998, 1999, 2000, 2001, 2002, 2003, 2004), J = c(NA, 315.58,
 316.43, 316.89, 317.94, 318.74, 319.57, 319.44, 320.62, 322.33,
 322.57, 324, 325.06, 326.17, 326.77, 328.54, 329.35, 330.4, 331.74,
 332.92, 334.97, 336.23, 338.01, 339.23, 340.75, 341.37, 343.7,
 344.97, 346.29, 348.02, 350.43, 352.76, 353.66, 354.72, 355.98,
 356.7, 358.36, 359.96, 362.05, 363.18, 365.32, 368.15, 369.14,
 370.28, 372.43, 374.68, 376.79), F = c(NA, 316.47, 316.97, 317.7,
 318.56, 319.08, NA, 320.44, 321.59, 322.5, 323.15, 324.42, 325.98,
 326.68, 327.63, 329.56, 330.71, 331.41, 332.56, 333.42, 335.39,
 336.76, 338.36, 340.47, 341.61, 342.52, 344.51, 346, 346.96,
 348.47, 351.72, 353.07, 354.7, 355.75, 356.72, 357.16, 358.91,
 361, 363.25, 364, 366.15, 368.87, 369.46, 371.5, 373.09, 375.63,
 377.37), M = c(315.71, 316.65, 317.58, 318.54, 319.69, 319.86,
 NA, 320.89, 322.39, 323.04, 323.89, 325.64, 326.93, 327.18, 327.75,
 330.3, 331.48, 332.04, 333.5, 334.7, 336.64, 337.96, 340.08,
 341.38, 342.7, 343.1, 345.28, 347.43, 347.86, 349.42, 352.22,
 353.68, 355.39, 357.16, 357.81, 358.38, 359.97, 361.64, 364.03,
 364.57, 367.31, 369.59, 370.52, 372.12, 373.52, 376.11, 378.41
 ), A = c(317.45, 317.71, 319.03, 319.48, 320.58, 321.39, NA,
 322.13, 323.7, 324.42, 325.02, 326.66, 328.13, 327.78, 329.72,
 331.5, 332.65, 333.31, 334.58, 336.07, 337.76, 338.89, 340.77,
 342.51, 343.56, 344.94, 347.08, 348.35, 349.55, 350.99, 353.59,
 355.42, 356.2, 358.6, 359.15, 359.46, 361.26, 363.45, 364.72,
 366.35, 368.61, 371.14, 371.66, 372.87, 374.86, 377.65, 380.52
 ), M.1 = c(317.5, 318.29, 320.03, 320.58, 321.01, 322.24, 322.23,
 322.16, 324.07, 325, 325.57, 327.38, 328.07, 328.92, 330.07,
 332.48, 333.09, 333.96, 334.87, 336.74, 338.01, 339.47, 341.46,
 342.91, 344.13, 345.75, 347.43, 348.93, 350.21, 351.84, 354.22,
 355.67, 357.16, 359.34, 359.66, 360.28, 361.68, 363.79, 365.41,
 366.79, 369.29, 371, 371.82, 374.02, 375.55, 378.35, 380.63),
J.1 = c(NA, 318.16, 319.59, 319.78, 320.61, 321.47, 321.89,
321.87, 323.75, 324.09, 325.36, 326.7, 327.66, 328.57, 329.09,
332.07, 332.25, 333.59, 334.34, 336.27, 337.89, 339.29, 341.17,
342.25, 343.35, 345.32, 346.79, 348.25, 349.54, 351.25, 353.79,
355.13, 356.22, 358.24, 359.25, 359.6, 360.95, 363.26, 364.97,
365.62, 368.87, 370.35, 371.7, 373.3, 375.4, 378.13, 379.57
), J.2 = c(315.86, 316.55, 318.18, 318.58, 319.61, 319.74,
320.44, 321.21, 322.4, 322.55, 324.14, 325.89, 326.35, 327.37,
328.05, 330.87, 331.18, 331.91, 333.05, 334.93, 336.54, 337.73,
339.56, 340.49, 342.06, 343.99, 345.4, 346.56, 347.94, 349.52,
352.39, 353.9, 354.82, 356.17, 357.03, 357.57, 359.55, 361.9,
363.65, 364.47, 367.64, 369.27, 370.12, 371.62, 374.02, 376.62,
377.79), A.1 = c(314.93, 314.8, 315.91, 316.79, 317.4, 317.77,
318.7, 318.87, 320.37, 320.92, 322.11, 323.67, 324.69, 325.43,
326.32, 329.31, 329.4, 330.06, 330.94, 332.75, 334.68, 336.09,
337.6, 338.43, 339.82, 342.39, 343.28, 344.69, 345.91, 348.1,
350.44, 351.67, 352.91, 354.03, 355, 355.52, 357.49, 359.46,
361.49, 362.51, 365.77, 366.94, 368.12, 369.55, 371.49, 374.5,
375.86), S = c(313.19, 313.84, 314.16, 314.99, 316.26, 316.21,
316.7, 317.81, 318.64, 319.26, 320.33, 322.38, 323.1, 323.36,
324.84, 327.51, 327.44, 328.56, 329.3, 331.58, 332.76, 333.91,
335.88, 336.69, 

Re: [R] Fitting my data to a Weibull model

2011-08-31 Thread Justin Haynes
This is what I use...

  library(MASS)  # fitdistr() is in the MASS package
  est <- fitdistr(x$wind_speed, 'weibull')$estimate

Feel free to correct me if this is wrong!
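A self-contained sketch of the fitdistr() call above. Simulated data stand in for the poster's wind speeds, and the shape/scale values are arbitrary. As far as I know, fitdistr() rejects non-positive values for the Weibull, so calm (zero) readings would need separate handling.

```r
library(MASS)  # fitdistr() lives here

# Simulated wind speeds (true shape 2, scale 8) standing in for x$wind_speed
set.seed(42)
wind_speed <- rweibull(500, shape = 2, scale = 8)

fit <- fitdistr(wind_speed, "weibull")
est <- fit$estimate   # named vector: shape, scale
se  <- fit$sd         # approximate standard errors
```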


On Wed, Aug 31, 2011 at 6:21 AM, Dennis Murphy djmu...@gmail.com wrote:


 Things work if x is the response and y is the covariate. To use the
 approach I describe below, you need RStudio and its manipulate package
 (which is only available in RStudio - you won't find it on CRAN). You
 can download and install RStudio freely from http://rstudio.org/ ; it
 is available for Windows, Linux and Mac. To quote an old TV commercial
 line in the US: 'Try it, you'll like it' :)

 In the script below, the covariate has to be named x since the script
 calls the curve() function, which plots a mathematical function of a
 single variable named x. As a result, you need to interchange the
 names of your vectors. Within RStudio, copy and paste the following in
 chunks; in particular, copy and paste the code starting with
 'manipulate('  and ending in ')' to generate the sliders for the
 parameter estimates. The idea is to tweak the parameter values until
 you get a fitted model that fits the observed data fairly closely.
 When you achieve that, kill the slider box (upper right corner); the
 estimates at the state where the sliders are closed are then saved in
 a vector called start, which you use in the subsequent nls() call.
 After the model is fit, a sequence of x values is generated as new
 data, the predicted values at those points are computed, and a plot of
 the observed data with overlaid fitted model is produced.

 You have to be a bit careful; occasionally, you'll get an error
 Error in nls(y ~ a - b * exp(-c * x^d), start = start) :
  singular gradient
 If so, just try again with a different set of initial values, trying
 not to overdo it. You don't need to be exact, just close.

 ### Weibull model:

 x <- c(1,2,3,4,10,20)
 y <- c(1,7,14,25,29,30)

 ## Copy and paste the code chunk below into RStudio,
 ## stopping with the line of hash marks
 library(manipulate)   # available inside RStudio
 start <- list()
 # Generate sliders to find good initial parameter estimates
 manipulate({
   plot(y ~ x)
   a <- a0; b <- b0; c <- c0; d <- d0
   curve(a-b*exp(-c*x^d), add=TRUE)
   start <<- list(a=a, b=b, c=c, d=d)   # keep the current slider values outside manipulate()
  },
  a0 = slider(10, 50, step=0.1, initial = 30),
  b0 = slider(0, 100, step=1, initial = 3),
  c0 = slider(0, 0.1, step=0.01, initial = 0.01),
  d0 = slider(0, 10, step=0.1, initial = 5)
 )
 ## Stop here ##

 # Fit the model using the estimates from the sliders
 weibm <- nls(y ~ a-b*exp(-c*x^d), start = start)

 # Make predictions over a sequence of x values and plot
 ndata <- data.frame(x = seq(0, 20, by = 0.1))
 wpred <- predict(weibm, newdata = ndata)
 plot(y ~ x, pch = 16)
 lines(ndata$x, wpred, col = 'red')

 ### Logistic:

 start <- list()
 manipulate({
   plot(y ~ x)
   a <- a0; b <- b0; d <- d0
   curve(a/(1+b*exp(-d*x)), add=TRUE)
   start <<- list(a=a, b=b, d=d)
  },
  a0 = slider(0, 50, step = 1, initial = 30),
  b0 = slider(0, 20, step = 0.1, initial = 10),
  d0 = slider(0, 1, step = 0.01, initial = 0.1)
 )

 logism <- nls(y ~ a/(1+b*exp(-d*x)), start = start)

 ldata <- data.frame(x = seq(0, 20, by = 0.1))
 lpred <- predict(logism, newdata = ldata)   # logistic fit on its own prediction grid
 plot(y ~ x, pch = 16)
 lines(ldata$x, lpred, col = 'red')

 This is a good exercise to learn how the various parameters affect the
 shape of the curve associated with a particular nonlinear model in one
 variable. It also helps to read about the model in question and
 understand the interpretation associated with each of the parameters.
 That way, you can use the sliders to visualize the effects of changes
 in one parameter when the others are held constant. If you find that
 the boundaries of the sliders are too restrictive, you can always
 reset them and try again. The code above came about from a few
 iterations of tweaking ranges for individual parameters (either wider
 or narrower as the case may be). I always keep the code in an editor
 so that it's easy to change, then copy and paste into the R console.
 If you redo the slider fitting, it's easier to reset the start vector,

 You'll also notice that one parameter in each of the fitted models is
 nonsignificant, but you need to take into account that you're fitting
 models with three or four parameters to six data points.

 Aside: If you really meant to use y and x as response and covariate,
 respectively, in your posted data example, the sliders will show you
 that the two models are way off the mark, since y would start out
 slowly and then jump exponentially. That would require a completely
 different nonlinear model. You'll also notice that the estimates of a
 and b in the Weibull model are an order 

[R] lubridate and intervals

2011-08-30 Thread Justin Haynes

Maybe there is a native R function for this; if so, please let me know!

I have 2 data.frames with start and end dates, they read in as strings and I
am converting to POSIXct.  How can I check for overlap?

The end result ideally will be a single data.frame containing all the
columns of the other two with rows where there were date overlaps.

df1 <- data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
                  end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))

I tried:


[1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
[1] 2011-06-01 01:17:00 PDT


 df2$start[1] %in% df1$interval[1]

This must be fairly straightforward and I just don't know where to look!
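A base R sketch of one way to test for overlap. It does not use lubridate, and the df2 column names start2/end2 and the sample times are hypothetical. The idea is to pair every row of one table with every row of the other, then keep the pairs where each interval starts before the other ends. A full cross join grows as nrow(df1) * nrow(df2), so it suits modest table sizes.

```r
# Build times with an explicit format and time zone to avoid parsing surprises
mk <- function(s) as.POSIXct(s, format = "%Y-%m-%d %H:%M", tz = "UTC")

df1 <- data.frame(start = mk(sprintf("2011-06-01 %02d:00", 1:3)),
                  end   = mk(sprintf("2011-06-01 %02d:30", 1:3)))
df2 <- data.frame(start2 = mk(c("2011-06-01 01:17", "2011-06-01 02:45")),
                  end2   = mk(c("2011-06-01 01:20", "2011-06-01 02:50")))

# Cartesian product: every row of df1 paired with every row of df2
pairs <- merge(df1, df2, by = NULL)

# Two closed intervals overlap when each one starts before the other ends
overlaps <- pairs[pairs$start <= pairs$end2 & pairs$start2 <= pairs$end, ]
```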



Re: [R] how to referee a dimension name via a variable?

2011-08-29 Thread Justin Haynes
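A minimal sketch of selecting a column whose name is stored in a variable; the data values are invented. `$` takes the name literally, while `[[` evaluates its argument, so test[[var]] is what works when the column name lives in var.

```r
# Hypothetical data frame with the two columns named in the question
test <- data.frame(newdataday24 = c(1, 3, 2),
                   newdataday48 = c(2, 5, 4))

var <- "newdataday48"   # the column to use, held as a string

y_by_name <- test[[var]]   # looks up the column whose name is stored in var
y_literal <- test$var      # looks for a column literally called "var": NULL here

# plot(test[[var]], type = "l") would then plot the chosen column
```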



On Mon, Aug 29, 2011 at 12:29 PM, Jie TANG totang...@gmail.com wrote:

 hi, R-users
   I have a data.frame for example  test$newdataday24 and test$newdataday48
 I can plot them by
 but now i want to plot different data by define a variable to describe them

 but i failed,the error message said that something wrong with plot.window

 what can i do to fix my script ? thanks


Re: [R] debugging functions in R

2011-08-24 Thread Justin Haynes
Another great tool is debugonce().

Wrap your function name in it and then execute your function call.



And you'll be brought into the same interactive browser.  (It has its own
commands, such as n to step, c to continue and Q to quit, which can take a
little getting used to.)
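A minimal sketch of the debugonce() workflow described above; the function is made up. The interactive() guard only keeps the browser from opening when the script runs non-interactively.

```r
# A small function to step through (hypothetical example)
add_then_double <- function(a, b) {
  s <- a + b   # in the browser: n runs the next line, c continues, Q quits
  s * 2
}

if (interactive()) {
  debugonce(add_then_double)  # flag the function for its next call only
  add_then_double(1, 2)       # this call opens the browser
}

result <- add_then_double(1, 2)  # later calls run normally, with no browser
```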


On Wed, Aug 24, 2011 at 7:29 AM, Liviu Andronic landronim...@gmail.com wrote:

 On Wed, Aug 24, 2011 at 4:20 PM, Eran Eidinger e...@taykey.com wrote:
  I am not sure if this is the right list to ask this question (though I
  not find a more appropriate one).
  I've started using R a month ago, and small scripts work fine. However,
  I start writing more complex code, it gets messy.
  1. Is there any way to debug normally, with breakpoints?


 My solution when I run into mysteries like this is to put 'browser()' in
 function just before or after the line of interest. The magnitude and
 of my stupidity usually become clear quickly.
   -- Patrick Burns
  R-help (February 2006)

 Use browser() to inspect the environment and execute the code one step
 at a time.

 2. I am using the Eclipse plugin (StatET), and tried JGR(). Is there an
  that enables breakpoints?
  3. Is there an equivalent to include in other programming languages? So
  many functions in one file are very messy. I would like to break it to
  several files.
  4. Any way to create a local context of variables inside a function?
  Otherwise I have to be careful to give different names inside functions,
  those in the workspace.
  I should point that I am a long time Matlab user and am probably
  some things that don't necessarily exist in R...
  I know it's a lot, if there is a more appropriate forum to ask these,
  point me in that direction.

 Do you know how to read?
 Do you know how to write?


Re: [R] as.numeric() and POSIXct format

2011-08-24 Thread Justin Haynes
[1] 2001-01-07 PST

[1] 2001-01-07 08:00:00 PST
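A sketch of the round trip, assuming the times are meant in UTC. as.numeric() on a POSIXct always gives seconds since 1970-01-01 UTC; the time zone only changes how the value prints. That would explain why a local midnight in Barcelona (UTC+1 in January) gives a number 3600 seconds smaller than the UTC midnight used below.

```r
# A time stamp in UTC (value chosen for illustration)
t0 <- as.POSIXct("2001-01-07 00:00:00", tz = "UTC")

# POSIXct stores seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(t0)

# Inverse: supply the origin and the time zone explicitly
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
```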

On Wed, Aug 24, 2011 at 9:22 AM, Agustin Lobo agustin.l...@ija.csic.es wrote:


 I'm confused by this:
 [1] 978822000

 I guess the problem is that as.numeric() assumes a different origin, but
 cannot find
 any default origin.

 How can I get back the seconds from the POSIXct format? In other words,
 which the inverse function of as.POSIXct()?
 I've tried as.numeric and unclass() using a origin= argument, but this does
 not work.



 Dr. Agustin Lobo
 Institut de Ciencies de la Terra Jaume Almera (CSIC)
 LLuis Sole Sabaris s/n
 08028 Barcelona
 Tel. 34 934095410
 Fax. 34 934110012
 email: agustin.l...@ija.csic.es


Re: [R] subsetting a list of matrices

2011-08-23 Thread Justin Haynes
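His is better, but you can also use a for loop...

for(i in seq_along(l)){
  if(l[[i]][3, 1] == 'Message 1') {
    print(l[[i]][1, 1])   # row1, V1 of a matching matrix
  } else {
    next                  # skip the other matrices
  }
}

but you shouldn't if your list is very long.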
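A loop-free base R sketch of the same selection using vapply() over the list. The make_mat() helper and the seed are illustrative stand-ins for the question's setup. Assigning the message strings turns each matrix into a character matrix, so the extracted values come back as strings and are converted at the end.

```r
set.seed(1)
# Rebuild the question's structure: three 3 x 3 matrices whose third row holds messages
make_mat <- function() {
  m <- matrix(rnorm(9), 3,
              dimnames = list(paste0("row", 1:3), c("V1", "V2", "V3")))
  m[3, ] <- sample(c("Message 1", "Message 2", "Message 3"))  # coerces m to character
  m
}
l <- list(a = make_mat(), b = make_mat(), c = make_mat())

# Which matrices have "Message 1" at (row3, V1)?
keep <- vapply(l, function(m) m["row3", "V1"] == "Message 1", logical(1))

# (row1, V1) for those matrices only, then back to numbers
vals_chr <- vapply(l[keep], function(m) m["row1", "V1"], character(1))
vals <- as.numeric(vals_chr)
```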

On Tue, Aug 23, 2011 at 9:35 AM, Henrique Dallazuanna www...@gmail.com wrote:

 Try this:

 subset(as.data.frame(do.call(rbind, lapply(l, "[", , 1))), row3 == "Message 1")

 On Tue, Aug 23, 2011 at 1:28 PM, Lara Poplarski larapoplar...@gmail.com
  Hi all,
  I have an object that looks (roughly) like the following:
  l <- list(a = matrix(rnorm(9), 3), b = matrix(rnorm(9), 3), c =
  matrix(rnorm(9), 3))
  l$a[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
  l$b[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
  l$c[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
  rownames(l$a) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
  rownames(l$b) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
  rownames(l$c) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
  colnames(l$a) <- c("V1", "V2", "V3")
  colnames(l$b) <- c("V1", "V2", "V3")
  colnames(l$c) <- c("V1", "V2", "V3")
  I want to extract values (row1, V1) for the three sublists a, b, c,
  but only for those cases in which row3 == Message 1. Could someone
  suggest how to proceed?
  Many thanks in advance,

 Henrique Dallazuanna
 25° 25' 40 S 49° 16' 22 O

Re: [R] ddply - how to transform df column in place

2011-08-23 Thread Justin Haynes

Ista is right, but:

In your function you are asking as.Date to convert the whole data.frame df
rather than just your daterep column.

out <- ddply(d2, .(daterep), function(df)
'data.frame':   30 obs. of  2 variables:
 $ daterep: num  20100801 20100802 20100803 20100804 20100805 ...
 $ V1     : Date, format: "2010-08-01" "2010-08-02" "2010-08-03" "2010-08-04" ...
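A base R sketch of converting just the column, as described above. transform() stands in for ddply() here, since a row-wise conversion like this needs no grouping. The data mirror the question.

```r
# Same shape as the question's data: yyyymmdd numbers in daterep
d <- data.frame(first = 1, daterep = seq(20100801, 20100830, 1))

# Convert just the column: as.Date() gets character input plus a format
d <- transform(d, daterep = as.Date(as.character(daterep), format = "%Y%m%d"))
```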

On Tue, Aug 23, 2011 at 3:16 PM, jjap jean.plamon...@fpinnovations.ca wrote:

 Dear R-users,

 I am trying to get the plyr syntax right, without much success.

 d <- data.frame(cbind(x=1,y=seq(20100801,20100830,1)))
 names(d) <- c("first", "daterep")

 # I can convert the daterep column in place the classic way:
 d$daterep <- as.Date(strptime(d$daterep, format="%Y%m%d"))

 # How to do it the plyr way?
 ddply(d2, c("daterep"), function(df){as.Date(df, format="%Y%m%d")})
 # returns: Error in as.Date.default(df, format = "%Y%m%d") :
 #   do not know how to convert 'df' to class "Date"

 Thanks for any hints,



Re: [R] Help: Sort components of a vector with indices tracked in R

2011-08-23 Thread Justin Haynes
If you make your vector a data.frame, you will have row numbers
accompanying your sorting.



Also, you shouldn't use c as a variable name, since it's an important R
function.

See your example :)
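A short sketch of getting the original positions back; the values are made up and deliberately avoid the name c. order() returns the original indices directly. sort(..., index.return = TRUE) returns values and indices together. The data.frame route mentioned above keeps the index as an explicit column.

```r
x <- c(5, 2, 9, 1)   # example values

idx <- order(x)      # original positions, in sorted order
sorted <- x[idx]     # the sorted values themselves

# sort() can return both at once: s$x (sorted values) and s$ix (original indices)
s <- sort(x, index.return = TRUE)

# The data.frame route: keep the original position as a column, then reorder
df <- data.frame(value = x, index = seq_along(x))
df_sorted <- df[order(df$value), ]
```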


On Tue, Aug 23, 2011 at 4:59 PM, Chee Chen chee.c...@yahoo.com wrote:

 Dear All,
 I would like to know how to sort a vector of numeric values such that we
 know the original index of each ordered component. Say, we have
 c <- c(1,4,3,2)
 csort <- sort(c,decreasing=FALSE)
 With a few components of c, we can manually find out:
  csort[1] = 1 = c[1], ie, the original index of csort[1] is 1,
 csort[2] =2 =c[4], ie, the original index of csort[2] is 4.

 When length(c) is very large, manual checking is infeasible.
 We can set up a for loop to compare and extract the index. However, is
 there an easier way to do this, so that the output is the sorted vector and
 their corresponding original indices.

[R] ggplot in a function confusion!

2011-08-15 Thread Justin Haynes
What's going on here?


ggplot()+geom_point(data=df,aes(x=x,y=y))  ## this is the normal usage

ggplot()+geom_point(data=df,aes(x=df[,1],y=df[,2]))  ## but I can also feed it column indices
ggplot()+geom_point(aes(x=df[,'x'],y=df[,'y']))  ## or column names.

## but if I wrap it in a function...

print(ggplot() + geom_point(aes(x=dff[,x.var],y=dff[,y.var])))

print(ggplot() + geom_point(data=dff,aes(x=dff[,x.var],y=dff[,y.var])))

print(ggplot() + geom_point(data=dff,aes(x=eval(x.var),y=eval(y.var))))

plot.func.one(df,1,2) ## I assume the "dff not found" error is happening in the aes call rather than the data= portion...
plot.func.one(df,'x','y')  ## but why does it work in the global env and not within a function?


plot.func.three(df,var.x,var.y)  ## why does it give the error on y.var instead of x.var?


plot.func.one(dff,x.var,y.var)  ## now what's going on?  I assume this works because ggplot is looking globally rather than within the function...

nothing seems to work right!  How do I plot within a function where I can
feed the function a data.frame and the columns I want plotted?

I assume this is some interesting name space issue but if you guys can
enlighten me as to what's going on...


P.S.  So before I sent this I dug some more and found my answer, aes_string:

print(ggplot() + geom_point(data=dff,aes_string(x=x.var,y=y.var)))


works great.  But I still wouldn't mind some clarification on what's
happening in my earlier examples.

[R] Sequential Naming of ggplot .pngs using plyr

2011-08-10 Thread Justin Haynes
If I have data:


And want to plot like this:

for(i in c('a','b','c','d')){
print(ggplot(dat[,names(dat) %in%
number',ctr,sep=' ')))

Is there a way to do the same naming using plyr (or data.table or foreach
which I am not familiar with at all!)?

print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)

And better yet, is there a way to do it using .parallel=T?

Faceting is not really an option (unless I can facet onto multiple pages of
a pdf or something) because these need to go into reports as individually
labelled and titled plots.

As a bit of a corollary, is it really worth the headache to resolve this if
I am only using melt/plyr to split on the four letter variables? With a
larger set of data (1e6 rows), the melt/plyr version takes a significant
amount of time but .parallel=T drops the time significantly.  Is the right
answer a foreach loop and can I do that with the increasing counter? (I
haven't gotten beyond Hadley's .parallel feature in my parallel R

 system.time(for(i in c('a','b','c','d')){
+ png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
+ print(ggplot(dat[,names(dat) %in%
number',ctr,sep=' ')))
+ dev.off()
+ ctr <- ctr+1
+ })
   user  system elapsed
 54.630   0.120  54.843

+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=F)
+ )
   user  system elapsed
  58.40    0.13   58.63

+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=T)
+ )
   user  system elapsed
  70.33    3.46   27.61

How might I speed this up and include the sequential plot names?

Thanks a bunch!


Re: [R] Sequential Naming of ggplot .pngs using plyr

2011-08-10 Thread Justin Haynes
Thanks Ista,

In my real code that is exactly what I'm doing, but I want to prepend the
names with a sequential number for easier reference once the pngs are made.

My initial thought was to add the sequential number to the data before
sending it to plyr and drawing it out there, but that seems like an
excessive extra step when I have 1e6 - 1e7 rows.
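One way to keep sequential names without touching the data, and without a counter that breaks under parallel execution, is to number the groups up front and iterate over the index. This is a sketch under assumptions: it uses the melted layout from the post (columns site, variable and value), numbered_files() and plot_numbered() are hypothetical names, parallel::mclapply() stands in for plyr's .parallel option, and a base-graphics boxplot() stands in for the ggplot call (with ggplot2 the plot would be print()ed at the same spot).

```r
# Hypothetical helper: one numbered file name per group, in first-seen order
numbered_files <- function(groups, dir = tempdir(), prefix = "plot_number_") {
  groups <- unique(as.character(groups))
  file.path(dir, sprintf("%s%02d_%s.png", prefix, seq_along(groups), groups))
}

# Hypothetical helper: one png per level of long$variable. The plot number is
# the loop index i, so it stays correct when iterations run in parallel.
plot_numbered <- function(long, dir = tempdir(), cores = 1L) {
  groups <- unique(as.character(long$variable))
  files  <- numbered_files(groups, dir)
  parallel::mclapply(seq_along(groups), function(i) {
    d <- long[long$variable == groups[i], ]
    png(files[i], width = 800, height = 600)
    on.exit(dev.off())                     # close the device even on error
    boxplot(value ~ factor(site), data = d,
            main = paste("plot number", i, "-", groups[i]))
    files[i]
  }, mc.cores = cores)
}
```

mclapply() forks processes, so on Windows mc.cores has to stay at 1; a foreach loop with %dopar% over seq_along(groups) would follow the same index-based pattern.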


On Wed, Aug 10, 2011 at 2:42 PM, Ista Zahn iz...@psych.rochester.edu wrote:

 Hi Justin,

 On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes jto...@gmail.com wrote:
  If I have data:
  And want to plot like this:
  for(i in c('a','b','c','d')){
 print(ggplot(dat[,names(dat) %in%
  number',ctr,sep=' ')))
  Is there a way to do the same naming using plyr (or data.table or foreach
  which I am not familiar with at all!)?

 This is not the same naming, but the same general idea can be
 achieved with plyr using

  d_ply(melt(dat,id.vars='site'),.(variable),function(df) {
 png(file=paste(plyr_plot, unique(df$variable),

 I'm not up to speed on .parallel, foreach etc., so I'll leave the rest
 to someone else.

  R-help@r-project.org mailing list
  PLEASE do read the posting guide
  and provide commented, minimal, self-contained, reproducible code.

 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology

[R] binary conversion list to data.frame with plyr... AND NO LOOPS!

2011-07-08 Thread Justin Haynes
Happy weekend helpeRs!

As usual, I'm stumped by R...

My plan was to take an integer number, convert it to binary and wind
up with a data.frame where each column is either 1 or 0 so I can see
which bits are changing:

bb <- function(i) ifelse(i, paste(bb(i %/% 2), i %% 2, sep=""), "")

for(i in 1:length(my.list)){
for(j in 1:length(my.list[[i]])){

But this isn't exactly feasible on a million+ rows where some binary
numbers are 20 digits... I know there's a way without loops, I just
know it!
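A loop-free sketch using base R's intToBits(), which returns the 32 bits of every integer (least-significant bit first) as one raw vector, so a whole column converts in a single call. bits_frame() and its n_bits argument are hypothetical, and the sketch assumes non-negative values below 2^31.

```r
# Hypothetical helper: one 0/1 column per bit, most-significant bit first.
# Assumes non-negative integers below 2^31 and n_bits <= 32.
bits_frame <- function(x, n_bits = 20L) {
  x <- as.integer(x)
  # intToBits() gives 32 raw bits per integer, least-significant bit first
  m <- matrix(as.integer(intToBits(x)), ncol = 32L, byrow = TRUE)
  m <- m[, n_bits:1, drop = FALSE]        # keep n_bits, highest bit first
  out <- as.data.frame(m)
  names(out) <- paste0("bit", (n_bits - 1L):0L)
  out
}
```

Bits that flip between consecutive rows can then be flagged with diff(as.matrix(bits)) != 0, and several columns can be handled by calling bits_frame() on each and cbind()-ing the results.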

Ideally, I can do this to multiple columns of a data.frame and have

[R] rle with NA values?

2011-06-24 Thread Justin Haynes
Happy Friday!

Using this function:

fixSeq <- function(df) {
  shift1 <- function(x) c(1, x[-length(x)])
  repeat {
change - df.rle$values = 4  shifted.sf = 4  shifted.sf != df.rle$values
df.rle$values[change] - shifted.sf[change] else break

I would like the removed NAs to split runs, so that a run interrupted
by an NA is treated as two separate runs. To illustrate with a short
example:


Error in df.rle$values[change] <- shifted.sf[change] :
  NAs are not allowed in subscripted assignments

  id state state_shift
1  1     1           1
2  1     2           2
3  1     4           4
4  1     4           4
5  1     5           4
7  1     5           4
8  1     5           4
9  1     1           1

rather than the desired output of 1 2 4 4 4 5 5 1. The NA makes the
second pair of 5s a unique state rather than a continuation of the
previous state 4. Is this best accomplished by replacing NA with a
sentinel value like -99, or do I have other options?
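Two observations, offered as a sketch rather than a fix to the function above: base rle() already treats every NA as unequal to its neighbours, so an NA splits a run in two without any -99 sentinel; and the "NAs are not allowed in subscripted assignments" error comes from a logical index that contains NA, which which() avoids by returning only the TRUE positions. run_id() and safe_assign() are hypothetical helpers.

```r
# Hypothetical helper: run number for every element; an NA always starts
# a new run because rle() never considers it equal to its neighbours
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# Hypothetical helper: x[mask] <- value[mask], but NA entries in the
# logical mask are skipped instead of raising an error
safe_assign <- function(x, mask, value) {
  idx <- which(mask)          # which() drops both FALSE and NA
  x[idx] <- value[idx]
  x
}
```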


[R] rle on large data . . . without a for loop!

2011-06-17 Thread Justin Haynes
I think I need to do something like this:

dat-data.frame(state=sample(id=rep(1:5,each=200),1:3, 1000,
for(i in 1:length(rle.dat$length)){

to a very large dataset.  I want to apply a few summary functions to
some variables within a data.frame for given states. To complicate
things, I'd like to use plyr and split on the id variable before I do
any of this...
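A loop-free sketch: a new run starts wherever id or state differs from the previous row, so cumsum() over that indicator gives every row a run number, and per-run summaries follow from table() and tapply(). Because a change of id also starts a new run, no separate split on id is needed. The column name value and the function summarise_runs() are assumptions for illustration, and the data is assumed to be sorted by id and then by time.

```r
# Hypothetical helper: one row per run of a constant state within an id.
# Assumes columns id, state and value, sorted by id and then by time.
summarise_runs <- function(dat) {
  n <- nrow(dat)
  # TRUE where a run begins: the first row, or id/state differs from the row above
  new_run <- c(TRUE, dat$id[-1L] != dat$id[-n] | dat$state[-1L] != dat$state[-n])
  run <- cumsum(new_run)
  first <- !duplicated(run)
  data.frame(
    id     = dat$id[first],
    state  = dat$state[first],
    length = as.vector(table(run)),                   # rows in each run
    mean   = as.vector(tapply(dat$value, run, mean))  # summary of value per run
  )
}
```

Other summaries can be added as further tapply() columns in the same way.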

  for(i in 1:length(rle.dat$length)){


[R] gridExtra with cairodevie and ggplots

2011-06-14 Thread Justin Haynes
I apologise in advance for not providing code, but this seems like a
straightforward question...

I am making a few full-page plots, some of which are portrait and
some of which are landscape.

I would like to open my cairo device once and put all the plots in the
same .pdf. But since some need to be rotated to fit the cairo device
dimensions, is there a simple parameter to arrangeGrob (I'm using
grid.arrange to generate the final plot) that will rotate the entire
output 90 degrees so all my pages can be the same direction?

Re: [R] gridExtra with cairodevie and ggplots

2011-06-14 Thread Justin Haynes
That's perfect, thank you!

On Tue, Jun 14, 2011 at 2:10 PM, baptiste auguie
baptiste.aug...@googlemail.com wrote:

 You can draw arrangeGrob in a rotated viewport,

 ps = replicate(4, qplot(rnorm(10), rnorm(10)), simplify=F)
 g = gTree(children=gList(do.call(arrangeGrob, ps)), vp=viewport(angle=90))
 grid.draw(g)

 though you get some warnings about clipping for some reason.

 Perhaps more cleanly, you can define a print.arrange method,
 (shamelessly borrowed from ggplot2),

 print.arrange = function (x, newpage = is.null(vp), vp = NULL, ...) {
     if (newpage) grid.newpage()
     if (is.null(vp)) {
         grid.draw(x)
     }
     else {
         if (is.character(vp)) seekViewport(vp)
         else pushViewport(vp)
         grid.draw(x)
         upViewport()
     }
 }

 print(do.call(arrangeGrob, ps), vp=viewport(angle=90))



 On 15 June 2011 08:39, Justin Haynes jto...@gmail.com wrote:
 I apologise in advance for not providing code, but this seems like a
 straight forward question...

 I am making a few full page plots some of which are portrait and
 some of which are landscape

 I would like to open my cairo device once and put all the plots in the
 same .pdf.  But since some
 need to be rotated to fit the cairo device dimensions, is there a
 simple parameter to arrangeGrob
 (im using grid.arrange to generate the final plot) that will rotate
 the entire output 90 degrees so all
 my pages can be the same direction?
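A small variation on the rotated-viewport idea, sketched under assumptions: if the rotated viewport is given the page height as its width and the page width as its height, the 90-degree rotation makes it fill the page rather than spill over its edges. draw_landscape() and the 8.5 x 11 inch page size are hypothetical and should match whatever device is actually open.

```r
library(grid)

# Hypothetical helper: draw 'grob' turned 90 degrees on a new page.
# page_width/page_height (inches) are assumptions that should match the
# device; swapping them for the viewport lets the rotated layout fill the page.
draw_landscape <- function(grob, page_width = 8.5, page_height = 11) {
  grid.newpage()
  pushViewport(viewport(angle = 90,
                        width  = unit(page_height, "inches"),
                        height = unit(page_width, "inches")))
  grid.draw(grob)
  popViewport()
  invisible(NULL)
}
```

In use, the device would be opened once (for example pdf("plots.pdf", width = 8.5, height = 11) or a cairo equivalent), draw_landscape() called on each arrangeGrob() result that needs rotating, portrait pages drawn normally, and the device closed with dev.off().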



[R] ragged data.frame? using plyr

2011-06-02 Thread Justin Haynes
I have a dataset that looks like:


I want to normalise it using the following function (unless you have
a better idea...):



This gives me my data.frame all nice and pretty and ready to do the following:




However, as it turns out, my data look more like:



So on different days I only have data for some of the id variables
which leads to a ragged data.frame.


Can I do something like

ldply(dlply(dat,.(id.day),adj.values), function(x){put in a NA for the
places where data is missing?})

To give you a sense of where this is going, I'm eventually going to
plot the mean of each id variable over the time period vs. its IQR
(again unless you have a better idea...).
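Rather than patching NAs in after dlply(), one option is to complete the grid before normalising: build every id/day combination with expand.grid() and left-join the data onto it, so missing combinations come back as NA rows. The mean and IQR per id can then be taken with na.rm = TRUE. This is a sketch; the column names id, day and value and the helpers complete_grid() and mean_iqr() are assumptions, since the post's own code did not survive.

```r
# Hypothetical helper: add NA rows for every id/day pair missing from dat.
# Assumes key columns named 'id' and 'day'; other columns become NA.
complete_grid <- function(dat, id_col = "id", day_col = "day") {
  full <- expand.grid(unique(dat[[id_col]]), unique(dat[[day_col]]),
                      stringsAsFactors = FALSE)
  names(full) <- c(id_col, day_col)
  merge(full, dat, all.x = TRUE)   # unmatched pairs get NA in the other columns
}

# Hypothetical helper: mean and IQR of 'value' for each id, ignoring NAs
mean_iqr <- function(res) {
  s <- sapply(split(res$value, res$id), function(v)
    c(mean = mean(v, na.rm = TRUE), iqr = IQR(v, na.rm = TRUE)))
  data.frame(id = colnames(s), mean = s["mean", ], iqr = s["iqr", ],
             row.names = NULL, stringsAsFactors = FALSE)
}
```

The mean-versus-IQR plot would then be plot(iqr ~ mean, data = mean_iqr(res)) or the ggplot equivalent.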

As always,

thanks for your help!


[R] count value changes in a column

2011-05-31 Thread Justin Haynes
Is there a way to look for value changes in a column?


Any of the five states are acceptable. However if, for example,
states 4 or 5 follow state 3, I want to overwrite them with 3.
Changes from 1 to any value and from 2 to any value are acceptable, as
are changes from any value to 1 or 2.

By way of an example:

the sequence 1 3 3 5 5 3 2 4 2 1 5 3 3 5

should read   1 3 3 3 3 3 2 4 2 1 5 5 5 5

Thanks for the help!


Re: [R] count value changes in a column

2011-05-31 Thread Justin Haynes
I apologize for the confusion but that solution will work with a twist.

I want to record only the first value of a state change that goes above 2.
So if the sequence is

344455544334 it should read all 3s

but 3442555414433 should read 33321
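A vectorised sketch of the rule as read from the first example: consecutive values above 2 form a block, every value in the block is replaced by the block's first value, and values of 1 or 2 are left alone. It reproduces the first example (1 3 3 5 5 3 2 4 2 1 5 3 3 5 becomes 1 3 3 3 3 3 2 4 2 1 5 5 5 5) and turns 344455544334 into all 3s. carry_first_high() and threshold are hypothetical names; NA is treated as not above the threshold, so it also ends a block.

```r
# Hypothetical helper: within each stretch of consecutive values above
# 'threshold', replace every value with the first value of that stretch.
# NA counts as "not above", so it ends a stretch and stays NA.
carry_first_high <- function(x, threshold = 2) {
  high <- !is.na(x) & x > threshold
  # block number increases whenever 'high' switches on or off
  block <- cumsum(c(TRUE, high[-1L] != high[-length(high)]))
  first <- x[!duplicated(block)]        # first value of each block
  ifelse(high, first[block], x)
}
```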

[R] ggplot geom_boxplot vertical margins

2011-05-18 Thread Justin Haynes
If you plot:


How do I remove those pesky margins on the sides of the plot area?  Or
maybe just reduce their size to something more like the spacing of the



Re: [R] ggplot geom_boxplot vertical margins

2011-05-18 Thread Justin Haynes

Thanks, I couldn't find that anywhere!

On Wed, May 18, 2011 at 1:59 PM, Felipe Carrillo
mazatlanmex...@yahoo.com wrote:
 Is this what you want? You can control how much space you
 want to see on the sides of the plot:

 ggplot(df,aes(x=x,y=y))+geom_boxplot() + scale_x_discrete(expand=c(0,0))
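For the "just reduce their size" case, the expand argument takes a multiplicative and an additive constant; on a discrete axis the categories sit at 1, 2, 3, ..., so the additive part is measured in category spacings, and a small positive value shrinks the side margins without removing them. The data frame below is made up for illustration, and 0.3 is an arbitrary choice.

```r
library(ggplot2)

# Made-up data: three groups of 20 values each
set.seed(1)
df <- data.frame(x = rep(c("a", "b", "c"), each = 20), y = rnorm(60))

# expand = c(multiplicative, additive); 0.3 leaves a small gap on each side
p <- ggplot(df, aes(x = x, y = y)) +
  geom_boxplot() +
  scale_x_discrete(expand = c(0, 0.3))
```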

 Felipe D. Carrillo
 Supervisory Fishery Biologist
 Department of the Interior
 US Fish & Wildlife Service
 California, USA

 - Original Message 
 From: Justin Haynes jto...@gmail.com
 To: r-help@r-project.org
 Sent: Wed, May 18, 2011 1:51:19 PM
 Subject: [R] ggplot geom_boxplot vertical margins

 If you plot:


 How do I remove those pesky margins on the sides of the plot area?  Or
 maybe just reduce their size to something more like the spacing of the




[R] How do I break my addiction to for loops!?!?

2011-05-13 Thread Justin Haynes
I know I'm not supposed to use them... but they're just so easy! I
have trouble defining an appropriate function for plyr or apply!


So... I am currently generating a table and a geom_boxplot and squishing
them together with gridExtra. But for columns U, V and W I want to use
group1 as my split variable, and for columns X, Y and Z I will use group2.
I also need to make it as flexible as possible.

What I've got now is...



for(j in 1:length(group.types)){
  for(i in 1:length(box.vars)){
index.rows <- which(df[,index.group]==group.types[j] & df[,box.vars[i]]!=0)

p <- p+geom_boxplot()+labs(x='Machine ID',y=names(df[box.vars[i]]))



# + misc. gridExtra lines

Currently I hard code the box.vars and index.group, which is OK with
me, but the for loops should be in a fancy function. Anyway, I'm sure
there's an elegant plyr or apply call that can do this for me... but as I
said before, I need a FA Group (for loops anonymous)...
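A sketch that replaces the nested loops with a lookup table: a named vector records which grouping column splits each box variable, and Map() walks over the pairs, dropping zeros and splitting the remaining values by group. split_by and box_subsets() are hypothetical; the column names U to Z, group1 and group2 come from the post.

```r
# Hypothetical lookup: which grouping column splits which box variable
split_by <- c(U = "group1", V = "group1", W = "group1",
              X = "group2", Y = "group2", Z = "group2")

# Hypothetical helper: for each variable, drop zeros (and NAs), then split
# the remaining values by that variable's grouping column
box_subsets <- function(df, split_by) {
  Map(function(var, grp) {
    keep <- which(df[[var]] != 0)
    split(df[[var]][keep], df[[grp]][keep])
  }, names(split_by), split_by)
}
```

Each element of the result is a named list of value vectors per group, so lapply(names(res), function(v) boxplot(res[[v]], main = v)) draws one boxplot per variable; stack() on an element turns it into a two-column data.frame suitable for ggplot.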

Also, this winds up being a lot of calculations on a big data set. So, if
you have magical ff, bigmemory and/or doMC suggestions, I'm all ears.

[R] xtable without a loop alongside a ggplot

2011-05-04 Thread Justin Haynes
I would like to create a table of my points and identify which
'quadrant' of a plot they are in, with the 'origin' at the means. The
kicker is I would like to display it right next to or below a ggplot
of the data. Maybe xtable isn't the right thing to use, but it's the
only thing I can think of. Any help is appreciated!

test$right <- sapply(test$x,function(x) {mean.x <- mean(test$x);any(x > mean.x)})
test$up <- sapply(test$y,function(y) {mean.y <- mean(test$y);any(y > mean.y)})

for(i in 1:length(test$x)){
  if(test$right[i]==TRUE && test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper right quadrant'))
  if(test$right[i]==FALSE && test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper left quadrant'))
  if(test$right[i]==TRUE && test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower right quadrant'))
  if(test$right[i]==FALSE && test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower left quadrant'))
}
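The quadrant labels can be computed for all points at once with ifelse() instead of a loop. Like the loop above, this sketch counts a value equal to the mean as left or lower. quadrant() and quadrant_table() are hypothetical names, and the test data frame is assumed to have columns x and y as in the post.

```r
# Hypothetical helper: quadrant label for every point, origin at the means;
# a value equal to the mean counts as left/lower, as in the loop
quadrant <- function(x, y) {
  vertical   <- ifelse(y > mean(y), "upper", "lower")
  horizontal <- ifelse(x > mean(x), "right", "left")
  paste(vertical, horizontal)
}

# Hypothetical helper: one row per point, ready to show as a table
quadrant_table <- function(test) {
  data.frame(point = rownames(test), quadrant = quadrant(test$x, test$y),
             stringsAsFactors = FALSE)
}
```

Since xtable targets LaTeX and HTML output, placing the table beside a ggplot may be easier with gridExtra: tableGrob() turns the data.frame into a grob, and grid.arrange() can put it next to the plot.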

I know there's a better way than using a for loop! And I haven't the
foggiest how to use xtable. As I said, the ultimate goal is to create
a plot with a table alongside it showing outliers and where they
appear using the inout function from the splancs package and a

[R] MASS fitdistr with plyr or data.table?

2011-04-27 Thread Justin Haynes
I am trying to extract the shape and scale parameters of a wind speed
distribution for different sites.  I can do this in a clunky way, but
I was hoping to find a way using data.table or plyr.  However, when I
try I am met with the following:



Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
  invalid to set the class to matrix unless the dimension attribute is
of length 2 (was 0)
In addition: Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
10: In dweibull(x, shape, scale, log) : NaNs produced

(the warning messages are normal from what I can tell)

Or using plyr:



Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector

Those sound like similar errors to me, but I can't figure out how to
make them go away!

To prove I'm not crazy:

2.996815 8.009757
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
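One plausible cause of the "'x' must be a non-empty numeric vector" error, offered as a guess: fitdistr() raises it when a group reaches it empty (split() keeps unused factor levels as empty vectors unless drop = TRUE) or when the speed column is not numeric, for example read in as a factor. The sketch below fits each site with split() and lapply(); the columns site and speed and the helper weibull_params() are assumptions, and dropping zero or missing speeds is also an assumption, made because fitdistr() requires strictly positive values for the Weibull.

```r
library(MASS)

# Hypothetical helper: Weibull shape and scale for every site.
# Assumes columns 'site' and 'speed' (numeric); zero/NA speeds are dropped.
weibull_params <- function(dat) {
  # drop = TRUE skips factor levels with no rows, which would otherwise
  # arrive as empty vectors
  pieces <- split(dat$speed, dat$site, drop = TRUE)
  fits <- lapply(pieces, function(s) {
    s <- s[!is.na(s) & s > 0]
    if (length(s) < 2) return(c(shape = NA_real_, scale = NA_real_))
    fitdistr(s, "weibull")$estimate      # named vector: shape, scale
  })
  out <- do.call(rbind, fits)            # one row per site
  data.frame(site = rownames(out), out, row.names = NULL)
}
```

The dweibull() NaN warnings may still appear during optimisation. The shape-versus-scale plot is then plot(scale ~ shape, data = weibull_params(dat)).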

[R] MASS fitdistr call in plyr help!

2011-04-22 Thread Justin Haynes
I have a set of wind speeds read at different locations.  The data is
a data frame with two columns: site and wind speed.  I want to split
the data on site and call a function to find the shape and scale
parameters of a weibull distribution fit.

The end result is a plot with x-axis = shape and y-axis = scale.
Currently my code looks like:

  for(i in 1:l){

for(i in (mini+1):maxi){

for(i in 1:l){temp <- data.frame(fit[i])}
for(i in 1:l){temp[i] <- data.frame(fit[i])}
for(i in 1:l){temp2[i,j] <- temp[j,i]}

I'd like to combine the two functions into one plyr call, but I can't
figure out how it would work!  If there is a better package than MASS

[R] string interpolation

2011-03-21 Thread Justin Haynes
Is there a way to do this in R? I have data in the form:

57_input  57_output  58_input  58_output  etc.

Can I use a for loop (i in 57:n) that plots only the outputs? I want
this to be robust, so I'm not specifying a column id but rather
something like C++ code,
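Two ways to pick the output columns without hard-coding positions, sketched with hypothetical helper names: match names by pattern with grep(), or build each name printf-style with sprintf(), which is R's closest analogue to C/C++ string formatting. One assumption worth checking: read.csv() and data.frame() rename columns that start with a digit (57_output becomes X57_output) unless check.names = FALSE, which the pattern version tolerates but the sprintf() version does not.

```r
# Hypothetical helper: every column whose name ends in "_output"
output_columns <- function(df) grep("_output$", names(df), value = TRUE)

# Hypothetical helper: build one column name from its numeric id, printf-style
output_column <- function(i) sprintf("%d_output", i)
```

A loop could then be for (col in output_columns(df)) plot(df[[col]], main = col), or for (i in 57:n) plot(df[[output_column(i)]]).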

[R] linear regression in a data.frame using recast

2011-03-16 Thread Justin Haynes
I have a very large dataset with columns of id number, actual value,
predicted value.  This used to be a time series but I have dropped the
time component.  So I now have a data.frame where the id number is
repeated but each value in the actual and predicted columns are

I assume I need to use recast somehow, but I'm at a loss... How can I
perform a simple linear regression (using lm()?) on my two variables
for each unique id number?
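No recast step should be needed: split() can break the data.frame into one piece per id and lm() can be fitted to each piece. This sketch assumes columns named id, actual and predicted and regresses actual on predicted; fit_by_id() is a hypothetical name.

```r
# Hypothetical helper: intercept and slope of actual ~ predicted for each id
fit_by_id <- function(dat) {
  fits <- lapply(split(dat, dat$id), function(d)
    coef(lm(actual ~ predicted, data = d)))
  out <- do.call(rbind, fits)   # one row per id: (Intercept), predicted
  data.frame(id = rownames(out), intercept = out[, 1], slope = out[, 2],
             row.names = NULL, stringsAsFactors = FALSE)
}
```

With plyr the same idea reads ddply(dat, .(id), function(d) coef(lm(actual ~ predicted, data = d))).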

