[R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
I need to run a bash command, but when you call system() the default shell
is sh (see my sessionInfo below).
I found the shell() function (
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/base/html/shell.html)
but it seems to have disappeared from current versions of R?
I am running all this from R CMD BATCH  with system calls to other R
scripts.

For a little more info, I'm generating sphinx documents (a python
documentation library) through R and need to use a python virtual
environment.
So I need to call system('source bin/activate'), but source isn't a
recognized command in the sh shell...


Any help is appreciated,

Justin

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
     LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] graphics  grDevices utils     datasets  stats     grid      methods
    base

other attached packages:
[1] ggplot2_0.9.0  reshape2_1.2.1 plyr_1.7.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.1       MASS_7.3-16
     memoise_0.1        munsell_0.3
 [7] proto_0.3-9.2      RColorBrewer_1.0-5 scales_0.2.0       stringr_0.6
     tools_2.15.0

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] system command to a specific shell (bash)

2012-04-16 Thread Justin Haynes
Thanks Jeff, but I'm running a python program that expects certain
functionality that bash provides and sh doesn't...  I can just stop using
github checkouts and use system packages though and fix this.

I'm mostly wondering where the shell command went in base R... it sounds
like it completely solves this issue but doesn't exist in my R




On Mon, Apr 16, 2012 at 10:58 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:

 You could make a hash bang bash script that sources the file and then
 proceeds to do whatever you want. Bourne shell should have no problems
 invoking another shell.
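
A minimal sketch of the same idea without a separate script file: system2() can start bash explicitly and hand it a command string via -c. The helper name run_in_bash and the echo command are made up for illustration; in the virtualenv case the string would be the activation followed by the command that needs it. As far as I know, shell() exists only in Windows builds of R, which would explain why it is missing on Linux.

```r
## Run a command line under bash instead of the default sh.
## Assumes a bash executable is on the PATH.
run_in_bash <- function(cmd) {
  # shQuote() protects the whole command string so that it reaches
  # bash as the single argument of -c
  system2("bash", args = c("-c", shQuote(cmd)), stdout = TRUE)
}

## 'source' and $BASH_VERSION are bash features; /dev/null is just a
## harmless file to source for the demonstration
out <- run_in_bash("source /dev/null && echo \"bash $BASH_VERSION\"")
print(out)
```

Each system() or system2() call starts a fresh shell, so the activation and the command that depends on it have to be in the same command string.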


Re: [R] A little exercise in R!

2012-04-14 Thread Justin Haynes
Since I thought this was a cool question, I posted it to StackOverflow.
Vincent Zoonekynd's answer is amazing and really exercises the power of R.


http://stackoverflow.com/questions/10150161/ordering-117-by-perfect-square-pairs/10150797#10150797



On Fri, Apr 13, 2012 at 10:06 PM, Bert Gunter gunter.ber...@gene.com wrote:

 ... and a moment's more consideration immediately shows it cannot be
 done for n = 18, since 16,17, and 18 cannot all be at an end.

 -- Bert

 On Fri, Apr 13, 2012 at 9:59 PM, Bert Gunter bgun...@gene.com wrote:
  Folks:
 
  IMHO this is exactly the **wrong** way to go about this. These are
  mathematical exercises that should employ mathematical thinking, not
  brute force checking of cases.

  Consider, for example, the 1 to 17 sequence given by Ted. Then 17
  **must** be one end of the sequence and 16 the other. (Why?) Hence,
  starting from the 17 end, the values **must** be 17  8 1 ...
  Proceeding in this way, it takes only a couple of minutes to solve.
 
  The more interesting point which I think the question was really
  about, is can this always be done? I haven't given this any thought,
  but there may be an easy proof or counterexample. If the answer to
  this latter is no, then perhaps even more interesting is to
  characterize the set of numbers where it can/cannot be done.
 
  But this is all way off topic, no?
 
  Cheers,
  Bert
 
 
 
  On Fri, Apr 13, 2012 at 6:26 PM, Philippe Grosjean
  phgrosj...@sciviews.org wrote:
  Hi all,
 
  I got another solution, and it would probably qualify as the ugliest one :-(
  I made it general enough so that it works for any series from 1 to n (n not
  too large, please... tested up to 30).

  Hint for a better algorithm: inspect the object 'friends' in my code: there
  is a nice pattern appearing there!!!
 
  Best,
 
  Philippe
 
  Prof. Philippe Grosjean
  Numerical Ecology of Aquatic Systems
  Mons University, Belgium
 
  findSerie <- function (n, tmax = 500) {
    ## Check arguments
    n <- as.integer(n)
    if (length(n) != 1 || is.na(n) || n < 1)
      stop("'n' must be a single positive integer")

    tmax <- as.integer(tmax)
    if (length(tmax) != 1 || is.na(tmax) || tmax < 1)
      stop("'tmax' must be a single positive integer")

    ## Suite of our numbers to be sorted
    nbrs <- 1:n

    ## Trivial cases: only one or two numbers
    if (n == 1) return(1)
    if (n == 2) stop("The pair does not sum to a square number")

    ## Compute all possible pairs
    omat <- outer(rep(1, n), nbrs)
    ## Which pairs sum to a square number?
    friends <- sqrt(omat + nbrs) %% 1 < .Machine$double.eps
    diag(friends) <- FALSE # Eliminate pairs of same numbers

    ## Get a list of possible neighbours
    neigb <- apply(friends, 1, function(x) nbrs[x])

    ## Nbr of neighbours for each number
    nf <- sapply(neigb, length)

    ## Are there numbers without neighbours?
    ## then, problem impossible to solve..
    if (any(!nf))
      stop("Impossible to solve:\n",
        paste(nbrs[!nf], collapse = ", "),
        " sum to square with nobody else!")

    ## Are there numbers that can have only one neighbour?
    ## Must be placed at one extreme
    toEnds <- nbrs[nf == 1]
    ## I must have two of them maximum!
    l <- length(toEnds)
    if (l > 2)
      stop("Impossible to solve:\n",
        "More than two numbers form only one pair:\n",
        paste(toEnds, collapse = ", "))

    ## The other numbers can appear in the middle of the suite
    inMiddle <- nbrs[!nbrs %in% toEnds]

    generateSerie <- function (neigb, toEnds, inMiddle) {
      ## Allow to generate serie by picking candidates randomly
      if (length(toEnds) > 1) toEnds <- sample(toEnds)
      if (length(inMiddle) > 1) inMiddle <- sample(inMiddle)

      ## Choose a number to start with
      res <- rep(NA, n)

      ## Three cases: 0, 1, or 2 numbers that must be at an extreme
      ## Following code works in all cases
      res[1] <- toEnds[1]
      res[n] <- toEnds[2]

      ## List of already taken numbers
      taken <- toEnds

      ## Is there one number in res[1]? Otherwise, fill it now...
      if (is.na(res[1])) {
        taken <- inMiddle[1]
        res[1] <- taken
      }

      ## For each number in the middle, choose one acceptable neighbour
      for (ii in 2:(n-1)) {
        prev <- res[ii - 1]
        allpossible <- neigb[[prev]]
        candidate <- allpossible[!(allpossible %in% taken)]
        if (!length(candidate)) break # We fail to construct the serie
        ## Take randomly one possible candidate
        if (length(candidate) > 1) take <- sample(candidate, 1) else
          take <- candidate
        res[ii] <- take
        taken <- c(taken, take)
      }

      ## If we manage to go to the end, check last pair...
      if (length(taken) == (n - 1)) {
        take <- nbrs[!(nbrs %in% taken)]
        res[n] <- take
   taken 

Re: [R] A little exercise in R!

2012-04-13 Thread Justin Haynes
I thought this was kinda cool!  Here's my solution; it's not robust or
probably efficient.

I'd love to hear improvements or other solutions!

Justin


sq.test <- function(a, b) {
  ## test for number pairs that sum to squares.
  sqrt(sum(a, b)) == floor(sqrt(sum(a, b)))
}

ok.pairs <- function(n, vec) {
  ## given n as a member of vec,
  ## which other members of vec satisfy sq.test
  vec <- vec[vec!=n]
  vec[sapply(vec, sq.test, b=n)]
}

grow.seq <- function(y) {
  ## given a starting point (y) and a pairs list (pl, taken from the
  ## global environment), grow the squaring sequence.
  ly <- length(y)
  if(ly == y[1]) return(y)

  ## this line is the one that breaks down on other number sets...
  y <- c(y, max(pl[[y[ly]]][!pl[[y[ly]]] %in% y]))
  y <- grow.seq(y)

  return(y)
}


## start vector
x <- 1:17

## get list of possible pairs
pl <- lapply(x, ok.pairs, vec=x)

## pick start at max since few combinations there.
y <- max(x)
grow.seq(y)
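
One way around the greedy max() step, which is noted above as breaking down on other number sets, is a depth-first search that backs up whenever it reaches a dead end. The sketch below is not from the thread; the names is_square_sum and square_chain are made up for illustration, and it makes no attempt to be fast for large n.

```r
## TRUE where a + b is a perfect square (vectorised over a)
is_square_sum <- function(a, b) {
  s <- a + b
  r <- floor(sqrt(s) + 0.5)   # nearest integer to the square root
  r * r == s
}

## Return one ordering of 1..n in which every adjacent pair sums to a
## square, or NULL if no such ordering exists.
square_chain <- function(n) {
  nums <- seq_len(n)
  ## neighbours[[i]]: the numbers j != i for which i + j is a square
  neighbours <- lapply(nums, function(i) nums[nums != i & is_square_sum(nums, i)])

  ## Try to extend a partial path; give up (NULL) on a dead end so the
  ## caller can try its next candidate.
  extend <- function(path) {
    if (length(path) == n) return(path)
    last <- path[length(path)]
    for (cand in setdiff(neighbours[[last]], path)) {
      found <- extend(c(path, cand))
      if (!is.null(found)) return(found)
    }
    NULL
  }

  ## Any number may be the starting point
  for (start in nums) {
    found <- extend(start)
    if (!is.null(found)) return(found)
  }
  NULL
}

res17 <- square_chain(17)
print(res17)
res18 <- square_chain(18)   # NULL, in line with Bert's argument for n = 18
```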



On Fri, Apr 13, 2012 at 2:34 PM, Ted Harding ted.hard...@wlandres.net wrote:

 Greetings all!
 A recent news item got me thinking that a problem stated
 therein could provide a teasing little exercise in R
 programming.

 http://www.bbc.co.uk/news/uk-england-cambridgeshire-17680326

  Cambridge University hosts first European 'maths Olympiad'
  for girls

  The first European girls-only mathematical Olympiad
  competition is being hosted by Cambridge University.
  [...]
  Olympiad co-director, Dr Ceri Fiddes, said competition questions
  encouraged clever thinking rather than regurgitating a taught
  syllabus.
  [...]
  A lot of Olympiad questions in the competition are about
  proving things, Dr Fiddes said.

  If you have a puzzle, it's not good enough to give one answer.
  You have to prove that it's the only possible answer.
  [...]
  In the Olympiad it's about starting with a problem that anybody
  could understand, then coming up with that clever idea that
  enables you to solve it, she said.

  For example, take the numbers one up to 17.

  Can you write them out in a line so that every pair of numbers
  that are next to each other, adds up to give a square number?

 Well, that's the challenge: Write (from scratch) an R program
 that solves this problem. And make it neat.

 NOTE: If there should happen to be some R package that can solve
 this kind of problem already, without you having to think much,
 then its use is illegitimate! (I.e. will be deemed regurgitation).

 Over to you.

 With best wishes,
 Ted.

 -
 E-Mail: (Ted Harding) ted.hard...@wlandres.net
 Date: 13-Apr-2012  Time: 22:33:43
 This message was sent by XFMail



Re: [R] Remove carriage return in writing tab-delimited file.

2012-04-04 Thread Justin Haynes
take a look at ?paste

paste(yourmatrix, sep='\t', collapse='')
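
A caveat with the call above: with a single vector argument, sep has no effect, and collapse='' joins the elements column by column with nothing between them. A minimal sketch of one way to get the layout asked for (tabs within a row, nothing between rows), using a made-up character matrix m in place of the poster's data:

```r
## Stand-in for the matrix from the question (the header row is left out)
m <- matrix(c("FARY1004", "1", "2", "3",
              "FARY2067", "2", "3", "1",
              "FARY4587", "2", "2", "2"),
            nrow = 3, byrow = TRUE)

## Join the fields of each row with tabs ...
rows <- apply(m, 1, paste, collapse = "\t")
## ... then join the rows with nothing in between
one_line <- paste(rows, collapse = "")

cat(one_line, "\n")
```

The resulting string can be written out with cat(one_line, file = ...) instead of write.table().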

On Wed, Apr 4, 2012 at 2:58 PM, kickout plant.breeding.cr...@gmail.com wrote:
 Having problems with the write.table function. I can write a tab-delimited
 file just fine, but for each line in my matrix it inserts a carriage return
 when I don't want it to.

 For example my matrix might be:

 ID V1 V2 V3
 FARY1004 1 2 3
 FARY2067 2 3 1
 FARY4587 2 2 2

 And I want the written File to be:

 FARY1004     1     2     3FARY2067     2     3     1FARY4587     2     2     2

 TIA



Re: [R] sampling rows from a list

2012-04-02 Thread Justin Haynes
## recreating your data
mydata <- list(matrix(1:9, nrow=3, byrow=TRUE),
               matrix(10:15, nrow=2, byrow=TRUE),
               matrix(16:30, nrow=5, byrow=TRUE))

## get the number of rows of the shortest matrix in your list
n <- min(unlist(lapply(mydata, nrow)))

## subset the list into random samples of n rows each
## (drop=FALSE keeps each sample a matrix even when n is 1)
out <- lapply(mydata, function(x, n) x[sample(1:nrow(x), n), , drop=FALSE], n=n)
## this structure is still a list though...

## converting directly to an array:
out.array <- array(unlist(out), dim=c(dim(out[[1]]), length(out)))

I'm not totally sure what structure you want in the last step,
so if I missed, I apologize...

Hope that helps,

Justin


On Mon, Apr 2, 2012 at 11:24 AM, Bcampbell99 briand.campb...@ec.gc.ca wrote:
 Hi:

 I'm sure this seems like a rudimentary question, but I am not well versed
 with R syntax for lists.  I have a ragged array from which I've removed
 records (entire rows) with missing data.  The functions I used to remove the
 missing cases resulted in the generation of an R list class object, that
 looks something like this;

 mydata
 [[1]]
     [,1] [,2] [,3]
 [1,]    1    2    3
 [2,]    4    5    6
 [3,]    7    8    9

 [[2]]
     [,1] [,2] [,3]
 [1,]   10   11   12
 [2,]   13   14   15

 [[3]]
     [,1] [,2] [,3]
 [1,]   16   17   18
 [2,]   19   20   21
 [3,]   22   23   24
 [4,]   25   26   27
 [5,]   28   29   30

 Part1
 What I would like to do is draw an equal number of random row samples
 from [[1]], [[2]] and [[3]] (to preserve the structure of [,1], [,2], [,3]).

 Part2
 Then I would like to coerce the list object into something like an array.

 Help scripting out part 1 or 2 would be much appreciated.

 Brian Campbell






Re: [R] list assignment syntax?

2012-03-30 Thread Justin Haynes
You can also take a look at

http://stackoverflow.com/questions/7519790/assign-multiple-new-variables-in-a-single-line-in-r

which has some additional solutions.
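
One approach along those lines (a sketch, not necessarily what the linked answers recommend) is to name the elements of the returned list and copy them into the current environment with list2env(). The variable names sum_ab and diff_ab are made up here so that base::c() is not masked.

```r
f <- function(a, b) list(a + b, a - b)

## Name the list elements, then create one variable per element in the
## environment the code is running in. invisible() only suppresses the
## printing of the environment that list2env() returns.
invisible(list2env(setNames(f(1, 2), c("sum_ab", "diff_ab")),
                   envir = environment()))

print(sum_ab)
print(diff_ab)
```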



On Fri, Mar 30, 2012 at 4:49 PM, Peter Ehlers ehl...@ucalgary.ca wrote:
 On 2012-03-30 15:40, ivo welch wrote:

 Dear R wizards:  is there a clean way to assign to elements in a list?
  What I would like to do, in pseudo R+perl notation, is

  f <- function(a,b) list(a+b,a-b)
  (c,d) <- f(1,2)

 and have c be assigned 1+2 and d be assigned 1-2.  Right now, I use the
 clunky

   x <- f(1,2)
   c <- x[[1]]
   d <- x[[2]]
   rm(x)

 which seems awful.  is there a nicer syntax?

 regards, /iaw
 
 Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)


 I must be missing something. Why not just assign to a
 vector instead of a list?

  f <- function(a,b) c(a+b,a-b)

 If it's imperative that f return a list, then you
 could use

  (c, d) <- unlist(f(a, b))

 to get vector (c, d).

 Peter Ehlers




Re: [R] scanning data into r

2012-03-28 Thread Justin Haynes
What have you tried?

What type of file are you trying to import from?

What do you want your data to look like in R?

take a look at ?read.table and ?readLines
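
A minimal sketch of the ?readLines route, assuming the goal is the leading number on each line and that every line starts with one. The sample lines are copied from the question; with a real file, readLines() would be given the file path instead of a textConnection().

```r
## Stand-in for the file described in the question
con <- textConnection(c("21  TEST DATA", "32  year:2012", "33", "34", " 5", "36"))
lines <- readLines(con)
close(con)

## Keep only the leading run of digits on each line, ignoring leading
## blanks and whatever follows the number
nums <- as.numeric(sub("^[[:space:]]*([0-9]+).*$", "\\1", lines))
print(nums)
```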


On Wed, Mar 28, 2012 at 11:23 AM, joel.green joel.gr...@live.co.uk wrote:

 Hey

 I am having trouble importing data into R, my data field looks like this

 21  TEST DATA
 32  year:2012
 33
 34
  5
 36

 I require the number at the start of each line; the text is not needed.
 I am struggling to get R to import the data without changing the file
 itself.

 How do I import the data? I have tried using comment.char= , however this
 didn't work. Any help would be much appreciated, thanks.





Re: [R] Why does this work? plyr within-subset normalization

2012-03-28 Thread Justin Haynes
To those without access to nabble, the code in reference is:

relative <- ddply(ranktable, .(Timestamp), function(x)
    data.frame(relative = x[,5]/max(x[,5])))


I may be misunderstanding your question, but:

ddply splits your data.frame, ranktable, by the column Timestamp into
many smaller data.frames, one for each unique Timestamp value.

Those new small data.frames are sent one at a time to the function you
specify.
So, when you call max(x[,5]) you're taking the max of the data.frame
sent to the function rather than the max of the larger ranktable
data.frame.
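
The split/apply/combine steps can be spelled out in base R to make the point visible. This is only a sketch of what ddply does conceptually, not its actual implementation; the toy ranktable and its column name score are assumptions.

```r
## Toy stand-in for 'ranktable'
ranktable <- data.frame(Timestamp = c("t1", "t1", "t2", "t2"),
                        score     = c(2, 4, 5, 10))

## Step 1: split into one small data.frame per Timestamp value
pieces <- split(ranktable, ranktable$Timestamp)

## Step 2: apply the function to each piece; max() only ever sees the
## rows of that piece
scaled <- lapply(pieces, function(x) {
  data.frame(Timestamp = x$Timestamp, relative = x$score / max(x$score))
})

## Step 3: glue the per-piece results back together
relative <- do.call(rbind, scaled)
print(relative)
```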




On Wed, Mar 28, 2012 at 10:18 AM, z2.0 zack.abraham...@gmail.com wrote:

 Working code that normalizes each row's value against the subset's maximum.



 Does the invocation of max() somehow instruct R to 'step back' and evaluate
 the subset?

 Thanks, Zack



Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
In most regexes the caret ( ^ ) signifies the start of a line and the
dollar sign ( $ ) signifies the end.

gsub('^S S', 'S', a)

gsub('^S S', 'S', '3421 BIGS St')

you can use logical or inside your pattern too:

gsub('^S S|S S$| S S ', 'S', a)

the  S S  condition is difficult.

gsub('^S S|S S$| S S ', 'S', 'foo S S bar')

gives the wrong output. as does:

gsub('^S S | S S$| S S ', ' S ', 'foo S S bar')
gsub('^S S | S S$| S S ', ' S ', a)


so you might have to catch that with a second gsub.

gsub(' S S ', ' S ', 'foo S S bar')
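
An alternative sketch that handles all three positions in one pass is the word-boundary assertion \b, which matches at the edge of a word without consuming a character. perl = TRUE is used here as an assumption that PCRE behaviour is wanted; the extra test addresses are made up.

```r
addresses <- c("S S Main St  Interstate 95",  # doubled direction at the start
               "3421 BIGS St",                # must stay untouched
               "foo S S bar",                 # doubled direction in the middle
               "Main S S")                    # doubled direction at the end

## \\b on both sides means "S S" only matches as two stand-alone letters
cleaned <- gsub("\\bS S\\b", "S", addresses, perl = TRUE)
print(cleaned)
```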


On Wed, Mar 28, 2012 at 12:32 PM, Markus Weisner r...@themarkus.com wrote:
 trying to switch out addresses that have double directions, such as the
 following example:

 a = "S S Main St  Interstate 95"

 a = gsub(pattern="S S ", replacement="S ", a)


 … the problem is that I don't want to affect instances where this might be
 a correct address such as the following:


 3421 BIGS St


 what I want to say is switch out only if this is either of the following
 situations


 [beginning of char]S S

  S S 

 S S[end of char]


 Is there anyway of making gsub or a similar function make the replacements
 I want?  Thanks in advance for your help.


 ~Markus



Re: [R] how to match exact phrase using gsub (or similar function)

2012-03-28 Thread Justin Haynes
wow!  and here I thought I was starting to know most things about regexes...

On Wed, Mar 28, 2012 at 1:34 PM, William Dunlap wdun...@tibco.com wrote:
 You can use the \< and \> patterns (backslashing the backslashes) to
 mean start and end of word, respectively.  E.g.,

   > addresses <- c("S S Main St  Interstate 95", "3421 BIGS St")
   > gsub("\\<S S\\>", "S", addresses)
   [1] "S Main St  Interstate 95" "3421 BIGS St"

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com




Re: [R] Convert day of year back into a date format.

2012-03-27 Thread Justin Haynes
There may very well be a better solution, but this works.

format(strptime(dayofyear, format="%j"), format="%m-%d")
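
One thing to watch with %j on its own: as far as I can tell, strptime() fills in the current year when none is given, so days after February can come out one day off if the data are from a leap year and the code runs in a non-leap year (or the other way round). A sketch that pins the year instead, assuming the data are from 2008 as in the question:

```r
dayofyear <- c(1, 23, 60)

## Day 1 is the origin itself, hence the "- 1"
dates <- as.Date(dayofyear - 1, origin = "2008-01-01")

## Month-day labels, e.g. for axis annotation
labels <- format(dates, "%m-%d")
print(labels)
```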

On Tue, Mar 27, 2012 at 11:12 AM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello,

 I am having trouble figuring out how to convert a Day of Year integer
 back into a Date format. For example I have the following:

 date <- c('2008-01-01','2008-01-02','2008-01-03','2008-01-04','2008-01-05',
           '2008-01-06','2008-01-07','2008-01-08','2008-01-09','2008-01-10',
           '2008-01-11','2008-01-12','2008-01-13','2008-01-14','2008-01-15',
           '2008-01-16','2008-01-17','2008-01-18','2008-01-19','2008-01-20',
           '2008-01-21','2008-01-22','2008-01-23')

 ## this is then converted into a number corresponding to the day of
 ## the year like so:

 dayofyear <- strptime(date, format="%Y-%m-%d")$yday + 1

 ## Now my question is how do I get back to a date format (obviously
 omitting the year).
 ## The end result is that I'd like to be able to have axis labels as
 something like Month-Day or just Month
 ## instead of just an integers which isn't always intuitive for people
 but I can't seem to figure out how to tell R
 ## to recognize an integer as a date.

 Any suggestions?

 Many thanks in advance!

 Sam



Re: [R] Remove a word from a character vector value XXXX

2012-03-07 Thread Justin Haynes
Hadley's package stringr is wonderful for all things string.

library(stringr)

?str_trim

and

?str_replace are what you want.  (the base R equivalent of these two
would be ?gsub and some regular expressions)

str_trim(str_replace(d5.Region, 'Average', ''))

should do the trick.
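
For comparison, a base R sketch of the same two steps, using a made-up vector region in place of d5.Region. It relies on trimws(), which is only available in newer versions of R; on older versions gsub('^ +| +$', '', ...) does the trimming.

```r
region <- c("Central Average", "  Metro East Average ", "Northwest Average")

## Remove the literal word, then strip leading and trailing blanks
cleaned <- trimws(sub("Average", "", region, fixed = TRUE))
print(cleaned)
```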

hope that helps,
Justin


On Wed, Mar 7, 2012 at 8:03 AM, Dan Abner dan.abne...@gmail.com wrote:
 Hi everyone,

 What is the easiest way to remove the word Average and strip leading
 and trailing blanks from the character vector (d5.Region) below?

 .nrow.d5.           d5.Region
 1            1     Central Average
 2            2     Coastal Average
 3            3        East Average
 4            4  Metro East Average
 5            5 Metro North Average
 6            6 Metro South Average
 7            7  Metro West Average
 8            8   Northeast Average
 9            9   Northwest Average


 Thanks!

 Dan



Re: [R] logical to vector?

2012-03-07 Thread Justin Haynes
?as.numeric

> as.numeric(c(TRUE, FALSE))
[1] 1 0
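
If what is wanted is the positions of the TRUE values (the 1:3 in the question) rather than 1/0 codes, which() may be closer to the mark. A small sketch, with in_group standing in for the poster's logical vector:

```r
in_group <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

## Indices of the TRUE elements, i.e. the column numbers in the group
cols <- which(in_group)
print(cols)

## Indices of the remaining columns
other_cols <- which(!in_group)
print(other_cols)
```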


On Wed, Mar 7, 2012 at 8:02 AM, Ed Siefker ebs15...@gmail.com wrote:
 I am trying to use the coXpress function from
 the coXpress package.  This function requires
 numerical vectors indicating which columns
 are in which group.

 The problem is, I can only figure out how
 to get a logical structure, not a numerical one.
 In other words, coXpress wants something like:
 1:3

  I have something like:
 TRUE TRUE TRUE FALSE FALSE

 Can I convert one into the other easily?

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Take a look at:
http://cran.r-project.org/web/views/Spatial.html

But I've always just parsed the string...

This is from the last time I did this; it's not quite the same, but you
can see the similarities.


## if data is presented as 43°02'46.60059" N, we need to split on the °
## symbol, ' and "
library(stringr)  # for str_split()

to.decimal <- function(vec){
  # convert all symbols to _
  vec <- gsub('°','_',vec)
  vec <- gsub('\'','_',vec)
  vec <- gsub('\"','_',vec)

  # split each string into its degree, minute and second parts
  split <- str_split(vec,'_')
  deg <- as.numeric(sapply(split,'[',1))
  min <- as.numeric(sapply(split,'[',2))
  sec <- as.numeric(sapply(split,'[',3))

  deg <- deg + min/60 + sec/3600
  return(deg)
}


On Wed, Mar 7, 2012 at 8:28 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to ask you if R has a library that can work with different GPS 
 formats

 For example
 I have a string of this format

 N50° 47.513 E006° 03.985
 and I would like to convert to GPS decimal format.

 that means for example converting the part N50° 47.513
 to 50 + 47/60 + 513/3600.

 Is it possible to do that with R?
 What is the name of such a library?

 I would like to thank you in advance for your help

 B.R
 Alex



Re: [R] GPS handling libraries or (String manipulation)

2012-03-07 Thread Justin Haynes
Wow... that is WAY better!

Thanks Gabor!

On Wed, Mar 7, 2012 at 8:51 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Wed, Mar 7, 2012 at 11:28 AM, Alaios ala...@yahoo.com wrote:
 Dear all,
 I would like to ask you if R has a library that can work with different GPS 
 formats

 For example
 I have a string of this format

 N50° 47.513 E006° 03.985
 and I would like to convert to GPS decimal format.

 that means for example converting the part N50° 47.513
 to 50 + 47/60 + 513/3600.

 Is it possible to do that with R?
 What is the name of such a library?


 Use strapply to extract the digits and convert them to numeric
 followed by matrix multiplication to apply the formula:

 library(gsubfn)
 x <- "N50° 47.513"

 c(1, 1/60, 1/3600) %*% strapply(x, "\\d+", as.numeric, simplify = TRUE)


 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com
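
For anyone without gsubfn installed, the same digit extraction can be sketched in base R. This is only an illustration of the idea, not Gabor's code: it uses regmatches() and gregexpr(), and the degree sign is written as \u00b0 purely to sidestep encoding trouble.

```r
# Base-R sketch of the digit-extraction approach (no gsubfn needed).
x <- "N50\u00b0 47.513"                 # same string as in the example

# Pull out every run of digits: 50, 47 and 513
parts <- as.numeric(regmatches(x, gregexpr("[0-9]+", x))[[1]])

# Apply the formula from the question: deg + min/60 + sec/3600
dec <- sum(parts * c(1, 1/60, 1/3600))
print(dec)
```

If 47.513 is meant as decimal minutes rather than 47 minutes plus 513 seconds, the conversion would be 50 + 47.513/60 instead; which reading is right depends on the data and is not settled in this thread.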



Re: [R] regular expression

2012-02-29 Thread Justin Haynes
gsub('.+; (.+);.+','\\1',x)

or if you just want the value out:

gsub('.+; Surv\\(months\\): ([0-9]+);.+','\\1',x)

You can also look at strsplit:
> strsplit(x,';')
[[1]]
[1] "99-625: Cell type: S"        " Surv(months): 21"
[3] " STATUS(0=alive, 1=dead): 1"

> lapply(strsplit(x,';'),'[',2)
[[1]]
[1] " Surv(months): 21"

But I would follow David's second suggestion and just read them in with
sep=';' instead.
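
Both routes can be tried on the two lines from the question. A small sketch, assuming the lines are already in a character vector x; the names months_regex, fields and months_split are made up for illustration.

```r
# The two example lines from the question
x <- c("98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1",
       "99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1")

# Option 1: capture the digits after "Surv(months): " and convert them
months_regex <- as.numeric(sub(".*Surv\\(months\\): ([0-9]+);.*", "\\1", x))

# Option 2: split on ';' while reading, then strip the label from field 2
fields <- read.table(text = x, sep = ";", stringsAsFactors = FALSE)
months_split <- as.numeric(sub(".*: ", "", fields$V2))

print(months_regex)
print(months_split)
```

With a real file, read.table(file, sep = ";") would replace the text = x call.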


Justin

On Wed, Feb 29, 2012 at 11:24 AM, Fred G bayespoker...@gmail.com wrote:

 Computer Friends,

 with the following example lines:

 [107] 98-610: Cell type: S; Surv(months): 6; STATUS(0=alive, 1=dead): 1

 [108] 99-625: Cell type: S; Surv(months): 21; STATUS(0=alive, 1=dead): 1

 i want to be able to isolate the number of months of survival for each row.

 is there a regular expression that can find the first instance of a ;,
 delete everything in front of it-- and find the second instance of an ;
 and delete everything behind it? in python there is a function line.find(),
 would be grateful to hear the R equiv; or, any other better alternatives to
 get the number of months of survival stored as a variable.

 Much Thank You!



Re: [R] Problem building up ggplot graph in a loop.

2012-02-16 Thread Justin Haynes
ggplot is looking for thisData as a column of coffs.  The most
'ggplotesque' way of doing this would be:

# melt your data to a long format (melt() comes from reshape/reshape2):
coffs.melt <- melt(coffs, id.vars = 'levels')

# plot using colour aes parameter:
ggplot(coffs.melt, aes(x=levels, y=value, colour=variable)) + geom_line() +
ylab('Total Chargeoffs')

This is untested since there is no sample data!

Justin
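
Since the thread has no sample data, here is a sketch with invented numbers. The coffs values and the column names base and stress are assumptions; the long format is built in base R so the reshaping step can be checked without reshape2, and the plot is only built if ggplot2 is installed.

```r
# Invented sample data: one 'levels' column plus two series to plot
coffs <- data.frame(levels = 1:5,
                    base   = c(10, 12, 15, 19, 24),
                    stress = c(12, 16, 22, 30, 41))

# Long format: one row per (levels, series) pair, as melt() would produce
value_cols <- setdiff(names(coffs), "levels")
coffs.long <- data.frame(
  levels   = rep(coffs$levels, times = length(value_cols)),
  variable = rep(value_cols, each = nrow(coffs)),
  value    = unlist(coffs[value_cols], use.names = FALSE)
)

# One geom_line layer, coloured by series, replaces the loop
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(coffs.long,
                       ggplot2::aes(x = levels, y = value, colour = variable)) +
    ggplot2::geom_line() +
    ggplot2::ylab("Total Chargeoffs")
}
```

Because every series lives in a column of the data frame handed to ggplot, nothing has to be looked up in the function's environment, which is what tripped up the loop version.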


On Thu, Feb 16, 2012 at 2:50 PM, Keith Weintraub kw1...@gmail.com wrote:

 Folks,
  I want to automate some graphing using ggplot.

 Here is my code
 graphChargeOffs2<-function(coffs) {
  ggplot(coffs, aes(levels))
  dataNames<-names(coffs)[!names(coffs) == "levels"]
  for(i in dataNames) {
    thisData<-coffs[[i]]
    last_plot() + geom_line(aes(y = thisData, colour = i))
  }
  last_plot() + ylab("Total Chargeoffs")
 }

 coffs is a data.frame.

 I get the following error:
 Error in eval(expr, envir, enclos) : object 'thisData' not found

 As little as I know about environments in R I am pretty sure that the
 geom_line in the loop is not able to see the thisData variable.

 Any help you could provide would be appreciated. I would be surprised if
 there wasn't a way to pass the data into the geom_line function without
 using environments. Of course I have been wrong once or twice in the past.
 :)

 Note that geom_line also can't see the input variable coffs.

 Thanks for any and all heo




 --




Re: [R] Change dataframe-structure

2012-02-13 Thread Justin Haynes
There is probably a more elegant way, but:

> df <- data.frame(p1=c(1,2,1),p2=c(3,3,2),p3=c(2,1,3),p4=c(5,6,4),p5=c(4,4,6),p6=c(6,5,5))
> as.data.frame(t(apply(df,1,function(x) names(x)[match(1:6,x)])))
  V1 V2 V3 V4 V5 V6
1 p1 p3 p2 p5 p4 p6
2 p3 p1 p2 p5 p6 p4
3 p1 p2 p3 p4 p6 p5
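
The match(1:6, x) trick inverts each row's ranking. When every row is a permutation of 1..6, order(x) does the same job, as this sketch shows; the rank1..rank6 column names are invented for illustration.

```r
df <- data.frame(p1 = c(1, 2, 1), p2 = c(3, 3, 2), p3 = c(2, 1, 3),
                 p4 = c(5, 6, 4), p5 = c(4, 4, 6), p6 = c(6, 5, 5))

# match(1:6, x) finds, for each rank 1..6, the position holding that rank
by_match <- t(apply(df, 1, function(x) names(x)[match(1:6, x)]))

# order(x) returns the same positions when each row is a permutation of 1..6
by_order <- t(apply(df, 1, function(x) names(x)[order(x)]))

ranked <- as.data.frame(by_order, stringsAsFactors = FALSE)
names(ranked) <- paste0("rank", 1:6)   # hypothetical column names
print(ranked)
```

With ties or missing ranks the two versions differ: match() gives NA for a rank that never occurs, while order() silently returns some ordering.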



On Mon, Feb 13, 2012 at 2:07 PM, David Studer stude...@gmail.com wrote:

 Hello everybody,

 I have the following problem and have no idea how to solve it:

 In my dataframe I have six columns representing six societal problems (p1,
 p2, ..., p6).
 The values are ranks between 1 (worst problem) and 6 (best problem)


 p1 p2 p3  p4 p5 p6
 1   3   2   5   4   6
 2   3   1   6   4   5
 1   2   3   4   6   5

 but I'd like the dataframe the other way round:
 1   2   3   4   5   6
 p1  p3  p2  p4  p4  p6
 p3  p1  p2  p5  p6  p4
 p1  p2  p3  p4  p6  p5

 Can anyone help?

 Thanks!



Re: [R] debug in a loop

2012-02-10 Thread Justin Haynes
You can add

if(is.na(tab[i])) browser()

or

if(is.na(tab[i])) break

see inline
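
A self-contained sketch of both ideas on the poster's vector. The names first_na and stopped_at are made up; break stands in for browser() so the example runs non-interactively.

```r
tab <- c(1:300)
tab[250] <- NA

# Vectorised answer to "where is the first NA?"
first_na <- which(is.na(tab))[1]

# Inside the loop: stop at the first NA instead of stepping manually
stopped_at <- NA_integer_
for (i in seq_len(length(tab) - 1)) {
  if (is.na(tab[i])) {
    stopped_at <- i
    break            # swap in browser() here to inspect interactively
  }
  tab[i] <- tab[i] + tab[i + 1]
}

print(c(first_na, stopped_at))
```

Note that the update at i = 249 already turns tab[249] into NA, because it adds tab[250]; the check only fires one iteration later, when i reaches 250.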

On Fri, Feb 10, 2012 at 7:22 AM, ikuzar raz...@hotmail.fr wrote:

 Hi,

 I'd like to debug in a loop (using debug() and browser() etc but not
 print()
 ). I'am looking for the first occurence of NA.
 For instance:

 tab = c(1:300)
 tab[250] = NA
 len = length(tab)
 for (i in 1:len){
   if(i != len){

   if(is.na(tab[i])) browser()

 tab[i] = tab[i]+tab[i+1]
   }
 }

 I do not want to do Browse[2] n for each step ... I'd like to declare a
 browser() in the loop with a condition. But how to write stop running
 when you encounter NA ?

 Thanks for your help

 --
 View this message in context:
 http://r.789695.n4.nabble.com/debug-in-a-loop-tp4376563p4376563.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Memory allocation problem (again!)

2012-02-08 Thread Justin Haynes
A 32-bit R process on Windows can address roughly 2GB at most.  Upgrading to
a computer that's less than 10 years old is the best path.

But short of that, if you're just generating random data, why not do it in
two or more pieces and combine them later?

mat.1 <- matrix(rnorm(50000*2000),nrow=50000)
mat.2 <- matrix(rnorm(50000*2000),nrow=50000)
mat.3 <- matrix(rnorm(50000*2000),nrow=50000)

mat.1.sums <- rowSums(mat.1)
mat.2.sums <- rowSums(mat.2)
mat.3.sums <- rowSums(mat.3)

# each piece holds 2000 of the 6000 columns, so add the partial row sums
mat.sums <- mat.1.sums + mat.2.sums + mat.3.sums
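
If the pieces are column blocks of one 50,000-row matrix, their row sums have to be added together, and only one block needs to be in memory at a time. A minimal sketch under that assumption; chunked_row_sums and its arguments are made-up names, and the demo sizes are deliberately small.

```r
# Accumulate row sums block by block so the full matrix is never allocated.
chunked_row_sums <- function(n_rows, n_cols, chunk_cols) {
  total <- numeric(n_rows)
  done <- 0
  while (done < n_cols) {
    k <- min(chunk_cols, n_cols - done)          # columns in this block
    block <- matrix(rnorm(n_rows * k), nrow = n_rows)
    total <- total + rowSums(block)              # add, do not concatenate
    done <- done + k
  }
  total
}

set.seed(1)
sums <- chunked_row_sums(n_rows = 1000, n_cols = 60, chunk_cols = 25)
summary(sums)
```

For the real problem the call would look like chunked_row_sums(50000, 6000, 500), which keeps each block at about 200MB.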



On Wed, Feb 8, 2012 at 8:37 AM, Christofer Bogaso 
bogaso.christo...@gmail.com wrote:

 Dear all, I know this problem was discussed many times in forum, however
 unfortunately I could not find any way out for my own problem. Here I am
 having Memory allocation problem while generating a lot of random number.
 Here is my description:

 > rnorm(50000*6000)
 Error: cannot allocate vector of size 2.2 Gb
 In addition: Warning messages:
 1: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 > memory.size(TRUE)
 [1] 15.75
 > rnorm(50000*6000)
 Error: cannot allocate vector of size 2.2 Gb
 In addition: Warning messages:
 1: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 2: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 3: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)
 4: In rnorm(50000 * 6000) :
  Reached total allocation of 1535Mb: see help(memory.size)

 And the Session info is here:

  sessionInfo()
 R version 2.14.0 (2011-10-31)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252
 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C

 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] graphics  grDevices utils datasets  grid  stats methods
 base

 other attached packages:
 [1] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.6  zoo_1.7-6

 loaded via a namespace (and not attached):
 [1] lattice_0.20-0

 I am using Windows 7 (home version) with 4 GB of RAM (2.16GB is usable as
 my
 computer reports). So in my case, is it not possible to generate a random
 vector with such length? Note that generating such vector is my primary
 job.
 Later I need to do something on that vector. Those Job includes:
 1. Create a matrix with 50,000 rows.
 2. Get the row sum
 3. then report some metrics on that sum values (min. 50,000 elements must
 be
 there).

 Can somebody help me with some real solution/suggesting?

 Thanks and regards,



Re: [R] Help need

2012-02-07 Thread Justin Haynes
Instead of a for loop, why not use the vectorization inherent in R?

sigmasqaured <- 1
i <- complex(real = 0, imaginary =1)
f <- seq(0,0.5,0.1)
spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)

> spectrum
[1] 9.632720e+00 1.411130e+03 2.947753e+00 6.479994e-02 1.295175e-02
[6] 8.042731e-03
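
To see that the vectorised call and a loop agree, and how a loop can still fill a vector, here is a sketch. The function name spectrum_at and the argument sigmasq are invented; the coefficients are the ones from the thread.

```r
# Spectrum for one or many frequencies; works element-wise on vectors
spectrum_at <- function(f, sigmasq = 1) {
  i <- complex(real = 0, imaginary = 1)
  denom <- 1 - 2.7607 * exp(2 * pi * i * f) + 3.8106 * exp(4 * pi * i * f) -
    2.6535 * exp(6 * pi * i * f) + 0.9258 * exp(8 * pi * i * f)
  sigmasq / abs(denom)^2
}

f <- seq(0, 0.5, 0.1)

# Vectorised: one call, one result vector
vectorised <- spectrum_at(f)

# Loop version: preallocate, then store each value by index
looped <- numeric(length(f))
for (k in seq_along(f)) looped[k] <- spectrum_at(f[k])

print(vectorised)
```

The original loop printed each value and then overwrote spectrum; storing into looped[k] is what keeps them all.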


On Tue, Feb 7, 2012 at 1:08 PM, Jaymin Shah jayminsh...@live.com wrote:

 I have mad a for loop to try and output values which i have named
 spectrum.  However, I cannot seem to get the answers to come out as a
 vector which is what i need. They come out as separate values which I am
 then unable to join together. Thank you

 for(f in seq(0,0.5,0.1)) {
   sigmasqaured <- 1
   i = complex(real = 0, imaginary = 1)
   spectrum <- (sigmasqaured)/(abs(1-2.7607*exp(2*pi*i*f)+3.8106*exp(4*pi*i*f)-2.6535*exp(6*pi*i*f)+0.9258*exp(8*pi*i*f))^2)
   print(spectrum)
 }


Re: [R] I bet apply has a solution

2012-02-06 Thread Justin Haynes
How about:

> apply(Data..,1, function(vec) !all(vec==vec[1]))
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
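
To get the DidChange column from the question, the result can be assigned straight back. The sketch below also shows a matrix comparison that avoids apply(); the names m and did_change_fast are made up.

```r
Data.. <- data.frame(a = rep(1, 10), b = c(rep(1, 9), 2), c = c(rep(1, 8), 2, 2))

# Row-wise: TRUE when any value differs from the first one in the row
Data..$DidChange <- apply(Data.., 1, function(vec) !all(vec == vec[1]))

# Same test without apply(): compare every column to the first column
m <- as.matrix(Data..[c("a", "b", "c")])
did_change_fast <- rowSums(m != m[, 1]) > 0

print(Data..)
```

The right-hand side is evaluated before DidChange exists, so apply() only sees columns a, b and c.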



On Mon, Feb 6, 2012 at 10:34 AM, LCOG1 jr...@lcog.org wrote:

 Hi all
 For the data below, I would like to return a logical value indicating
 differences in the data.

 #Create data
 Data.. <- data.frame(a=rep(1,10),b=c(rep(1,9),2),c=c(rep(1,8),2,2))

   a b c
 1  1 1 1
 2  1 1 1
 3  1 1 1
 4  1 1 1
 5  1 1 1
 6  1 1 1
 7  1 1 1
 8  1 1 1
 9  1 1 2
 10 1 2 2


 So what I want is to return logical value telling me if all the values are
 the same.  So the result would be

    a b c DidChange
 1  1 1 1 FALSE
 2  1 1 1 FALSE
 3  1 1 1 FALSE
 4  1 1 1 FALSE
 5  1 1 1 FALSE
 6  1 1 1 FALSE
 7  1 1 1 FALSE
 8  1 1 1 FALSE
 9  1 1 2  TRUE
 10 1 2 2  TRUE

 I bet apply could handle this elegantly but that family of functions is
 still not 100% intuitive to me.  Thoughts.  Thanks everyone

 Cheers,
  Josh


 --
 View this message in context:
 http://r.789695.n4.nabble.com/I-bet-apply-has-a-solution-tp4362294p4362294.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Select elements from text

2012-01-24 Thread Justin Haynes
How about using read.table(... , sep=" ")?

That would give you the single words.  Then

grepl("\\[[9-z]+\\]",x)

will return a logical vector:


> x<-c('test','[bracket]','hi]','[blah','foo','[bar]')
> grepl('\\[[9-z]+\\]',x)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE
> x[grepl('\\[[9-z]+\\]',x)]
[1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all, in case of
([citation]) instances for example.
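
The bracketed passages in the question contain spaces, so splitting into single words first loses them. A different route, sketched here on the poster's text, is to pull out whole [ ... ] spans with regmatches() and gregexpr(); the names txt, hits and inside are made up.

```r
txt <- paste(
  "Most fundamentally, it has led to an effort to clarify the organizational",
  "form concept. According to them [see also Smith, Jones and Carroll 2002],",
  "categories emerge as audience members recognize dissimilarities among groups",
  "of consumers and label them as members of a common set [Nicol 2000]."
)

# Every "[", then any run of non-"]" characters, then "]"
hits <- regmatches(txt, gregexpr("\\[[^]]*\\]", txt))[[1]]

# Drop the surrounding brackets
inside <- gsub("^\\[|\\]$", "", hits)
print(inside)
```

Getting the text out of the Word files in the first place is a separate step; saving them as plain text and using readLines() is one option.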

Justin

On Tue, Jan 24, 2012 at 6:52 AM, mdvaan mathijsdev...@gmail.com wrote:

 Hi,

 I have a series of MS word files and each file contains plain text. From
 these texts I would like to extract only those elements (read: words) that
 are between square brackets. Example of a text:

 Most fundamentally, it has led to an effort to clarify the organizational
 form concept. According to them [see also Smith, Jones and Carroll 2002],
 categories emerge as audience members recognize dissimilarities among
 groups
 of consumers and label them as members of a common set [Nicol 2000].

 Now I would like to get the following selection:

 see also Smith, Jones and Carroll 2002
 Nicol 2000

 Any ideas on how to do this? What would be the best way to import the text
 in R? The entire text as an element in a dataframe? Thank you very much!

 Best,

 Mathijs


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] drop columns whose rows are all 0

2012-01-24 Thread Justin Haynes
> dataset<-data.frame(a=1:10,b=c(0,0,0,1,0,0,0,0,1,0),c=rep(0,10))
> apply(dataset,2,function(x) all(x==0))
    a     b     c
FALSE FALSE  TRUE

> dataset[,!apply(dataset,2,function(x) all(x==0))]
    a b
1   1 0
2   2 0
3   3 0
4   4 1
5   5 0
6   6 0
7   7 0
8   8 0
9   9 1
10 10 0
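
A variation on the same idea, sketched with colSums(). The names keep and trimmed are made up, and drop = FALSE is added so the result stays a data frame even when only one column survives.

```r
dataset <- data.frame(a = 1:10, b = c(0, 0, 0, 1, 0, 0, 0, 0, 1, 0), c = rep(0, 10))

# A column is kept when it has at least one non-zero entry
keep <- colSums(dataset != 0) > 0

trimmed <- dataset[, keep, drop = FALSE]
print(names(trimmed))
```

Testing for "any non-zero entry" is safer than testing whether the column sum is non-zero, since positive and negative values could cancel out.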




On Tue, Jan 24, 2012 at 8:14 AM, Francisco franciscororol...@google.com wrote:

 Hello,
 I have a dataset with 40 variables, some of them are always 0 (each row).
 I would like to make a subset containing only the columns which values are
 not all 0, but I don't know how to do it.

 I tried:

 for(cut_column in 1:40) {

 if(sum(dataset[,cut_column])!=0) {
 columns_useful<-c(columns_useful,dataset[cut_column])

 }
 }

 sorted_dataset<-subset(dataset, select=columns_useful)

 But it doesn't work.
 Thank you

 Francisco



Re: [R] How can I access information stored after I run a command in R?

2012-01-23 Thread Justin Haynes
str() tells you about the structure of an object (see ?str).

str(MAX3(a,'asy',1))

From that you can see the names of the various parts, including p.value.

foo <- MAX3(a,'asy',1)$p.value
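
The same pattern works for any test result that is stored as a list. Rassoc may not be installed everywhere, so this sketch uses t.test() from base R as a stand-in (the numbers are invented) and writes to a temporary file instead of /home/foo.txt.

```r
# Stand-in for MAX3(): any "htest"-style result is a list with named parts
res <- t.test(c(5.1, 4.9, 5.6, 5.8, 6.0), mu = 5)

names(res)            # lists the components, including "p.value"
foo <- res$p.value    # extract the number instead of retyping it

# Write it out, as in the question (temporary path used here)
out <- tempfile(fileext = ".txt")
write.table(data.frame(p.value = foo), file = out,
            row.names = FALSE, col.names = TRUE, quote = FALSE)
readLines(out)
```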



On Mon, Jan 23, 2012 at 9:32 AM, Tiago V. Pereira
tiago.pere...@mbe.bio.br wrote:

 Dear all,

 Supposed I run the following command:

 ###
 #install.packages("Rassoc", dependencies=TRUE)
 library(Rassoc)
 ca=c(139,249,112)

 co=c(136,244,120)

 a=rbind(ca,co)

 MAX3(a,"asy",1)
 ##

 I get:

The MAX3 test using the asy method

 data:  a
 statistic = 0.5993, p-value = 0.7933


 How can one save the result 0.7933 into a file?

 say:

 foo <- 0.7933

 write.table(foo, file ="/home/foo.txt", sep = " ",
 row.names=FALSE,col.names=TRUE, quote=FALSE, qmethod = "double")


 However, instead of typing the value above, I would like to replace it by
 the macro (scalar, local) that has the accurate p-value.

 thanks in advance for your help.

 Tiago



Re: [R] colored outliers

2012-01-20 Thread Justin Haynes
TOC_NI<-read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv",
sep=";", dec=",", encoding="UTF-8")
circ<-TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
points(NI~TOC,data=TOC_NI,col='red',pch=1,cex=3)  ## this line is coloring
## all points because you're still using TOC_NI

points(NI~TOC,data=circ,col='red',pch=1,cex=3)  ## now we're only plotting
## the four points in circ (cex, not size, sets the symbol size in base graphics).


Sorry for the confusion.  However, in the future please provide a
reproducible data set along with your question so we can more easily help.
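
Since the CSV is not available, here is a reproducible sketch with simulated data. The TOC and NI values are invented and only the column names follow the thread.

```r
# Simulated stand-in for the TOC-NI.csv data
set.seed(42)
TOC_NI <- data.frame(TOC = runif(50, 0, 450))
TOC_NI$NI <- 0.1 * TOC_NI$TOC + rnorm(50, sd = 5)

# The four rows with the highest NI values
circ <- TOC_NI[order(TOC_NI$NI, decreasing = TRUE), ][1:4, ]

plot(NI ~ TOC, data = TOC_NI, col = "blue", pch = 16, xlim = c(0, 450))
abline(lm(NI ~ TOC, data = TOC_NI), col = "red", lwd = 3)

# Circle only those four points: data = circ, not data = TOC_NI
points(NI ~ TOC, data = circ, col = "red", pch = 1, cex = 2)
```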

Justin


On Fri, Jan 20, 2012 at 5:49 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Dear Petr and Justin,
 my problem ist, that I only want to have the 4 highest values for Ni as a
 red point or with a red circle. The other points should not be modificated.
 In your proposals always all points get a red circle or a red point not
 only
 the 4 highest Ni values!
 I hope you could understand me!
 Thanks  for your help!
 GeO


 --
 View this message in context:
 http://r.789695.n4.nabble.com/colored-outliers-tp4282207p4313278.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Stacked barchart in ggplot (or other library)

2012-01-20 Thread Justin Haynes
To use ggplot:


dat<-data.frame(num=1:3,usage=c(4,2,5),cap=c(10,20,10),diff=c(6,18,5))
dat.melt<-melt(dat,id.var=c('num','cap'))
ggplot(dat.melt)+geom_bar(aes(x=num,y=value,fill=variable),stat='identity')
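
For comparison, the same stack of usage and diff can be sketched in base graphics. The names heights and mids are made up; the data frame is the one above.

```r
dat <- data.frame(num = 1:3, usage = c(4, 2, 5), cap = c(10, 20, 10), diff = c(6, 18, 5))

# barplot() stacks the rows of a matrix, one bar per column
heights <- t(as.matrix(dat[, c("usage", "diff")]))

mids <- barplot(heights, names.arg = dat$num, legend.text = TRUE,
                xlab = "num", ylab = "value")
```

For this data each bar's total height equals cap, since usage + diff = capacity in every row.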



On Fri, Jan 20, 2012 at 12:30 PM, Jean V Adams jvad...@usgs.gov wrote:

 Bart6114 wrote on 01/20/2012 08:54:39 AM:

  Hey,
 
  I want to create a stacked barchart in R for the following dataset
  (http://pastebin.com/pyHUNgr2):
 
  #   usage   capacity   diff
  1   4   10  6
  2   2   20  18
  3   5   10  5
 
  The stacked barchart should, in one plot show each line of the dataset
 as a
  stacked bar using data from 'usage' and 'diff' to create the stacked
 bar.
 
  I can't find a good example of how to do this on the ggplot2 site.
 
  Thanks in advance!


 See the help on barplot:
 ?barplot

 For example:

 df <- data.frame(usage=c(4, 2, 5), capacity=c(10, 20, 10), diff=c(6, 18, 5))
 barplot(t(as.matrix(df[, 1:2])))

 Jean


Re: [R] Establishing groups using something other than ifelse()

2012-01-19 Thread Justin Haynes
How about

levels(df$z)[grep('A',levels(df$z))] <- 'A'
levels(df$z)[grep('B',levels(df$z))] <- 'B'
levels(df$z)[grep('C',levels(df$z))] <- 'C'

does that do what you're wanting?
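
The lines above overwrite z itself. To keep z and add the higher-level grouping as a new column, the levels of a copy can be relabelled in one go. A sketch, assuming the big group is always the first character of the sub-group name:

```r
df <- data.frame(x = runif(36, 0, 120),
                 y = runif(36, 0, 120),
                 z = factor(c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C","C2","C2")))

# Copy the factor, then collapse its levels to their first character;
# duplicated level names are merged automatically
df$Big.Group <- df$z
levels(df$Big.Group) <- substr(levels(df$Big.Group), 1, 1)

table(df$z, df$Big.Group)
```

This scales to any number of sub-groups without a chain of ifelse() calls, as long as the naming rule holds.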


On Thu, Jan 19, 2012 at 3:05 PM, Sam Albers tonightstheni...@gmail.com wrote:

 Hello all,

 This is one of those Is there a better way to do this questions. Say
 I have a dataframe (df) with a grouping variable (z). This is my base
 data. Now I know that there is a higher order level of grouping that
 exist for my group variable. So what I want to do is create a new
 column that express that higher order level of grouping based on
 values in the sub-group (z  in this case). In the past I have used
 ifelse() but this tends to get fairly redundant and messy with a large
 amount of sub-groupings (z). I've created a sample dataset below. Can
 anyone recommend a better way of achieving what I am currently
 achieving with ifelse()? A long series of ifelse statements makes me
 think that there is something better for this.

 ## Dataframe creation
 df <- data.frame(x=runif(36, 0, 120),
   y=runif(36, 0, 120),

 z=factor(c("A1","A1","A2","A2","B1","B1","B2","B2","C1","C","C2","C2"))
   )

 ## Current method is grouping
 df$Big.Group <- with(df, ifelse(df$z=="A1","A", ifelse(df$z=="A2","A",
 ifelse(df$z=="B1", "B", ifelse(df$z=="B2", "B", "C")))))


 So any suggestions? Thanks in advance!

 Sam



[R] png output on a server?

2012-01-18 Thread Justin Haynes
I've got R running on a Gentoo server that doesn't have X11 installed.  It's
a custom build to keep those dependencies at bay!  However, some of my
scripts use the base png() function and ggplot2, and png() uses X11.

A Google search suggests using the Cairo package, which works... but
changes the fonts (specifically the size of the font).  Adjusting the
pointsize doesn't seem to have much effect.

Aside from tuning the CairoPNG function to make my graphs look right, has
anyone found a good way to avoid the X11 dependency but still use the base
png function?

If anyone has experience with CairoPNG and making it look like the base png
function, I'd love to hear what you've learned!


Thanks,

Justin


> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
   FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE
  libxml     fifo   cledit    iconv      NLS  profmem    cairo
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE    FALSE


 sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods
[8] base

other attached packages:
[1] Cairo_1.5-1   ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.7.1

loaded via a namespace (and not attached):
[1] tools_2.14.1




Re: [R] Points inside a polygon

2012-01-12 Thread Justin Haynes

On Wed 11 Jan 2012 08:28:03 PM PST, Hasan Diwan wrote:

I have a list of bounds for a series of polygons. I do understand the
formula to determine whether point i is within polygon X (X[x1]  i[x]
  X[x2]  i[x]  X[y1]  i[y]  X[y2]  i[y]), and I can apply this
throughout the dataset. However, this naive algorithm doesn't scale
very well. The data set contains 10,000 points consisting of (n,e)
pairs where I'm interested in which are inside polygons denoted by
vertices (V[x1]/V[y1],V[x2],V[y2]). Is there a shortcut to accomplish
this goal? Many thanks!  -- H



Check out the splancs package, particularly the inout function.

Justin
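
inout() in splancs takes a matrix of points and a matrix of polygon vertices and returns a logical vector. To show what such a test does, here is a vectorised ray-casting sketch in base R. It is a hypothetical helper for illustration, not the splancs implementation, and it makes no promise about points lying exactly on an edge.

```r
# Even-odd (ray casting) test: px, py are point coordinates,
# vx, vy are the polygon vertices in order. Hypothetical helper.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does edge (j -> i) straddle the horizontal line through the point,
    # and does it cross that line to the right of the point?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)
    j <- i
  }
  inside
}

# A 10 x 10 square and three test points
res <- point_in_polygon(px = c(5, 15, -1), py = c(5, 5, -1),
                        vx = c(0, 10, 10, 0), vy = c(0, 0, 10, 10))
print(res)
```

The loop runs over polygon edges, not over points, so 10,000 points against a polygon with a handful of vertices is only a handful of vector operations.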



Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes

On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

Hi
I have a data frame in the following form. There are two groups and for
each 'width' relative frequency for group1 and group2 is given. How to plot
this in R using ggplot or other package.


  Width   relativeFrequency1   relativeFrequency2
1   100 0.0006388783 0.02265428
2   200 0.0022677303 0.02948625
3   300 0.0061182673 0.01739936
4   400 0.0152237225 0.02569902
5   500 0.0300215262 0.03639880
6   600 0.0597610250 0.07717765


Thanks



Not sure exactly what you're looking for, but...


dat<-data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
dat.melt<-melt(dat,id.var='width')
ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+geom_bar(stat='identity',position='dodge')




Re: [R] relative frequency plot using ggplot or other function

2012-01-12 Thread Justin Haynes
ggplot(dat.melt,aes(x=width,y=value,fill=variable,colour=variable))+geom_density(stat='identity',alpha=0.5)

The fill and colour aesthetics can be removed if you want.

Or

ggplot(dat.melt,aes(x=width,y=value,fill=variable))+geom_density(stat='identity',alpha=0.5)+facet_wrap(~variable,ncol=1)

The same goes for this version.
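
If ggplot2 is not a requirement, two separate panels can also be sketched in base graphics. This uses the table from the question; freq_cols and old_par are made-up names.

```r
# The data frame from the question
dat <- data.frame(
  Width = c(100, 200, 300, 400, 500, 600),
  relativeFrequency1 = c(0.0006388783, 0.0022677303, 0.0061182673,
                         0.0152237225, 0.0300215262, 0.0597610250),
  relativeFrequency2 = c(0.02265428, 0.02948625, 0.01739936,
                         0.02569902, 0.03639880, 0.07717765)
)

freq_cols <- c("relativeFrequency1", "relativeFrequency2")

# One panel per group, stacked vertically
old_par <- par(mfrow = c(length(freq_cols), 1), mar = c(4, 4, 2, 1))
for (col in freq_cols) {
  plot(dat$Width, dat[[col]], type = "l",
       xlab = "Width", ylab = "relative frequency", main = col)
}
par(old_par)   # restore the previous layout
```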



On Thu, Jan 12, 2012 at 9:35 AM, Mary Kindall mary.kind...@gmail.com wrote:

 Hi this is exactly what i am looking for but I do not like to draw as
 histogram instead I want two separate plot for this data.  Something like
 the ones shown in the following link. Please disregard the legends of the
 following fig.


 http://had.co.nz/ggplot2/graphics/55078149a733dd1a0b42a57faf847036.png

 http://had.co.nz/ggplot2/graphics/90983232ced45a93d9fbbe40afffd69a.png

 Thanks

 On Thu, Jan 12, 2012 at 12:13 PM, Justin Haynes jto...@gmail.com wrote:

 On Thu 12 Jan 2012 09:02:27 AM PST, Mary Kindall wrote:

 Hi
 I have a data frame in the following form. There are two groups and for
 each 'width' relative frequency for group1 and group2 is given. How to
 plot
 this in R using ggplot or other package.


  Width   relativeFrequency1   relativeFrequency2
 1   100 0.0006388783 0.02265428
 2   200 0.0022677303 0.02948625
 3   300 0.0061182673 0.01739936
 4   400 0.0152237225 0.02569902
 5   500 0.0300215262 0.03639880
 6   600 0.0597610250 0.07717765


 Thanks


 not sure exactly what you're looking for but...

  dat<-data.frame(width=1:6*100,rel1=runif(6), rel2=runif(6))
 dat.melt<-melt(dat,id.var='width')
 ggplot(dat.melt,aes(x=factor(width),y=value,fill=variable))+geom_bar(stat='identity',position='dodge')






 --
 -
 Mary Kindall
 Yorktown Heights, NY
 USA





Re: [R] Add color to Boxplot by value

2012-01-12 Thread Justin Haynes
How about:

dat<-data.frame(val=rnorm(100,12,10),x=letters[1:4])
col.val<-ddply(dat,.(x),summarise,mean(val))
col.val$breaks<-cut(col.val$..1,c(0,9,15,Inf))
dat.merge<-merge(dat,col.val)
ggplot(dat.merge,aes(x=x,y=val,colour=breaks))+geom_boxplot()+scale_color_manual(values=c('green','yellow','red'))
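
A base-graphics sketch of the same idea, without plyr or ggplot2. The colour rule here is an assumption about the question: means above 15 are green, means of 9 or below are red, and everything in between is yellow. The names means, band and cols are made up.

```r
set.seed(1)
dat <- data.frame(val = rnorm(100, 12, 10), x = letters[1:4])

# Mean per group, in the same (alphabetical) order boxplot() will use
means <- tapply(dat$val, dat$x, mean)

# Assumed rule: <= 9 red, (9, 15] yellow, > 15 green
band <- cut(means, breaks = c(-Inf, 9, 15, Inf),
            labels = c("red", "yellow", "green"))
cols <- as.character(band)

boxplot(val ~ x, data = dat, col = cols)
```

Using -Inf as the lowest break keeps groups with a negative mean from ending up as NA.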


On Thu, Jan 12, 2012 at 7:45 AM, KWyshak kwys...@illumina.com wrote:

 I have a boxplot of Production run rates per 10 minute intervals and I
 would
 like to color code them by the average (i.e. 15ppm = green, 9ppm = red,
 everything else yellow).

 Is there a way to do this?

 http://r.789695.n4.nabble.com/file/n4289381/RunRateBoxWhisker.png

 --
 View this message in context:
 http://r.789695.n4.nabble.com/Add-color-to-Boxplot-by-value-tp4289381p4289381.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
# find top 4 points
circ <- TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]TOC_NI[order(TOC_NI$NI,decreasing=T),][1:4,]

# add them to your plot!
plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)



Justin

On Tue, Jan 10, 2012 at 7:11 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Hi @ all,
 I have question how to mark significant outliers in R.
 This is my very simple script to plot a regression:

 TOC_NI<-read.csv2("C:/Users/XYZ/Desktop/Master/Daten/Statistik/TOC-NI.csv",
 sep=";", dec=",", encoding="UTF-8")
 plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
 abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
 summary(lm(NI~TOC,data=TOC_NI))

 The result is the following pic:
 http://r.789695.n4.nabble.com/file/n4282207/nickel_TOC_5f.png
 nickel_TOC_5f.png

 Now I want to make small red circles around the four highest values of Ni.
 Does anyone have an idea how to do that?
 Thanks a lot!

 Best Regards
 Geophagus






Re: [R] colored outliers

2012-01-10 Thread Justin Haynes
woops! see inline.


Hope that helps, and enjoy R.


Justin

On Tue, Jan 10, 2012 at 8:40 AM, Geophagus
falk.hilli...@twain-systems.com wrote:

 Hi Justin,
 thanks a lot for your quick answer.
 If I use your code, all points become red.
 How do you include the sorted and separated four values into the points
 argument?
 The variable in your script is called circ, but it is not used
 anymore.
 Here the script again:


 TOC_NI <- read.csv2("C:/Users/hilliges/Desktop/Master/Daten/Statistik/TOC-NI.csv",
 sep=";", dec=",", encoding="UTF-8")


this line just needs trimming.  not sure how i missed that on my copy...
anyway, order returns the row indices that sort the given vector (ascending
by default, unless you specify decreasing=TRUE), and indexing the data.frame
with them puts its rows in that order.

circ <- TOC_NI[order(TOC_NI$NI,decreasing=TRUE),][1:4,]


and it should work
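
For completeness, a minimal sketch of the whole thing on made-up data (the TOC_NI below is simulated, not your file). It assumes the intent is to circle only the rows in circ, so points() gets data=circ, and it uses cex for the symbol size, since as far as I know base graphics has no size argument.

```r
# simulated stand-in for the real TOC_NI data
set.seed(42)
TOC_NI <- data.frame(TOC = runif(50, 0, 450))
TOC_NI$NI <- 0.1 * TOC_NI$TOC + rnorm(50, sd = 5)

# the four rows with the highest NI
circ <- TOC_NI[order(TOC_NI$NI, decreasing = TRUE), ][1:4, ]

plot(NI ~ TOC, data = TOC_NI, col = "blue", pch = 16, xlim = c(0, 450))
abline(lm(NI ~ TOC, data = TOC_NI), col = "red", lwd = 3)
# open red circles drawn around the four selected points only
points(NI ~ TOC, data = circ, col = "red", pch = 1, cex = 3)
```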


 plot(NI~TOC,data=TOC_NI,col="blue", pch=16, xlim=c(0,450))
 abline(lm(NI~TOC,data=TOC_NI),col = "red",lwd=3)
 points(NI~TOC,data=TOC_NI,col='red',pch=1,size=3)

 Thanks a lot for your help!
 GeO





Re: [R] match matrices of different lengths

2012-01-05 Thread Justin Haynes
see ?merge

> merge(xx,aa,by.x='x',by.y='a')
            x   y   b
1 2.00112e+11 1.0 1.2
2 2.00112e+11 1.1 1.9

Making the two matrices time series does not mean that R knows that the
first column is a datetime, and depending on your desired result, that may
not be important.
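
Since you asked for NAs where either matrix has no data, here is a short sketch of the same merge with all=TRUE, which keeps the unmatched rows from both sides. It reuses the vectors from your post; merged is just a name picked for this example.

```r
x <- c(200112030003, 200112030004, 200112030005, 200112030006)
y <- c(0.1, 1, 1.1, 1.5)
a <- c(200112030004, 200112030005, 200112030007, 200112030008,
       200112030009)
b <- c(1.2, 1.9, 2.0, 2.5, 2.1)

xx <- cbind(x, y)
aa <- cbind(a, b)

# all = TRUE keeps every time stamp found in either matrix and fills
# the missing side with NA
merged <- merge(xx, aa, by.x = "x", by.y = "a", all = TRUE)
print(merged, digits = 12)
```

Printing with digits=12 shows the full time stamps instead of 2.00112e+11.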

hope that helps,

Justin


On Thu, Jan 5, 2012 at 5:51 AM, Thijs vanden Bergh 
bergh.thijsvan...@gmail.com wrote:

 was trying to match different matrices of different lengths with in
 the first column date and time info (yearmonthdayhourminute). the
 routine needs to return NAs where data of either of the matrices is
 non-existent.

 have been trying the following:


 x <- c(200112030003, 200112030004, 200112030005, 200112030006)
 y <- c(0.1, 1, 1.1, 1.5)
 a <- c(200112030004, 200112030005, 200112030007, 200112030008,
 200112030009)
 b <- c(1.2, 1.9, 2.0, 2.5, 2.1)

 xx <- cbind(x, y)
 aa <- cbind(a, b)

 xxnew <- ts(xx)
 aanew <- ts(aa)

 cc <- ts.union(xxnew, aanew)
 cc


 this does however not give the wished for result as it simply cbinds
 the two matrices and fills up empty spots that are created due to the
 one matrix being shorter than the other at the bottom end of the
 shortest matrix. i really want the routine to match matrix xx and aa
 to time in the first column of both matrices.

 any help towards this end would be much appreciated,

 th.



Re: [R] ggplot2 - tricky problem

2012-01-05 Thread Justin Haynes
how about:

library(ggplot2)
library(plyr)
library(reshape2)

dat <- data.frame(id=1:4,city=c('berlin','munich'),likeability=c(5,4,6,5),uniqueness=c(3,4,4,4))

# melt to long format, average per variable and city, then facet by variable
ggplot(ddply(melt(dat,
  id.vars=c('id','city')),
  .(variable,city),
  summarise,
  value=mean(value)),
  aes(x=factor(city),y=value)) +
geom_point() +
facet_wrap(~variable)

the line drawing is a bit more tricky...  Since the x values are factors
rather than continuous, fitting a line to them is kind of nonsense.  It
matters which order they are in for example.  If instead you want to plot
something like:

ggplot(dat,aes(x=likeability,y=uniqueness,colour=city))+geom_point()+geom_smooth(aes(group=city),method='lm')

You could draw fit lines that make a bit more sense.  Forgive me if I'm
over simplifying your problem!


Justin

On Thu, Jan 5, 2012 at 7:46 AM, Mario Giesel rr.gie...@yahoo.de wrote:

 Hello, R friends,

  I've been struggling quite a bit with ggplot2.
 Having worked through Hadley's book twice I still wonder how to solve this
 task.


 1. Short example Dataframe:

 id  city    Likeability  Uniqueness
 1   Berlin  5            3
 2   Munich  4            4
 3   Berlin  6            4
 4   Munich  5            4

 2. Task:

 a) Facetting plots for each attitude (1 plot for likeability and
 uniqueness each, horizontally on one page)
 b) Showing Berlin and Munich together on x axis
 c) Showing the means of Berlin and Munich on y axis (means of cities in
 likeability on first plot, means of cities in uniqueness on second plot)
 d) Drawing a line through mean points on each plot



 Hope I could explain it understandably. Any help is appreciated!

 Thanks a lot,
  Mario



Re: [R] [newbie] stack operations, or functions with side effects (or both)

2012-01-04 Thread Justin Haynes
do s[1] and s[-1] do what you're looking for?
those just display the values... if you want to change s, you need to
reassign it or fiddle with environments and scoping.  however, I'd say it is
better to write R code as though data structures are immutable until you
explicitly re-assign them rather than trying to deal with side effects and
state...


> pop <- function(vec){
+   print(vec[1])
+   print(vec[-1])
+   return(vec[-1])
+ }
> s <- 1:5
> s <- pop(s)
[1] 1
[1] 2 3 4 5
> s
[1] 2 3 4 5
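
If you really do want a pop() with a side effect, one common approach is to keep the state inside a closure. This is only a sketch, not an existing stack API: make_stack, push, pop and contents are names made up for this example.

```r
make_stack <- function(items = c()) {
  # items lives in this function's environment; <<- updates it in place
  push <- function(x) {
    items <<- c(x, items)
    invisible(NULL)
  }
  pop <- function() {
    if (length(items) == 0) stop("stack is empty")
    top <- items[1]
    items <<- items[-1]
    top
  }
  contents <- function() items
  list(push = push, pop = pop, contents = contents)
}

s <- make_stack(1:5)
first <- s$pop()      # removes and returns the first element
rest <- s$contents()  # what is left on the stack
```

The stack is changed by s$pop() without reassigning s, because the list of functions shares one environment that holds items.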



On Wed, Jan 4, 2012 at 1:22 PM, Tom Roche tom_ro...@pobox.com wrote:


 summary: Specifically, how does one do stack/FIFO operations in R?
 Generally, how does one code functions with side effects in R?

 details:

 I have been a coder for years, mostly using C-like semantics (e.g.,
 Java). I am now trying to become a scientist, and to use R, but I don't
 yet have the sense of good R and R idiom (i.e., expressions that are
 to R what (e.g.) the Schwartzian transform is to Perl).

 I have a data-assimilation problem for which I see a solution that
 wants a stack--or, really, just a pop(...) such that

 * s <- c(1:5)
 * print(s)
 [1] 1 2 3 4 5
 * pop(s)
 [1] 1
 * print(s)
 [1] 2 3 4 5

 but in fact I get

 > pop(s)
 Error: could not find function "pop"

 and Rseek'ing finds me nothing. When I try to write pop(...) I get

 pop1 <- function(vector_arg) {
 +   length(vector_arg) -> lv
 +   vector_arg[1] -> ret
 +   vector_arg <- vector_arg[2:lv]
 +   ret
 + }

 > pop1(s)
 [1] 1
 > print(s)
 [1] 1 2 3 4 5

 i.e., no side effect on the argument

 pop2 <- function(vector_arg) {
 +   length(vector_arg) -> lv
 +   vector_arg[1] -> ret
 +   assign("vector_arg", vector_arg[2:lv])
 +   return(ret)
 + }

 > pop2(s)
 [1] 1
 > print(s)
 [1] 1 2 3 4 5

 ditto :-( What am I missing?

 * Is there already a stack API for R (which I would expect)? If so, where?

 * How to cause the desired side effect to the argument in the code above?

 TIA, Tom Roche tom_ro...@pobox.com



Re: [R] Combining characters

2012-01-04 Thread Justin Haynes
apply(expand.grid(x, y, z, stringsAsFactors=F), 1, paste, collapse=' ')
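
One thing to watch: expand.grid varies its first argument fastest, so the one-liner above cycles through x first. A small sketch, assuming you want the order from your post (z changing fastest); grid and combos are names made up here.

```r
x <- c("one", "two", "three")
y <- c("yellow", "blue", "green")
z <- c("apple", "cheese")

# pass z first so it varies fastest, then paste the columns back in x y z order
grid <- expand.grid(z = z, y = y, x = x, stringsAsFactors = FALSE)
combos <- paste(grid$x, grid$y, grid$z)
head(combos, 3)
```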



On Wed, Jan 4, 2012 at 8:32 AM, jeremy jeremynamer...@gmail.com wrote:

 Hi all,

 I'm trying to combine exhaustively several character arrays in R like:
 x=c("one","two","three")
 y=c("yellow","blue","green")
 z=c("apple","cheese")

 in order to get concatenation of

 x[1] y[1] z[1]  (one yellow apple)
 x[1] y[1] z[2] (one yellow cheese)
 x[1] y[2] z[1](one blue apple)
 ...
 x[length(x)] y[length(y)] z[length(z)]  (three green cheese)

 Does anyone have a solution?
 Thanks in advance



Re: [R] a quick question about rbinom

2012-01-04 Thread Justin Haynes
homework or not,

?rbinom

should be plenty.
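
For the record, a short sketch of what the +1 is doing. rbinom(n,1,p) returns 0s and 1s, R vectors are indexed from 1, so adding 1 turns the draws into the positions 1 and 2. The value of n and the name draws are made up for this example.

```r
set.seed(123)
n <- 10

draws <- rbinom(n, 1, 0.6)      # each draw is 0 or 1
X1 <- c("A", "B")[draws + 1]    # 0 -> position 1 ("A"), 1 -> position 2 ("B")

data.frame(draws, X1)
```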




On Wed, Jan 4, 2012 at 1:38 PM, lynn.tsai vernal@gmail.com wrote:

 Hello, I have the following code using rbinom, but I don't understand what
 *+1* means in the code. Could someone help? Thanks so much,

  X1 <- c("A","B")[rbinom(n,1,0.6)+1]
  X2 <- c("C","D")[rbinom(n,1,0.1)+1]



Re: [R] Applyiing mode() or class() to each column of a data.frame XXXX

2011-12-30 Thread Justin Haynes
there is also colwise in the plyr package.

> library(plyr)
> colwise(class)(data6)
      v13     v14       v15     f4     v16
1 integer numeric character factor logical
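
If you want to see the difference in base R without plyr, here is a small sketch on a made-up data frame (df, via.apply and via.sapply are names invented here). apply() converts the data frame to a matrix first, so every column comes back as character, while sapply() looks at each column as it is.

```r
df <- data.frame(num = c(1.5, 2), int = 1:2, chr = c("a", "b"),
                 stringsAsFactors = FALSE)

# apply() coerces df with as.matrix(); a mixed data frame becomes a character matrix
via.apply <- apply(df, 2, class)

# sapply() treats df as a list of columns and keeps their own classes
via.sapply <- sapply(df, class)

via.apply
via.sapply
```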


Justin


On Thu, Dec 29, 2011 at 4:47 PM, Jean V Adams jvad...@usgs.gov wrote:

 Dan Abner wrote on 12/29/2011 06:13:11 PM:

  Hi everyone,
 
  I am attempting to use the apply() function to obtain the mode and class
 of
  each column in a data frame, however, I am encountering unexpected
 results.
  I have the following example data:
 
 
  v13 <- 1:6
  v14 <- c(1,2,3,3,NA,1)
  v15 <- c("Good","Bad",NA,"Good","Bad","Bad")
  f4 <- factor(rep(c("Blue","Red","Green"),2))
  v16 <- c(F,T,F,F,T,F)
  data6 <- data.frame(v13,v14,v15,f4,v16)
  data6


  Here is my function definition:


  contents <- function(x){
    output <- data.frame(Varnum=1:ncol(x),
                         Name=names(x),
                         Mode=apply(x,2,mode),
                         Class=apply(x,2,class))
    print(output)
  }


 Use sapply() instead of apply().  In the help file for apply() it says:
 "If X is not an array but an object of a class with a non-null dim value
 (such as a data frame), apply attempts to coerce it to an array via
 as.matrix if it is two-dimensional (e.g., a data frame) or via as.array."
 This coercion to a matrix might be causing the unexpected result. sapply()
 and lapply() are designed specifically for lists (which a data frame is).
 I also simplified the function a bit ...

 contents <- function(x){
   data.frame(Varnum=1:ncol(x), Name=names(x),
              Mode=sapply(x,mode), Class=sapply(x,class))
 }

 Jean


  
 
  When I call the function, I obtain the following:
 
 
  > contents(data6)
      Varnum Name      Mode     Class
  v13      1  v13 character character
  v14      2  v14 character character
  v15      3  v15 character character
  f4       4   f4 character character
  v16      5  v16 character character
 
  =
 
  Any help is appreciated.
 
  Thank you,
 
  Dan
 


Re: [R] Help with code

2011-12-20 Thread Justin Haynes
the short answer... which is a guess cause you didn't provide a
reproducible example... is:

your column (i think its called t1d_ptype[1:25]) is a factor and using
factors is dangerous at best.

you can check with ?str.

see ?factor for how to convert back to strings and see if your code works.
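
A minimal sketch of what I suspect is happening, on a made-up vector (ptype, f and chr are names invented here, not your data): assigning a value that is not one of the factor's levels gives NA plus that warning, while a character vector takes any string.

```r
ptype <- factor(c("T1D", "Ctrl", "Ctrl_FDR"))

# "T1D_w" is not an existing level, so the assignment produces NA
# (suppressWarnings hides the "invalid factor level" warning here)
f <- ptype
suppressWarnings(f[1] <- "T1D_w")
is.na(f[1])

# converting to character first lets the new label through
chr <- as.character(ptype)
chr[1] <- "T1D_w"
chr
```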



to answer your second question, yes I'm sure there is a better simple way
to do this, but i can't follow what you're doing... for example, I don't
know what c1 is...

but, the place I would look is at the plyr package.  its excellent at
splitting and reordering data.


and one final note, you should avoid naming things with pre-existing R
functions (e.g. data).

Justin


On Tue, Dec 20, 2011 at 11:14 AM, 1Rnwb sbpuro...@gmail.com wrote:

 hello gurus,

 i have a data frame like this
   HTN HTN_FDR Dyslipidemia CAD t1d_ptype[1:25]
 1Y   YY T1D
 2   T1D
 3  Ctrl_FDR
 4   T1D
 5Y Ctrl
 6  Ctrl
 7  Ctrl_FDR
 8   T1D
 9YY T1D
 10  T1D
 11 Ctrl_FDR
 12   YY T1D
 13   Y   YY T1D
 14  T1D
 15 Ctrl
 16 Ctrl
 17 Ctrl_FDR
 18  T1D
 19  T1D
 20   Y  T1D
 21 Ctrl_FDR
 22 Ctrl_FDR
 23 Ctrl
 24 Ctrl
 25  T1D

 i am converting it to define the groups more uniformly using this code:

 for (i in 1:dim(c1)[1])
 {
   num_comp <- 0
   for (j in 1:dim(c1)[2])
     if (c1[i,j] == 2) num_comp = num_comp + 1  # Y=2
   for (j in 1:dim(c1)[2])
     if (num_comp > 0)
     {
       if (data$t1d_ptype[i] == "T1D" & c1[i,j] == 2) c2[i,j] <- "T1D_w"
       if (data$t1d_ptype[i] == "T1D" & c1[i,j] == 1) c2[i,j] <- "T1D_oc"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl" & c1[i,j] == 2)
         c2[i,j] <- "Ctrl_w"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl" & c1[i,j] == 1)
         c2[i,j] <- "Ctrl_oc"
     }
     else
     {
       if (data$t1d_ptype[i] == "T1D") c2[i,j] <- "T1D_noc"
       if (substr(data$t1d_ptype[i],1,4) == "Ctrl") c2[i,j] <- "Ctrl_noc"
     }
 }

 it is giving me error
 In `[<-.factor`(`*tmp*`, iseq, value = structure(c(NA,  ... :
   invalid factor level, NAs generated

 Also, is there a simpler way to do this?
 Thanks
 Sharad



Re: [R] Help with code

2011-12-20 Thread Justin Haynes
Fair enough and good point.  How about, dangerous when used unknowingly!


On Tue, Dec 20, 2011 at 1:01 PM, William Dunlap wdun...@tibco.com wrote:

 Re
  your column (i think its called t1d_ptype[1:25]) is a factor and using
  factors is dangerous at best.

 This depends on how you want to define dangerous.  If t1d_ptype ought
 take values from a certain set of strings then making it a factor gives
 you some safety, since it warns you when you go outside of that set and
 try to give it an illegal value.  E.g.,
 > sex <- factor(c("M","F","F"), levels=c("F", "M"))
 > sex[2] <- "no"
 Warning message:
 In `[<-.factor`(`*tmp*`, 2, value = "no") :
   invalid factor level, NAs generated

 It does take more work to set up, since you need to enumerate the set
 of good strings.  That is tedium, not danger.

 If t1d_ptype might take any value, then make it a character vector.

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


Re: [R] how to manually enter an double quote as data feed?

2011-12-13 Thread Justin Haynes
"\"" is how it's displayed on the screen.  however, if you write your object
to a csv it will be correct.  R can't display """ as it is, so it is escaping
the second double quote for you.

however, "'" (double quote single quote double quote) does display
correctly as well as save correctly.
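
A quick sketch to convince yourself that the backslash is only part of how print() shows the string and not part of the data (q is just a name for this example):

```r
q <- '"'          # a double quote enclosed in single quotes

print(q)          # print() shows the escaped form
cat(q, "\n")      # cat() writes the raw character
nchar(q)          # the string holds a single character
identical(q, "\"")
```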

If that doesn't answer your question, some more back story on what you're
trying to do would help.

Justin

On Tue, Dec 13, 2011 at 2:03 PM, bonnieyuan bby2...@columbia.edu wrote:

 I'm doing a text mining project where I have to manually enter a double
 quote
 as an element inside a vector.

 I tried

 char[10]='"' # where i enclosed the double quote in a pair of single quotes.

 But the result is [1] "\"". Somehow a backslash is added automatically.

 I also tried to enclose the double quote in a pair of double quotes. That
 didn't work either.

 I'm using Mac and latest release of R.

 Thank you!

 Bonnie Yuan




Re: [R] using sample

2011-12-07 Thread Justin Haynes
Emma,

If you haven't spent much time on the r-help forums, please do read the
posting guide.

You need to provide reproducible examples for us to help you.

We don't know anything about your data...

what is event.details? (if you can't provide the data, the output of str()
will often do)

since I don't know what event.details is, I can't figure out what the line:


obs = (1:133429)[event.details[,2] == i]

is supposed to do.

But if I had to guess... ?sample says it expects the first argument as a
vector.  I assume obs is not a vector but a larger structure?
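
Another guess, since I can't see your data: if the subset handed to sample() turns out to be empty, sample() has nothing to draw from and stops with an error. Here is a sketch of a guard, assuming that is the case; safe_sample is a made-up helper, not a base R function.

```r
safe_sample <- function(x, size) {
  # nothing to draw from: return an empty vector instead of failing
  if (length(x) == 0) return(x[0])
  # sampling positions also avoids sample()'s special case for a single number,
  # where sample(7, 3) would draw from 1:7
  x[sample.int(length(x), size = size, replace = TRUE)]
}

set.seed(1)
safe_sample(integer(0), 3)      # empty result, no error
safe_sample(7L, 3)              # always 7 7 7
safe_sample(c(2L, 5L, 9L), 4)   # four draws from the three values
```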

Feel free to post more info about your data (see ?str and ?dput) or if you
can generate made up data that replicates your problem that works too.


Justin


On Wed, Dec 7, 2011 at 9:16 AM, bevare emma.ra...@jbaconsulting.co.uk wrote:

 Hi,

 Can anyone help sort out the problem with the following script - I am a R
 newbie and I am self taught.

 obs.all = c()
 for(i in 1:386){
   if (n.sim[i] > 0){
     obs = (1:133429)[event.details[,2] == i]
     obs.all = c(obs.all, sample(obs[obs < n.sim[i]], size = n.sim[i],
                                 replace=T))
   }
 }

 Basically, in the sample bit, I only want to get obs.all if the value of
 obs
 is less than the value of n.sim[i]. I get the error message

 Error in sample(obs[obs < n.sim[i]], size = n.sim[i], replace = T) :
  invalid first argument

 length(n.sim)  is 386

 Thanks in advance for your suggestions

 Emma









Re: [R] hour in x-axis

2011-11-29 Thread Justin Haynes
without knowing much about your data or the base plotting...

I'd use the library ggplot2.

First, you'll need to format your dates to POSIXct

AggData$time - as.POSIXct(AggData$time,format='%H:%M')

Then plotting is trivial.
ggplot(AggData,aes(x=time,y=value))+geom_point()

or +geom_line() if you'd rather.
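
If you'd rather stay with base plotting, here is a minimal sketch of labelling every 60th point in H:M format. The data are simulated one-minute readings, and times, value, at and labs are names made up for this example.

```r
# simulated one-minute time stamps as character strings, like "17:19:35"
start <- as.POSIXct("2011-11-28 17:19:35", tz = "UTC")
times <- format(start + 60 * (0:299), "%H:%M:%S")
set.seed(7)
value <- 80 + cumsum(rnorm(300, sd = 0.2))

at <- seq(1, length(times), by = 60)   # every 60th observation
labs <- substr(times[at], 1, 5)        # keep only hours and minutes

plot(value, type = "l", xaxt = "n", xlab = "time")
axis(1, at = at, labels = labs)
```

Plotting against the row index keeps the points in their recorded order, which matters when the series runs past midnight.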


Hope that helps,

Justin

On Tue, Nov 29, 2011 at 10:07 AM, threshold r.kozar...@gmail.com wrote:


 Dear R useres, got the following problem. Given the AggData (listed below)
 I need to plot AggData[,2] vs time (AggData[,1]) for chosen 'rows'. Ive
 done
 already:

 plot(AggData[rows,2], xaxt='n')
 axis(1,at=seq(1,length(rows),1),sub(,, AggData[rows,1]))

 which works, but I need to list only chosen data points, say full hours or
 every 60th point, something like:

 axis(1,at=seq(1,seq(1,length(rows),60)),sub(, ,
 AggData[day.rows[seq(1,length(rows),60)],2]))

 but does not work. Could be nice if time on the x-axis is in H:m format (no
 seconds).

 In the original data time bout is 1 minute, e.g. 17:19:35, 17:20:35,
 17:21:35 . Taken every 100th for brevity yields

  > (AggData[seq(1,length(rows),100),c(2,7)])

          time    value
 1    17:19:35 80.68327
 101  18:59:35 80.97230
 201  20:39:35 78.30810
 301  22:19:35 80.41558
 401  23:59:35 77.01051
 501  01:39:35 77.19687
 601  03:19:35 78.20762
 701  04:59:35 77.13315
 801  06:39:35 76.29110
 901  08:19:35 75.32090
 1001 09:59:35 85.32890
 1101 11:39:35 79.86978
 1201 13:19:35 83.32418
 1301 14:59:35 78.26018
 1401 16:39:35 79.06434


 Thanks in advance.
 Best, robert






Re: [R] aggregate syntax for grouped column means

2011-11-29 Thread Justin Haynes
look at just your data that is in that first id category and I bet you can
figure it out!

> myData[myData$id=='0m11',]
    var1  var2   id
10 30.79 32.15 0m11
11 30.79 32.39 0m11
12 30.94    NA 0m11

The formula method of aggregate drops every row that contains an NA before
FUN is called (its default is na.action=na.omit), so row 12 is gone and the
var1 mean is 30.79.  data.table and plyr apply na.rm to each column separately.
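
If you want aggregate to behave like the other two, one option is to pass na.action=na.pass so the rows are kept and na.rm=TRUE is left to deal with the NAs column by column. A sketch on just the three rows above (d, default and per.col are names made up here):

```r
d <- data.frame(var1 = c(30.79, 30.79, 30.94),
                var2 = c(32.15, 32.39, NA),
                id = "0m11")

# default: the row with the NA is dropped before mean() sees it
default <- aggregate(. ~ id, data = d, FUN = mean, na.rm = TRUE)

# na.pass keeps the row, so mean(..., na.rm = TRUE) works per column
per.col <- aggregate(. ~ id, data = d, FUN = mean, na.rm = TRUE,
                     na.action = na.pass)

default
per.col
```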


Justin

On Tue, Nov 29, 2011 at 12:21 PM, Juliet Hannah juliet.han...@gmail.com wrote:

 I am calculating the mean of each column grouped by the variable 'id'.
 I do this using aggregate, data.table, and plyr. My aggregate results
 do not match the other two, and I am trying to figure out what is
 incorrect with my syntax. Any suggestions? Thanks.

 Here is the data.

 myData <- structure(list(var1 = c(31.59, 32.21, 31.78, 31.34, 31.61, 31.61,
 30.59, 30.84, 30.98, 30.79, 30.79, 30.94, 31.08, 31.27, 31.11,
 30.42, 30.37, 30.29, 30.06, 30.3, 30.43, 30.61, 30.64, 30.75,
 30.39, 30.1, 30.25, 31.55, 31.96, 31.87, 30.29, 30.15, 30.37,
 29.59, 29.52, 28.96, 29.69, 29.58, 29.52, 30.21, 30.3, 30.25,
 30.23, 30.29, 30.39), var2 = c(33.78, 33.25, NA, 32.05, 32.59,
 NA, 32.24, NA, NA, 32.15, 32.39, NA, 32.4, 31.6, NA, 30.5, 30.66,
 NA, 30.6, 29.95, NA, 31.24, 30.73, NA, 30.51, 30.43, 31.17, 31.44,
 31.17, 31.18, 31.01, 30.98, 31.25, 30.44, 30.47, NA, 30.47, 30.56,
 NA, 30.6, 30.57, NA, 31, 30.8, NA), id = c("0m4", "0m4", "0m4",
 "0m5", "0m5", "0m5", "0m6", "0m6", "0m6", "0m11", "0m11", "0m11",
 "0m12", "0m12", "0m12", "205m1", "205m1", "205m1", "205m4", "205m4",
 "205m4", "205m5", "205m5", "205m5", "205m6", "205m6", "205m6",
 "205m7", "205m7", "205m7", "600m1", "600m1", "600m1", "600m3",
 "600m3", "600m3", "600m4", "600m4", "600m4", "600m5", "600m5",
 "600m5", "600m7", "600m7", "600m7")), .Names = c("var1", "var2",
 "id"), row.names = c(NA, -45L), class = "data.frame")

  > head(myData)
     var1  var2  id
  1 31.59 33.78 0m4
  2 32.21 33.25 0m4
  3 31.78    NA 0m4
  4 31.34 32.05 0m5
  5 31.61 32.59 0m5
  6 31.61    NA 0m5



 results1 <- aggregate(. ~  id ,data=myData,FUN=mean,na.rm=T)
  > head(results1,1)
 #     id  var1  var2
 # 1 0m11 30.79 32.27

 library(data.table)
 mydt <- data.table(myData)
 setkey(mydt,id)
 results2 <- mydt[,lapply(.SD,mean,na.rm=TRUE),by=id]
  > head(results2,1)
 #        id  var1  var2
 # [1,] 0m11 30.84 32.27

 library(plyr)
 results3 <- ddply(myData,.(id),colwise(mean),na.rm=TRUE)
  > head(results3,1)
 #     id  var1  var2
 # 1 0m11 30.84 32.27

  sessionInfo()
 R version 2.14.0 (2011-10-31)
 Platform: i386-pc-mingw32/i386 (32-bit)

 locale:
 [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
 States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 [5] LC_TIME=English_United States.1252

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base

 other attached packages:
 [1] plyr_1.6 data.table_1.7.3



Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
Very cool.  Sadly, as far as I can tell, it doesn't work with ggplot though
:(


> x <- runif(1e6)
> y <- runif(1e6)
> system.time(plot(x,y,pch='.'))
   user  system elapsed
  0.824   0.012   0.845
> system.time(plot(x,y))
   user  system elapsed
 33.422   0.016  33.545
> system.time(print(qplot(x,y)))
   user  system elapsed
 45.142   0.228  45.687
> system.time(print(qplot(x,y,pch='.')))
   user  system elapsed
 47.483   1.060  49.040
> system.time(print(qplot(x,y,shape='.')))
   user  system elapsed
 44.807   0.689  45.710


On Fri, Nov 18, 2011 at 11:03 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

 Hi all,

 I'm working with a bunch of large graphs, and stumbled across
 something useful. Probably many of you know this, but I didn't and so
 others might benefit.

 Using pch="." speeds up plotting considerably over using symbols.

  > x <- runif(100)
  > y <- runif(100)
  > system.time(plot(x, y, pch="."))
    user  system elapsed
   1.042   0.030   1.077
  > system.time(plot(x, y))
    user  system elapsed
   37.865   0.033  38.122

 If you have enough points, the result is also more legible.

 Choice of which pch symbol makes a difference too, the default pch=1 being
 the slowest of what I tried, but "." is by far the speediest.

  > system.time(plot(x, y, pch=0))
    user  system elapsed
   11.191   0.011  11.270
  > system.time(plot(x, y, pch=1))
    user  system elapsed
   38.024   0.008  38.245
  > system.time(plot(x, y, pch=2))
    user  system elapsed
   14.140   0.027  14.270
  > system.time(plot(x, y, pch=3))
    user  system elapsed
   15.696   0.011  15.799
  > system.time(plot(x, y, pch=4))
    user  system elapsed
   18.770   0.007  18.888

 This is a vanilla R session, 2.13.1 for x86_64-redhat-linux-gnu. I
 haven't tried it on any other OS, but it's making my life a lot
 smoother right now.

 Sarah

 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] tip: large plots

2011-11-18 Thread Justin Haynes
That is a function I did not know about, thanks Hadley!

I still don't see the speed increase that you do with the base plot
package, but I'm sticking with ggplot anyway!

> x <- runif(1e6)
> y <- runif(1e6)
> system.time(print(qplot(x,y)))
   user  system elapsed
 42.234   0.520  43.061
> system.time(print(qplot(x,y,pch=I('.'))))
   user  system elapsed
 32.370   0.204  33.868


On Fri, Nov 18, 2011 at 12:39 PM, Hadley Wickham had...@rice.edu wrote:

 You need: system.time(print(qplot(x,y,pch=I('.'))))

 Hadley

 --
 Assistant Professor / Dobelman Family Junior Chair
 Department of Statistics / Rice University
 http://had.co.nz/




Re: [R] apply on rows and columns?

2011-11-16 Thread Justin Haynes
To expand on what Sarah and Michael said:

if you have a 3d array:

> x <- array(1:4,c(2,2,4))
> x
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 3

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 4

     [,1] [,2]
[1,]    1    3
[2,]    2    4

> apply(x,c(1,2),sum)
     [,1] [,2]
[1,]    4   12
[2,]    8   16

a margin of c(1,2) makes more sense.  Hope that clarifies things.
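
To put the two cases side by side, here is a minimal sketch. The object names (m, x3, cell_sums_2d, cell_sums_3d) are just picked for illustration and are not from the thread:

```r
# 2-d case: every (row, column) cell holds exactly one value,
# so sum() over that cell simply returns the value itself.
m <- matrix(1:4, ncol = 2)
cell_sums_2d <- apply(m, c(1, 2), sum)

# 3-d case: for every (row, column) cell, sum() runs over the
# third dimension (here four identical 2x2 slices).
x3 <- array(1:4, c(2, 2, 4))
cell_sums_3d <- apply(x3, c(1, 2), sum)

print(cell_sums_2d)
print(cell_sums_3d)
```

So on a plain matrix MARGIN = c(1, 2) just hands back the matrix, while on a 3d array it collapses the remaining dimension.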


Justin

On Wed, Nov 16, 2011 at 12:18 PM, Sarah Goslee sarah.gos...@gmail.com wrote:
 Hi,

 On Wed, Nov 16, 2011 at 3:13 PM,  rkevinbur...@charter.net wrote:

 I have the following scenario:

 m <- matrix(1:4, ncol=2)
 m
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4
 apply(m, 2, sum)
 [1] 3 7
 apply(m, 1, sum)
 [1] 4 6

 So I can apply to rows *or* columns. According to the documentation
 (?apply)

 MARGIN a vector giving the subscripts which the function will be applied
 over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2)
 indicates rows and columns. Where X has named dimnames, it can be a
 character vector selecting dimension names.


 But I get the following results:

 apply(m, c(1,2), sum)
      [,1] [,2]
 [1,]    1    3
 [2,]    2    4

 How am I to interpret this result?

 I'm pretty sure R is taking the sum of m[1,1] and putting it [1,1],
 and the sum of m[1,2] and putting it in [1,2] and so on. You
 instructed apply() to work on rows and columns *simultaneously*,
 rather than sequentially.

 apply() on c(1,2) is useful if you have a matrix that's three-dimensional,
 but not so much if it's two dimensional.

 What are you trying to accomplish?

 Sarah




 --
 Sarah Goslee
 http://www.functionaldiversity.org



Re: [R] Extract pattern from string

2011-11-15 Thread Justin Haynes
Take a look at the structure of what Sys.time() returns:

str(Sys.time())

and now at ?strptime!

> format(Sys.time(),format='%d-%H-%M-%S')
[1] "15-09-55-55"

> format(Sys.time(),format='%Y')
[1] "2011"
> format(Sys.time(),format='%m')
[1] "11"
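
For the exact three pieces asked for, something like the following should do. This is only a sketch: it uses a fixed timestamp (the one from the question) instead of Sys.time() so the result is reproducible, and the variable names are made up for the example:

```r
# Fixed timestamp taken from the question, pinned to GMT
now <- as.POSIXct("2011-11-15 16:25:55", tz = "GMT")

# Each piece is just a different format string (see ?strptime)
year     <- format(now, "%Y", tz = "GMT")
month    <- format(now, "%m", tz = "GMT")
day_time <- format(now, "%d_%H_%M_%S", tz = "GMT")

print(c(year = year, month = month, day_time = day_time))
```

Note that format() returns character strings; wrap the result in as.integer() if you need numbers.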



Hope that helps,

Justin

On Tue, Nov 15, 2011 at 9:48 AM, syrvn ment...@gmx.net wrote:
 Hello,

 with Sys.time() you get the following string:

 2011-11-15 16:25:55 GMT

 How can I extract the following substrings:

 year - 2011

 month - 11

 day_time - 15_16_25_55


 Cheers,

 Syrvn



Re: [R] Create design matrix

2011-11-03 Thread Justin Haynes
?expand.grid

> expand.grid(c("M","F"),c("Y","O"))
  Var1 Var2
1    M    Y
2    F    Y
3    M    O
4    F    O
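
If you want meaningful column names, pass named arguments. A small sketch, where the names sex and age are my own guess at what the factors mean and are not from the original post:

```r
# One row per distinct combination of levels; the first factor varies fastest
design <- expand.grid(sex = c("M", "F"),
                      age = c("Y", "O"),
                      stringsAsFactors = FALSE)

print(design)
print(nrow(design))
```

The same call scales to any number of factors, so a 1000+ row design is just a matter of adding more arguments.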



Justin

On Thu, Nov 3, 2011 at 10:56 AM, Bond, Stephen stephen.b...@cibc.com wrote:
 Greetings useRs,

 What is the easiest way to create a design matrix of several factor 
 variables? Function gendata in Design seems to do that for a fitted model, 
 but how to do that only on several factor vectors??

 The result should be a df with one row for each distinct combination of 
 levels of factors eg for (M,F) (Y,O)
 We get
 M Y
 M O
 F Y
 F O

 In reality I will have more than 1000 rows so doing by hand not good.
 Maybe there is a way with outer, but I couldn't see it.
 All the best to everybody.

 Stephen



[R] mysterious warning message regarding bytecode...

2011-11-02 Thread Justin Haynes
While running a long script which source()s other scripts I get the
following warning:

Warning message:
In t(object$S[[1]]) : bytecode version mismatch; using eval


I cannot replicate it if I run the sourced files line by line though...

What is that error?  And do I care about it?  It doesn't seem to
affect my output as far as I can tell.


Thanks!
Justin


> sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] mgcv_1.7-9        stringr_0.5       RPostgreSQL_0.2-0 biglm_0.8         DBI_0.2-5         doMC_1.2.3        multicore_0.1-7
 [8] foreach_1.3.2     codetools_0.2-8   iterators_1.0.5   cairoDevice_2.19  pixmap_0.4-11     gridExtra_0.8.5   splancs_2.01-29
[15] sp_0.9-91         ellipse_0.3-5     ggplot2_0.8.9     proto_0.3-9.2     reshape_0.8.4     plyr_1.6          MASS_7.3-14

loaded via a namespace (and not attached):
[1] compiler_2.13.2 digest_0.5.1    lattice_0.19-33 Matrix_1.0-1    nlme_3.1-102



Re: [R] factor level issue after subsetting

2011-11-01 Thread Justin Haynes
first of all, the subsetting line is overly complicated.

dat.sub <- dat[dat$treat!='cont',]

will work just fine.  R does exactly what you're describing.  It knows
the levels of the factor.  Once you remove 'cont' from the data, that
doesn't mean that the level is removed from the factor:

> df <- data.frame(let=factor(sample(letters[1:5],100,replace=T)),num=rnorm(100))
> str(df)
'data.frame':   100 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...

> df.sub <- df[df$let!='a',]
> str(df.sub)
'data.frame':   82 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...

> unique(df.sub$let)
[1] e d c b
Levels: a b c d e

> df.sub$let <- factor(df.sub$let)
> unique(df.sub$let)
[1] e d c b
Levels: e d c b

> str(df.sub$let)
 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...


by redefining your factor you can eliminate the problem.  the other
option, if you don't want factors to begin with is:

options(stringsAsFactors=FALSE)  # to set the global option

or

# to set the option locally for this single read.csv call:
dat <- read.csv("~/MyFiles/data.csv",stringsAsFactors=FALSE)
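
Another way to get rid of the unused level, assuming you are on R 2.12.0 or later where droplevels() is available, is sketched below. The tiny data frame is made up for the example so that the result is deterministic:

```r
# Small deterministic stand-in for the sampled data above
df <- data.frame(let = factor(c("a", "b", "c", "a", "b")), num = 1:5)

df.sub <- df[df$let != "a", ]
levels_before <- levels(df.sub$let)   # "a" is still listed as a level

# droplevels() removes unused levels from every factor in the data frame
df.sub <- droplevels(df.sub)
levels_after <- levels(df.sub$let)

print(levels_before)
print(levels_after)
```

This does the same job as re-calling factor() on the column, but works on the whole data frame at once.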


On Tue, Nov 1, 2011 at 2:28 PM, Schreiber, Stefan
stefan.schrei...@ales.ualberta.ca wrote:
 Dear list,

 I cannot figure out why, after sub-setting my data, that particular item
 which I don't want to plot is still in the newly created subset (please
 see example below). R somehow remembers what was in the original data
 set. A work around is exporting and importing the new subset. Then it's
 all fine; but I don't like this idea and was wondering what am I missing
 here?

 Thanks!
 Stefan

 P.S. I am using R 2.13.2 for Mac.

 dat <- read.csv("~/MyFiles/data.csv")
 class(dat$treat)
 [1] "factor"
 dat
   treat yield
 1   cont  98.7
 2   cont  97.2
 3   cont  96.1
 4   cont  98.1
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3
 plot(dat$treat,dat$yield)
 dat.sub <- dat[which(dat$treat!='cont'),]
 class(dat.sub$treat)
 [1] "factor"
 dat.sub
   treat yield
 5     10 103.0
 6     10 101.3
 7     10 102.1
 8     10 101.9
 9     30 121.1
 10    30 123.1
 11    30 119.7
 12    30 118.9
 13    60 109.9
 14    60 110.1
 15    60 113.1
 16    60 112.3
 plot(dat.sub$treat,dat.sub$yield)



Re: [R] reshape2: Lost Values Between melt() and dcast()

2011-10-31 Thread Justin Haynes
The reason dcast would give that warning (not a failure) is that the
formula you gave does not identify unique values: more than one row maps
to the same cell.  Thus, dcast needs an aggregating function, which
defaults to length.

However, the dcast calls that "failed" can be helpful for determining
the source of your error.  I'd look at the outputs of those two dcast
calls and find cells where the length is > 1.  Those are duplicated
entries in your initial data.frames (when I've run into this it was
usually due to NA values somewhere unexpected).

Hope that clarifies things.
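
You can also hunt for the duplicates directly in the molten data with base R, without going through dcast at all. A rough sketch follows. The data frame here is invented for illustration; only the column names site, sampdate, param and value follow the original post:

```r
# Made-up molten data: the first two rows share site/sampdate/param
molten <- data.frame(
  site     = c("A", "A", "A", "B"),
  sampdate = as.Date(rep("2000-01-01", 4)),
  param    = c("pH", "pH", "As", "pH"),
  value    = c(7.1, 7.3, 0.01, 6.9),
  stringsAsFactors = FALSE
)

# The left- and right-hand side of the cast formula together form the key
keys <- molten[c("site", "sampdate", "param")]

# Flag every row whose key occurs more than once (both directions,
# so the first occurrence is flagged as well)
dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

print(molten[dup, ])
```

Any rows printed here are the ones that force dcast to fall back to an aggregation function.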

Justin


On Mon, Oct 31, 2011 at 9:32 AM, Rich Shepard rshep...@appl-ecosys.com wrote:
  Working with 5 subset streams from my source data frame, three of them
 successfully call dcast(), but two fail:

 jerritt.cast <- dcast(jerritt.melt, site + sampdate ~ param)
 Aggregation function missing: defaulting to length

 and

 winters.cast <- dcast(winters.melt, site + sampdate ~ param)
 Aggregation function missing: defaulting to length

  Yet both data frames have the values in their .melt data frames:

 summary(jerritt.melt)
      site         sampdate              param       variable
  JCM-1  :2178   Min.   :1978-03-28   pH     : 292   quant:7519
  JCM-20A:2149   1st Qu.:1996-05-24   As     : 286
  JC-E   : 476   Median :2000-05-31   SO4    : 271
  JC     : 400   Mean   :2001-02-04   TDS    : 271
  GD-1   : 395   3rd Qu.:2006-05-31   Cl     : 253
  JC-2   : 349   Max.   :2009-12-30   Zn     : 250
  (Other):1572                        (Other):5896
     value
  Min.   :    0.000
  1st Qu.:    0.005
  Median :    0.650
  Mean   :  317.588
  3rd Qu.:   27.000
  Max.   :20450.000
  NA's   : 2134.000

 and

 summary(winters.melt)
      site        sampdate              param      variable
  WC     :601   Min.   :1987-07-23   As     : 96   quant:1189
  WC-2   :327   1st Qu.:1994-06-15   TDS    : 79
  WC-1   :261   Median :1995-07-27   NO3-N  : 74
  BC-0.5 :  0   Mean   :1997-05-15   pH     : 72
  BC-1   :  0   3rd Qu.:1996-07-29   SO4    : 69
  BC-1.5 :  0   Max.   :2011-06-06   Cl     : 64
  (Other):  0                        (Other):735
     value
  Min.   :   0.00
  1st Qu.:   0.05
  Median :   7.59
  Mean   :  79.20
  3rd Qu.:  75.00
  Max.   :2587.00
  NA's   : 252.00

  What might be causing dcast() to fail with these two data frames while it
 succeeds with three others processed using the same syntax? If additional
 information would help, let me know and I'll provide it.

 Puzzled,

 Rich



Re: [R] Replacing matching values by related values

2011-09-18 Thread Justin Haynes
In your assignment for t3 you use nt, which is undefined; thus
t.n$treatment is all NAs.

But:

df <- data.frame(num=1:10,let=letters[1:10])
dat <- data.frame(let=sample(letters[1:10],20,replace=T))

dat$matched <- df$num[match(dat$let,df$let)]

should get you started.
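
Applied to the small example from the first post in this thread, the same idea looks roughly like this. I am assuming the values are character strings, and the column names treatment and number are made up for the sketch:

```r
# The vector to translate (assumed to be character strings)
v <- c("f", "a", "e", "d", "m", "o", "e", "f", "i", "n", "e", "i", "b", "a", "o")

# Lookup table: first column is the key, second the related value
d <- data.frame(treatment = letters[1:17],
                number    = 1:17,
                stringsAsFactors = FALSE)

# match() gives, for each element of v, its row position in the lookup table
result <- d$number[match(v, d$treatment)]

print(result)
```

Anything in v that has no match in the lookup table would come back as NA, which is handy for spotting typos in treatment names.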


On Sun, Sep 18, 2011 at 7:56 AM, Janssen, K.J.M. 
k.j.m.jans...@umcutrecht.nl wrote:

 Apologies, I wanted to make life easier by shortly describing my problem.
 Indeed, it is better to post the full code.
 I am not familiar with the dput, but I have pasted the code that I have
 used below.


 d <- matrix(NA,15,5)
 d <- as.data.frame(d)

 colnames(d) <- c("studynumber","t1","t2","t[,1]","t[,2]")

 d$studynumber <- c(1:15)   # add study numbers to select studies in scenarios
 d$t1 <- c("car_pac","car_pac","cis_vin","car_pac","cis_doc","cis_gem","cis_gem","cis_vin",
           "car_pac","car_doc","car_pac","car_pac","car_doc.pac","cis_vin","cis_iri")
 d$t2 <- c("gef","bev_car_pac","cet_cis_vin","gef","gef","bev_cis_gem","cis_pem","cet_cis_vin",
           "car_gem_pac","car_pem","erl","cis_pac","cet_car_doc.pac","cis_doc","car_pac")

 # Link treatment to relating treatment number: make vector of all unique
 # treatment options
 t1 <- duplicated(c(d$t1,d$t2))  # returns TRUE and FALSE, which we can use to select
 t2 <- c(d$t1,d$t2)  # combine both vectors, as treatments can be both
                     # reference and index treatment
 t3 <- na.omit(ifelse(t1==FALSE,c(d$t1,d$t2),NA))[1:nt]  # omit double treatment

 # make dataset with first column all possible treatments, and second column
 # their respective numbers
 t.n <- matrix(NA,17,2)  # list possible treatments (here 17), and
                         # link them to numbers
 t.n <- as.data.frame(t.n)
 colnames(t.n) <- c("treatment","numbers")
 t.n$treatment <- t3
 t.n$numbers <- 1:17

 # link treatments in d with treatment numbers in dataset t.n

 Here is where I aim to fill d$t[,1] and d$t[,2] with the corresponding
 numbers from t.n

 Thanks.

 Kristel




 -Original message-
 From: David Winsemius [mailto:dwinsem...@comcast.net]
 Sent: Sun 18-9-2011 15:20
 To: Janssen, K.J.M.
 CC: michael.weyla...@gmail.com; r-help@r-project.org
 Subject: Re: [R] Replacing matching values by related values


 On Sep 18, 2011, at 3:56 AM, Janssen, K.J.M. wrote:

  Thanks Michael.
  I tested it and it works for numeric values, but not for the 'text'
  values that I am comparing, thus comparing a with a,b, etc.
  Any advice how I can solve it?

 Solve what? You never posted full working code and an explicit
 example. Unless there were actually objects named a, b, c, etc.
 in your workspace then the code that started out: v <-
 c(f,a,e,d,m,  would not have been meaningful except to hint at the
 possibility that you might be comparing character vectors. I assumed
 that d[,2] was actually letters[1:17] rather than what you wrote. It's
 especially important to indicate whether you have attached any objects.

 Post dput(head(d)) and dput(v) for the example part and include any
 code use to construct them.

 --
 david.

 
  Thanks!
 
 
  -Original message-
  From: R. Michael Weylandt michael.weyla...@gmail.com [mailto:michael.weyla...@gmail.com]
  Sent: Sun 18-9-2011 2:27
  To: Janssen, K.J.M.
  CC: r-help@r-project.org
  Subject: Re: [R] Replacing matching values by related values
 
  Try playing with match(). Something like
 
  d[match(v,d[,1]),2]
 
  Should work (untested bc I'm writing from my phone though)
 
  Michael Weylandt
 
  On Sep 17, 2011, at 4:33 PM, Janssen, K.J.M. 
 k.j.m.jans...@umcutrecht.nl
   wrote:
 
 
  I am trying to replace values of a vector (consisting of 15 values)
  by a value that is related to a matching value in a dataset
  (consisting of 17 rows).
  Here's an example
  The vector:
  v <- c(f,a,e,d,m,o,e,f,i,n,e,i,b,a,o)
 
  The dataset's columns consist of the following values
  d[,1] <- c(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q)
  d[,2] <- 1:17
 
  So I want to end up with a vector that consists of the values of
  the second column, when the value of the vector matches the value
  of the first column.
  Thus, I aim to end up with a vector with the following values
  c(6,1,5,4,13,15,5,6,9,14,5,9,2,1,15)
 
  Help is appreciated!
 
 

Re: [R] R shell line width

2011-09-16 Thread Justin Haynes
You want

options(width= )

You can set it interactively in the console, or edit your .Rprofile file (and
the .First function in there) so that it is set when you start R.
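
As a quick illustration, here is a sketch of the interactive route. The value 200 is an arbitrary choice for the example; pick whatever fits your terminal:

```r
# options() returns the previous settings, so they can be restored later
old <- options(width = 200)
current_width <- getOption("width")

# With a wide enough setting the 30 columns from the question fit on one line
print(data.frame(matrix(1:30, nrow = 1)))

# Put things back the way they were
options(old)
```

Putting the single line options(width = 200) into ~/.Rprofile would apply it to every new session.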

On Fri, Sep 16, 2011 at 12:48 PM, Mike P mike.polya...@gmail.com wrote:

 Hi,

 I want to apologize in advance if this has already been asked. I
 wasn't able to find any information, either on google or from local
 list search.

 I'm running an R shell from a linux command line, in an xterm window.
 Whenever I print a data frame, only the first couple of columns are
 printed side-by-side, the others are being repositioned below them. It
 seems something is limiting the line width of the output, even though
 there is enough horizontal space to fit each row on a single line.

 For example, this command:

  data.frame(matrix(1:30,nrow=1))

 prints columns 1-21 on the first line, and the rest 22-30 on the second.

 Is there a way I can configure R to increase the width of my output?

 Thanks.



Re: [R] map

2011-09-13 Thread Justin Haynes
I responded offline the first time, but:

Google is your friend: search for "R maps" and you'll find what I mention
below.

In the future, make sure to search Google and the help forums thoroughly
before you post.


That said... you're looking for the maps package:

install.packages('maps')
library(maps)
map('italy')

The ggplot2 package has a function called map_data that extracts the lines if
you want the actual data; see the example Hadley provided in ?ggplot2::map_data




hope that helps,

Justin


On Tue, Sep 13, 2011 at 8:48 AM, Batur swordligh...@gmail.com wrote:

 Adding to the previous question, I would like to map central Asia along
 with those five countries (Kazakhstan, Kyrgyzstan, Uzbekistan, Tajikistan
 and Turkmenistan). Please tell us the right data base!!! Thanks a lot!!!



Re: [R] reshaping data

2011-09-07 Thread Justin Haynes
look at the melt function in reshape, specifically ?melt.data.frame

require(reshape)
Raw.melt <- melt(RawData,id.vars='Year',variable_name='Month')

there is an additional feature in the melt function for handling NA values.
names(Raw.melt)[3] <- 'CO2'

> head(Raw.melt)
  Year Month    CO2
1 1958     J     NA
2 1959     J 315.58
3 1960     J 316.43
4 1961     J 316.89
5 1962     J 317.94
6 1963     J 318.74


you can order your data.frame if you'd like

Raw.melt <- Raw.melt[order(Raw.melt$Year,Raw.melt$Month),]

> head(Raw.melt)
    Year Month    CO2
1   1958     J     NA
48  1958     F     NA
95  1958     M 315.71
142 1958     A 317.45
189 1958   M.1 317.50
236 1958   J.1     NA

On Wed, Sep 7, 2011 at 7:35 AM, B77S bps0...@auburn.edu wrote:

 I have the following data (see RawData using dput below)

 How do I get it in the following 3 column format (CO2 measurements are the
 elements of the original data frame).  I'm sure the package reshape is
 where
 I should look, but I haven't figured out how.

 Thanks ahead of time

  Month  Year  CO2
  J      1958
  F      1958
  M      1958  315.71
  A      1958  317.45
  M.1    1958  317.5
  J.1    1958
  J.2    1958  315.86
  A.1    1958  314.93
  S      1958  313.19
  O      1958
  N      1958  313.34
  D      1958  314.67
  J      1959  315.58
  F      1959  316.47


 # here is the data

 RawData <- structure(list(Year = c(1958, 1959, 1960, 1961, 1962, 1963,
 1964,
 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975,
 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986,
 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997,
 1998, 1999, 2000, 2001, 2002, 2003, 2004), J = c(NA, 315.58,
 316.43, 316.89, 317.94, 318.74, 319.57, 319.44, 320.62, 322.33,
 322.57, 324, 325.06, 326.17, 326.77, 328.54, 329.35, 330.4, 331.74,
 332.92, 334.97, 336.23, 338.01, 339.23, 340.75, 341.37, 343.7,
 344.97, 346.29, 348.02, 350.43, 352.76, 353.66, 354.72, 355.98,
 356.7, 358.36, 359.96, 362.05, 363.18, 365.32, 368.15, 369.14,
 370.28, 372.43, 374.68, 376.79), F = c(NA, 316.47, 316.97, 317.7,
 318.56, 319.08, NA, 320.44, 321.59, 322.5, 323.15, 324.42, 325.98,
 326.68, 327.63, 329.56, 330.71, 331.41, 332.56, 333.42, 335.39,
 336.76, 338.36, 340.47, 341.61, 342.52, 344.51, 346, 346.96,
 348.47, 351.72, 353.07, 354.7, 355.75, 356.72, 357.16, 358.91,
 361, 363.25, 364, 366.15, 368.87, 369.46, 371.5, 373.09, 375.63,
 377.37), M = c(315.71, 316.65, 317.58, 318.54, 319.69, 319.86,
 NA, 320.89, 322.39, 323.04, 323.89, 325.64, 326.93, 327.18, 327.75,
 330.3, 331.48, 332.04, 333.5, 334.7, 336.64, 337.96, 340.08,
 341.38, 342.7, 343.1, 345.28, 347.43, 347.86, 349.42, 352.22,
 353.68, 355.39, 357.16, 357.81, 358.38, 359.97, 361.64, 364.03,
 364.57, 367.31, 369.59, 370.52, 372.12, 373.52, 376.11, 378.41
 ), A = c(317.45, 317.71, 319.03, 319.48, 320.58, 321.39, NA,
 322.13, 323.7, 324.42, 325.02, 326.66, 328.13, 327.78, 329.72,
 331.5, 332.65, 333.31, 334.58, 336.07, 337.76, 338.89, 340.77,
 342.51, 343.56, 344.94, 347.08, 348.35, 349.55, 350.99, 353.59,
 355.42, 356.2, 358.6, 359.15, 359.46, 361.26, 363.45, 364.72,
 366.35, 368.61, 371.14, 371.66, 372.87, 374.86, 377.65, 380.52
 ), M.1 = c(317.5, 318.29, 320.03, 320.58, 321.01, 322.24, 322.23,
 322.16, 324.07, 325, 325.57, 327.38, 328.07, 328.92, 330.07,
 332.48, 333.09, 333.96, 334.87, 336.74, 338.01, 339.47, 341.46,
 342.91, 344.13, 345.75, 347.43, 348.93, 350.21, 351.84, 354.22,
 355.67, 357.16, 359.34, 359.66, 360.28, 361.68, 363.79, 365.41,
 366.79, 369.29, 371, 371.82, 374.02, 375.55, 378.35, 380.63),
J.1 = c(NA, 318.16, 319.59, 319.78, 320.61, 321.47, 321.89,
321.87, 323.75, 324.09, 325.36, 326.7, 327.66, 328.57, 329.09,
332.07, 332.25, 333.59, 334.34, 336.27, 337.89, 339.29, 341.17,
342.25, 343.35, 345.32, 346.79, 348.25, 349.54, 351.25, 353.79,
355.13, 356.22, 358.24, 359.25, 359.6, 360.95, 363.26, 364.97,
365.62, 368.87, 370.35, 371.7, 373.3, 375.4, 378.13, 379.57
), J.2 = c(315.86, 316.55, 318.18, 318.58, 319.61, 319.74,
320.44, 321.21, 322.4, 322.55, 324.14, 325.89, 326.35, 327.37,
328.05, 330.87, 331.18, 331.91, 333.05, 334.93, 336.54, 337.73,
339.56, 340.49, 342.06, 343.99, 345.4, 346.56, 347.94, 349.52,
352.39, 353.9, 354.82, 356.17, 357.03, 357.57, 359.55, 361.9,
363.65, 364.47, 367.64, 369.27, 370.12, 371.62, 374.02, 376.62,
377.79), A.1 = c(314.93, 314.8, 315.91, 316.79, 317.4, 317.77,
318.7, 318.87, 320.37, 320.92, 322.11, 323.67, 324.69, 325.43,
326.32, 329.31, 329.4, 330.06, 330.94, 332.75, 334.68, 336.09,
337.6, 338.43, 339.82, 342.39, 343.28, 344.69, 345.91, 348.1,
350.44, 351.67, 352.91, 354.03, 355, 355.52, 357.49, 359.46,
361.49, 362.51, 365.77, 366.94, 368.12, 369.55, 371.49, 374.5,
375.86), S = c(313.19, 313.84, 314.16, 314.99, 316.26, 316.21,
316.7, 317.81, 318.64, 319.26, 320.33, 322.38, 323.1, 323.36,
324.84, 327.51, 327.44, 328.56, 329.3, 331.58, 332.76, 333.91,
335.88, 336.69, 

Re: [R] Fitting my data to a Weibull model

2011-08-31 Thread Justin Haynes
This is what I use...

fit.func <- function(x){
  require(MASS)
  est <- fitdistr(x$wind_speed, 'weibull')$estimate
  data.frame(shape=est[1],scale=est[2])
}

feel free to correct me if this is wrong!
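
For what it's worth, here is a quick sanity check of that approach on simulated data. The shape and scale values (2 and 8), the sample size and the object names are arbitrary choices for this sketch; only the wind_speed column name comes from the function above:

```r
library(MASS)

# Same idea as the function above: maximum-likelihood Weibull fit
fit.func <- function(x) {
  est <- fitdistr(x$wind_speed, "weibull")$estimate
  data.frame(shape = est[["shape"]], scale = est[["scale"]])
}

# Simulate wind speeds from a known Weibull so the fit can be checked
set.seed(1)
winds <- data.frame(wind_speed = rweibull(500, shape = 2, scale = 8))

# fitdistr() may warn about NaNs while optimising; that is usually harmless here
fit <- suppressWarnings(fit.func(winds))
print(fit)
```

With a few hundred observations the estimates should land close to the values used in the simulation.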


Justin


On Wed, Aug 31, 2011 at 6:21 AM, Dennis Murphy djmu...@gmail.com wrote:

 Hi:

 Things work if x is the response and y is the covariate. To use the
 approach I describe below, you need RStudio and its manipulate package
 (which is only available in RStudio - you won't find it on CRAN). You
 can download and install RStudio freely from http://rstudio.org/ ; it
 is available for Windows, Linux and Mac. To quote an old TV commercial
 line in the US: 'Try it, you'll like it' :)

 In the script below, the covariate has to be named x since the script
 calls the curve() function, which plots a mathematical function of a
 single variable named x. As a result, you need to interchange the
 names of your vectors. Within RStudio, copy and paste the following in
 chunks; in particular, copy and paste the code starting with
 'manipulate('  and ending in ')' to generate the sliders for the
 parameter estimates. The idea is to tweak the parameter values until
 you get a fitted model that fits the observed data fairly closely.
 When you achieve that, kill the slider box (upper right corner); the
 estimates at the state where the sliders are closed are then saved in
 a vector called start, which you use in the subsequent nls() call.
 After the model is fit, a sequence of x values is generated as new
 data, the predicted values at those points are computed, and a plot of
 the observed data with overlaid fitted model is produced.

 You have to be a bit careful; occasionally, you'll get an error
 Error in nls(y ~ a - b * exp(-c * x^d), start = start) :
  singular gradient
 If so, just try again with a different set of initial values, trying
 not to overdo it. You don't need to be exact, just close.

 library('manipulate')
 ### Weibull model:

 x <- c(1,2,3,4,10,20)
 y <- c(1,7,14,25,29,30)

 ## Copy and paste the code chunk below into RStudio,
 ## stopping with the line of hash marks
 start <- list()
 # Generate sliders to find good initial parameter estimates
 manipulate(
  {
    plot(y ~ x)
    a <- a0; b <- b0; c <- c0; d <- d0
    curve(a-b*exp(-c*x^d), add=TRUE)
    start <<- list(a=a, b=b, c=c, d=d)
  },
  a0 = slider(10, 50, step=0.1, initial = 30),
  b0 = slider(0, 100, step=1, initial = 3),
  c0 = slider(0, 0.1, step=0.01, initial = 0.01),
  d0 = slider(0, 10, step=0.1, initial = 5)
  )
 ## Stop here ##

 # Fit the model using the estimates from the sliders
 weibm <- nls(y ~ a-b*exp(-c*x^d), start = start)
 summary(weibm)

 # Make predictions over a sequence of x values and plot
 ndata <- data.frame(x = seq(0, 20, by = 0.1))
 wpred <- predict(weibm, newdata = ndata)
 plot(y ~ x, pch = 16)
 lines(ndata$x, wpred, col = 'red')

 ### Logistic:

 start <- list()
 manipulate(
  {
    plot(y ~ x)
    a <- a0; b <- b0; d <- d0
    curve(a/(1+b*exp(-d*x)), add=TRUE)
    start <<- list(a=a, b=b, d=d)
  },
  a0 = slider(0, 50, step = 1, initial = 30),
  b0 = slider(0, 20, step = 0.1, initial = 10),
  d0 = slider(0, 1, step = 0.01, initial = 0.1)
  )

 logism <- nls(y ~ a/(1+b*exp(-d*x)), start = start)
 summary(logism)

 ldata <- data.frame(x = seq(0, 20, by = 0.1))
 lpred <- predict(logism, newdata = ldata)
 plot(y ~ x, pch = 16)
 lines(ldata$x, lpred, col = 'red')

 This is a good exercise to learn how the various parameters affect the
 shape of the curve associated with a particular nonlinear model in one
 variable. It also helps to read about the model in question and
 understand the interpretation associated with each of the parameters.
 That way, you can use the sliders to visualize the effects of changes
 in one parameter when the others are held constant. If you find that
 the boundaries of the sliders are too restrictive, you can always
 reset them and try again. The code above came about from a few
 iterations of tweaking ranges for individual parameters (either wider
 or narrower as the case may be). I always keep the code in an editor
 so that it's easy to change, then copy and paste into the R console.
 If you redo the slider fitting, it's easier to reset the start vector,
 too.

 You'll also notice that one parameter in each of the fitted models is
 nonsignificant, but you need to take into account that you're fitting
 models with three or four parameters to six data points.

 Aside: If you really meant to use y and x as response and covariate,
 respectively, in your posted data example, the sliders will show you
 that the two models are way off the mark, since y would start out
 slowly and then jump exponentially. That would require a completely
 different nonlinear model. You'll also notice that the estimates of a
 and b in the Weibull model are an order 

[R] lubridate and intervals

2011-08-30 Thread Justin Haynes
Hiya,

maybe there is a native R function for this and if so please let me know!

I have 2 data.frames with start and end dates, they read in as strings and I
am converting to POSIXct.  How can I check for overlap?

The end result ideally will be a single data.frame containing all the
columns of the other two with rows where there were date overlaps.


df1<-data.frame(start=as.POSIXct(paste('2011-06-01 ',1:20,':00',sep='')),
                end=as.POSIXct(paste('2011-06-01 ',1:20,':30',sep='')))
df2<-data.frame(start=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
                                       sample(1:19,20,replace=T),sep='')),
                end=as.POSIXct(paste('2011-06-01 ',rep(seq(1,20,2),2),':',
                                     sample(20:50,20),sep='')))

I tried:
library(lubridate)

df1$interval<-new_interval(df1$start,df1$end)

> df1$interval[1]
[1] 2011-06-01 01:00:00 -- 2011-06-01 01:30:00
> df2$start[1]
[1] 2011-06-01 01:17:00 PDT

but

> df2$start[1] %in% df1$interval[1]
[1] FALSE
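
For what it's worth, %in% matches values, so it is not expected to test whether a time falls inside an interval. A possible base R route (a sketch, not the lubridate answer) uses the fact that two ranges overlap when each one starts before the other ends. The column names start1/end1/start2/end2 and the small example rows below are made up for illustration.

```r
# Made-up example data; tz is fixed so the result does not depend on locale
a <- data.frame(
  start1 = as.POSIXct(c("2011-06-01 01:00", "2011-06-01 02:00"), tz = "UTC"),
  end1   = as.POSIXct(c("2011-06-01 01:30", "2011-06-01 02:30"), tz = "UTC")
)
b <- data.frame(
  start2 = as.POSIXct(c("2011-06-01 01:17", "2011-06-01 03:05"), tz = "UTC"),
  end2   = as.POSIXct(c("2011-06-01 01:45", "2011-06-01 03:20"), tz = "UTC")
)

# Every (row of a, row of b) pair
idx <- expand.grid(i = seq_len(nrow(a)), j = seq_len(nrow(b)))

# Overlap test: each range starts before the other one ends
hit <- a$start1[idx$i] <= b$end2[idx$j] & b$start2[idx$j] <= a$end1[idx$i]

# One row per overlapping pair, carrying the columns of both data.frames
overlaps <- cbind(a[idx$i[hit], ], b[idx$j[hit], ])
```

The all-pairs step grows as nrow(a) * nrow(b), so this is only reasonable for modest sizes.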


This must be fairly straight forward and I just don't know where to look!


Thanks,
Justin

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to referee a dimension name via a variable?

2011-08-29 Thread Justin Haynes
try:

newnam<-paste('newdataday',dayno,sep='')

plot(test[[newnam[1]]])
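
A small self-contained sketch of the same idea, with made-up values: build the column names as strings, then index with [[ ]], which accepts a character name where $ does not.

```r
# Made-up data with the column names from the question
test <- data.frame(newdataday24 = c(1, 3, 2), newdataday48 = c(4, 6, 5))

dayno  <- c(24, 48)
newnam <- paste("newdataday", dayno, sep = "")

# [[ ]] looks the column up by the string stored in newnam
first_col <- test[[newnam[1]]]

# To plot each column in turn, loop over the names:
# for (nm in newnam) plot(test[[nm]], main = nm)
```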


On Mon, Aug 29, 2011 at 12:29 PM, Jie TANG totang...@gmail.com wrote:

 hi, R-users
   I have a data.frame for example  test$newdataday24 and test$newdataday48
 I can plot them by
 plot(test$newdataday24)
 but now i want to plot different data by define a variable to describe them
 dayno<-c(24,48)
 newnam<-paste("test$newdataday",dayno,sep="")
 plot(newnam[1])

 but i failed,the error message said that something wrong with plot.window

 what can i do to fix my script ? thanks
 -
 TANG Jie



Re: [R] debugging functions in R

2011-08-24 Thread Justin Haynes
Another great tool is debugonce()

wrap your function name in it and then execute your function call.

debugonce(my.function)

out<-my.function(df)

And you'll be brought into the same interactive browser. (It has its own
small set of commands, such as n, c and Q, which can take a little getting
used to.)


Justin


On Wed, Aug 24, 2011 at 7:29 AM, Liviu Andronic landronim...@gmail.comwrote:

 On Wed, Aug 24, 2011 at 4:20 PM, Eran Eidinger e...@taykey.com wrote:
  Hi,
 
  I am not sure if this is the right list to ask this question (though I
 did
  not find a more appropriate one).
  I've started using R a month ago, and small scripts work fine. However,
 when
  I start writing more complex code, it gets messy.
 
  1. Is there any way to debug normally, with breakpoints?
 

  fortune('browser')

 My solution when I run into mysteries like this is to put 'browser()' in
 the
 function just before or after the line of interest. The magnitude and
 direction
 of my stupidity usually become clear quickly.
   -- Patrick Burns
  R-help (February 2006)


 Use browser() to inspect the environment and execute the code one step
 at a time.
 Liviu


 2. I am using the Eclipse plugin (StatET), and tried JGR(). Is there an
 IDE
  that enables breakpoints?
  3. Is there an equivalent to include in other programming languages? So
  many functions in one file are very messy. I would like to break it to
  several files.
  4. Any way to create a local context of variables inside a function?
  Otherwise I have to be careful to give different names inside functions,
 to
  those in the workspace.
 
  I should point that I am a long time Matlab user and am probably
 expecting
  some things that don't necessarily exist in R...
 
  I know it's a lot, if there is a more appropriate forum to ask these,
 please
  point me in that direction.
 
  Thanks,
  Eran.
 
 



 --
 Do you know how to read?
 http://www.alienetworks.com/srtest.cfm
 http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
 Do you know how to write?
 http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail



Re: [R] as.numeric() and POSIXct format

2011-08-24 Thread Justin Haynes
> as.POSIXct(518400,origin='2001-01-01')
[1] 2001-01-07 PST


> as.POSIXct(as.numeric(as.POSIXct(518400,origin='2001-01-01')),origin='1970-01-01')
[1] 2001-01-07 08:00:00 PST
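
To spell out the point: as.numeric() on a POSIXct gives seconds since 1970-01-01 UTC, whatever origin was used to build the value. To get back seconds relative to your own origin, subtract the origin. A sketch, with the time zone pinned to UTC so the numbers are reproducible (an assumption; local time zones will shift them):

```r
origin <- as.POSIXct("2001-01-01", tz = "UTC")
x      <- as.POSIXct(518400, origin = origin, tz = "UTC")

as.numeric(x)                                   # seconds since 1970-01-01 UTC

# Inverse of the construction: seconds since the chosen origin
secs  <- as.numeric(x) - as.numeric(origin)
secs2 <- as.numeric(difftime(x, origin, units = "secs"))
```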


On Wed, Aug 24, 2011 at 9:22 AM, Agustin Lobo agustin.l...@ija.csic.eswrote:

 Hi!

 I'm confused by this:
  > as.numeric(as.POSIXct(518400, origin="2001-01-01"))
 [1] 978822000

 I guess the problem is that as.numeric() assumes a different origin, but
 cannot find
 any default origin.

 How can I get back the seconds from the POSIXct format? In other words,
 which the inverse function of as.POSIXct()?
 I've tried as.numeric and unclass() using a origin= argument, but this does
 not work.

 Thanks

 Agus

 --
 Dr. Agustin Lobo
 Institut de Ciencies de la Terra Jaume Almera (CSIC)
 LLuis Sole Sabaris s/n
 08028 Barcelona
 Spain
 Tel. 34 934095410
 Fax. 34 934110012
 email: agustin.l...@ija.csic.es



Re: [R] subsetting a list of matrices

2011-08-23 Thread Justin Haynes
His is better, but you can also use a for loop...

out<-data.frame(rows=1:3)
for(i in 1:3){
  if(l[[i]][3]=='Message 1') {
    out$V1[i]<-l[[i]][1]
  } else {
    out$V1[i]<-NA
  }
}

but you shouldn't if your list is very long.
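
A loop-free variant of the same selection, as a sketch only. The helper mk() and the example values are made up so the result is reproducible; the matrices are character, as they would be once a message string is written into row 3.

```r
# Hypothetical helper building a 3x3 character matrix with the question's dimnames
mk <- function(v1_row1, row3) {
  m <- rbind(c(v1_row1, "0.2", "0.3"), c("0.4", "0.5", "0.6"), row3)
  dimnames(m) <- list(paste0("row", 1:3), paste0("V", 1:3))
  m
}

l <- list(
  a = mk("1.5", c("Message 1", "Message 2", "Message 3")),
  b = mk("2.5", c("Message 3", "Message 1", "Message 2")),
  c = mk("3.5", c("Message 1", "Message 3", "Message 2"))
)

# Which list elements have "Message 1" at (row3, V1)?
keep <- vapply(l, function(m) m["row3", "V1"] == "Message 1", logical(1))

# Pull (row1, V1) from just those elements, converting back to numeric
vals <- vapply(l[keep], function(m) as.numeric(m["row1", "V1"]), numeric(1))
```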

On Tue, Aug 23, 2011 at 9:35 AM, Henrique Dallazuanna www...@gmail.comwrote:

 Try this:

 subset(as.data.frame(do.call(rbind, lapply(l, "[", , 1))), row3 == "Message 1")

 On Tue, Aug 23, 2011 at 1:28 PM, Lara Poplarski larapoplar...@gmail.com
 wrote:
  Hi all,
 
  I have an object that looks (roughly) like the following:
 
  l <- list(a = matrix(rnorm(9), 3), b = matrix(rnorm(9), 3), c =
  matrix(rnorm(9), 3))
 
  l$a[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
  l$b[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
  l$c[3,] <- sample(c("Message 1", "Message 2", "Message 3"))
 
  rownames(l$a) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
  rownames(l$b) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
  rownames(l$c) <- rownames(c(1:3), do.NULL = FALSE, prefix = "row")
 
  colnames(l$a) <- c("V1", "V2", "V3")
  colnames(l$b) <- c("V1", "V2", "V3")
  colnames(l$c) <- c("V1", "V2", "V3")
 
  I want to extract values (row1, V1) for the three sublists a, b, c,
  but only for those cases in which row3 == Message 1. Could someone
  suggest how to proceed?
 
  Many thanks in advance,
  Lara
 
 



 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40 S 49° 16' 22 O



Re: [R] ddply - how to transform df column in place

2011-08-23 Thread Justin Haynes
Jean,

Ista is right, but:

In your function you are asking as.Date to convert the whole data.frame df
rather than just your daterep column.

out<-ddply(d2, .(daterep), function(df)
  as.Date(strptime(df$daterep,format='%Y%m%d')))
str(out)
'data.frame':   30 obs. of  2 variables:
 $ daterep: num  20100801 20100802 20100803 20100804 20100805 ...
 $ V1     : Date, format: 2010-08-01 2010-08-02 2010-08-03 2010-08-04 ...
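
If the goal is only to convert the column in place, with no per-group work, a split-apply-combine step may be more than is needed. A base R sketch using transform(), with the data from the question rebuilt here:

```r
d2 <- data.frame(first = 1, daterep = seq(20100801, 20100830, 1))

# Replace the numeric yyyymmdd column with a Date column of the same name
d2 <- transform(d2, daterep = as.Date(as.character(daterep), format = "%Y%m%d"))
```

The result keeps both columns and the original row order, whereas the ddply() call above returns the converted values in a new column V1.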


On Tue, Aug 23, 2011 at 3:16 PM, jjap jean.plamon...@fpinnovations.cawrote:

 Dear R-users,

 I am trying to get the plyr syntax right, without much success.

 Given:
 d<- data.frame(cbind(x=1,y=seq(20100801,20100830,1)))
 names(d)<-c("first", "daterep")
 d2<-d

 # I can convert the daterep column in place the classic way:
 d$daterep<-as.Date(strptime(d$daterep, format="%Y%m%d"))

 # How to do it the plyr way?
 ddply(d2, c("daterep"), function(df){as.Date(df, format="%Y%m%d")})
 # returns: Error in as.Date.default(df, format = "%Y%m%d") :
 #   do not know how to convert 'df' to class "Date"

 Thanks for any hints,

 ---jean

 --
 View this message in context:
 http://r.789695.n4.nabble.com/ddply-how-to-transform-df-column-in-place-tp3764037p3764037.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Help: Sort components of a vector with indices tracked in R

2011-08-23 Thread Justin Haynes
If you make your vector a data.frame, you will have row numbers accompanying
your sorting

df<-data.frame(V1=c(1,4,3,2))
df$rows<-row.names(df)

df[order(df$V1),]

also, you shouldn't use c as a variable name since it's an important R
function...

see your example :)
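
Without the data.frame detour, order() on the bare vector gives the original positions directly, and sort() can return them as well. A short sketch (v is used instead of c as the variable name):

```r
v <- c(1, 4, 3, 2)

idx   <- order(v)   # original index of each sorted element
vsort <- v[idx]     # the sorted values

# sort() can hand back both pieces at once: s$x (values) and s$ix (indices)
s <- sort(v, index.return = TRUE)
```

Here idx[2] is 4, matching the hand check in the question that the second sorted value came from position 4.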

Justin


On Tue, Aug 23, 2011 at 4:59 PM, Chee Chen chee.c...@yahoo.com wrote:

 Dear All,
 I would like to know how to sort a vector of numeric values such that we
 know the original index of each ordered component. Say, we have
 c <- c(1,4,3,2)
 csort <- sort(c,decreasing=FALSE)
 With a few components of c, we can manually find out:
  csort[1] = 1 = c[1], ie, the original index of csort[1] is 1,
 csort[2] =2 =c[4], ie, the original index of csort[2] is 4.

 When length(c) is very large, manual checking is infeasible.
 We can set up a for loop to compare and extract the index. However, is
 there an easier way to do this, so that the output is the sorted vector and
 their corresponding original indices.
 Thanks
 Chee


[R] ggplot in a function confusion!

2011-08-15 Thread Justin Haynes
What's going on here?

df<-data.frame(x=1:10,y=1:10)

ggplot()+geom_point(data=df,aes(x=x,y=y))  ## this is the normal usage, right?

ggplot()+geom_point(data=df,aes(x=df[,1],y=df[,2]))  ## but I can also feed it column indices
ggplot()+geom_point(aes(x=df[,'x'],y=df[,'y']))  ## or column names.

## but if i wrap it in a function...

plot.func.one<-function(dff,x.var,y.var){
  print(ggplot() + geom_point(aes(x=dff[,x.var],y=dff[,y.var])))
}

plot.func.two<-function(dff,x.var,y.var){
  print(ggplot() + geom_point(data=dff,aes(x=dff[,x.var],y=dff[,y.var])))
}

plot.func.three<-function(dff,x.var,y.var){
  print(ggplot() + geom_point(data=dff,aes(x=eval(x.var),y=eval(y.var))))
}

plot.func.one(df,1,2)      ## i assume the dff not found error is happening in
                           ## the aes call rather than the data= portion..
plot.func.one(df,'x','y')  ## but why does it work in the global env and not
                           ## within a function?

plot.func.two(df,1,2)
plot.func.two(df,'x','y')

var.x<-'x'
var.y<-'y'
plot.func.three(df,var.x,var.y)  ## why does it give the error on y.var
                                 ## instead of x.var?
plot.func.three(df,'x','y')

dff<-df
x.var<-var.x
y.var<-var.y

plot.func.one(dff,x.var,y.var)  ## now what's going on?  I assume this works
                                ## because ggplot is looking globally rather
                                ## than within the function...
plot.func.two(dff,x.var,y.var)
plot.func.three(dff,x.var,y.var)

nothing seems to work right!  How do I plot within a function where I can
feed the function a data.frame and the columns I want plotted?

I assume this is some interesting name space issue but if you guys can
enlighten me as to what's going on...


Thanks,
Justin


P.S.  So before I sent this I dug some more and found my answer, aes_string:

plot.func<-function(dff,x.var,y.var){
  print(ggplot() + geom_point(data=dff,aes_string(x=x.var,y=y.var)))
}

plot.func(df,'x','y')

works great.  But I still wouldn't mind some clarification on what's
happening in my earlier examples.
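
A guess at the mechanism, illustrated with base R only. As far as I understand it, aes() stores its arguments unevaluated, and they are evaluated later against the data and then, in ggplot2 of this vintage, the global environment, not the function that called aes(). The toy functions capture() and render() below are stand-ins I made up to mimic that behaviour; they are not ggplot2 code.

```r
# Toy stand-ins (NOT ggplot2): capture() keeps the expression unevaluated,
# render() evaluates it later in the data, then in the global environment.
capture <- function(expr) substitute(expr)
render  <- function(expr, data) eval(expr, data, globalenv())

df <- data.frame(x = 1:3, y = 4:6)

by_index <- function(dff, x.var) {
  # 'dff' and 'x.var' exist only inside by_index(), so the later lookup fails
  render(capture(dff[, x.var]), dff)
}

by_name <- function(dff, x.var) {
  # Build the column symbol from the string first, as aes_string() does in spirit
  render(as.name(x.var), dff)
}

res_index <- tryCatch(by_index(df, "x"), error = function(e) conditionMessage(e))
res_name  <- by_name(df, "x")
```

Under this reading, the global-level calls work because df is visible from the global environment, and the last set of calls works only because dff, x.var and y.var were also created there.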



[R] Sequential Naming of ggplot .pngs using plyr

2011-08-10 Thread Justin Haynes
If I have data:

dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))

And want to plot like this:

ctr<-1
for(i in c('a','b','c','d')){
  png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
      width=11,units='in',pointsize=9,res=300)
  print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+
        geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
  dev.off()
  ctr<-ctr+1
}

Is there a way to do the same naming using plyr (or data.table or foreach
which I am not familiar with at all!)?

m.dat<-melt(dat,id.vars='site')
ddply(m.dat,.(variable),function(df)
print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)

And better yet, is there a way to do it using .parallel=T?

Faceting is not really an option (unless I can facet onto multiple pages of
a pdf or something) because these need to go into reports as individually
labelled and titled plots.


As a bit of a corollary, is it really worth the headache to resolve this if
I am only using melt/plyr to split on the four letter variables? With a
larger set of data (1e6 rows), the melt/plyr version takes a significant
amount of time but .parallel=T drops the time significantly.  Is the right
answer a foreach loop and can I do that with the increasing counter? (I
haven't gotten beyond Hadley's .parallel feature in my parallel R
dealings.)


> dat<-data.frame(a=rnorm(1e6),b=rnorm(1e6),c=rnorm(1e6),d=rnorm(1e6),site=rep(letters[5:8],each=2.5e5))
> ctr<-1
> system.time(for(i in c('a','b','c','d')){
+ png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
+     width=11,units='in',pointsize=9,res=300)
+ print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+
+       geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
+ dev.off()
+ ctr<-ctr+1
+ })
   user  system elapsed
 54.630   0.120  54.843

> system.time(
+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=F)
+ )
   user  system elapsed
  58.40    0.13   58.63

> system.time(
+ ddply(melt(dat,id.vars='site'),.(variable),function(df) {
+ png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
+ print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
+ dev.off()
+ },.parallel=T)
+ )
   user  system elapsed
  70.33    3.46   27.61


How might I speed this up and include the sequential plot names?

Thanks a bunch!

Justin



Re: [R] Sequential Naming of ggplot .pngs using plyr

2011-08-10 Thread Justin Haynes
Thanks Ista,

In my real code that is exactly what I'm doing, but I want to prepend the
names with a sequential number for easier reference once the pngs are made.

My initial thought was to add the sequential number to the data before
sending it to plyr and drawing it out there, but that seems like an
excessive extra step when I have 1e6 - 1e7 rows.
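
One way to get the number without touching the data, sketched here under the assumption that the set of variable names is known up front: look up each piece's position in that vector with match() and build the file name from it. plot_file() is a made-up helper name.

```r
vars <- c("a", "b", "c", "d")

# Hypothetical helper: sequence number comes from the variable's position in vars
plot_file <- function(variable, vars, dir = tempdir()) {
  n <- match(as.character(variable), vars)
  file.path(dir, sprintf("plot_number_%02d_%s.png", n, as.character(variable)))
}

files <- vapply(vars, plot_file, character(1), vars = vars, dir = "/tmp")
```

Inside the function passed to d_ply(), the same call would be plot_file(unique(df$variable), vars), so the numbering no longer depends on a counter shared between pieces.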


Justin


On Wed, Aug 10, 2011 at 2:42 PM, Ista Zahn iz...@psych.rochester.eduwrote:

 Hi Justin,

 On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes jto...@gmail.com wrote:
  If I have data:
 
 
 dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))
 
  And want to plot like this:
 
  ctr<-1
  for(i in c('a','b','c','d')){
    png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,
        width=11,units='in',pointsize=9,res=300)
    print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+
          geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
    dev.off()
    ctr<-ctr+1
  }
 
  Is there a way to do the same naming using plyr (or data.table or foreach
  which I am not familiar with at all!)?

 This is not the same naming, but the same general idea can be
 achieved with plyr using

  d_ply(melt(dat,id.vars='site'),.(variable),function(df) {
    png(file=paste("plyr_plot", unique(df$variable), ".png"),
        height=8.5,width=11,units='in',pointsize=9,res=300)
    print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
    dev.off()
  })

 I'm not up to speed on .parallel, foreach etc., so I'll leave the rest
 to someone else.

 Best,
 Ista
 



 --
 Ista Zahn
 Graduate student
 University of Rochester
 Department of Clinical and Social Psychology
 http://yourpsyche.org




[R] binary conversion list to data.frame with plyr... AND NO LOOPS!

2011-07-08 Thread Justin Haynes
Happy weekend helpeRs!

As usual, I'm stumped by R...

My plan was to take an integer number, convert it to binary and wind
up with a data.frame where each column is either 1 or 0 so I can see
which bits are changing:

bb<-function(i) ifelse(i, paste(bb(i %/% 2), i %% 2, sep=""), "")
my.dat<-c(36,40,10,4)
my.binary.dat<-bb(my.dat)
my.list<-strsplit(my.binary.dat,'')

max.len<-max(ldply(my.list,length))
len<-length(my.list)
my.df<-data.frame(two=rep(0,len),four=rep(0,len),eight=rep(0,len),sixteen=rep(0,len),thirtytwo=rep(0,len),sixtyfour=rep(0,len))
for(i in 1:length(my.list)){
  for(j in 1:length(my.list[[i]])){
    my.df[i,max.len-length(my.list[[i]])+j]<-my.list[[i]][j]
  }
}

But this isn't exactly feasible on a million+ rows where some binary
numbers are 20 digits...  I know there's a way without loops, I just
know it!

Ideally, I can do this to multiple columns of a data.frame and have
them named accordingly (V1.two,V1.four... V2.two,V2.four, etc.)
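
One loop-free possibility, as a sketch: skip the string step and compute each bit arithmetically with integer division and modulo, one column per power of two. to_bits() and the bitN column names are my own invention, named by the value of each bit.

```r
# Hypothetical helper: one 0/1 column per bit, most significant bit first
to_bits <- function(x, nbits = floor(log2(max(x))) + 1) {
  powers <- 2^((nbits - 1):0)
  # (x %/% 2^k) %% 2 is bit k of x; outer() does this for all x and all k at once
  bits <- outer(x, powers, function(v, p) (v %/% p) %% 2)
  colnames(bits) <- paste0("bit", powers)
  as.data.frame(bits)
}

my.dat <- c(36, 40, 10, 4)
my.df  <- to_bits(my.dat)
```

For several columns of a data.frame, the same helper could be applied per column and the results combined with cbind(), prefixing the column names with the source column.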


Thanks,

Justin



[R] rle with NA values?

2011-06-24 Thread Justin Haynes
Happy Friday!

Using this function:

fixSeq <- function(df) {
  shift1 <- function(x) c(1, x[-length(x)])
  df$state_shift<-df$state
  df.rle<-rle(df$state_shift)
  repeat {
    shifted.sf<-shift1(df.rle$values)
    change <- df.rle$values >= 4 & shifted.sf >= 4 & shifted.sf != df.rle$values
    if(any(change))
      df.rle$values[change] <- shifted.sf[change] else break
  }
  gc()
  df$state_shift<-inverse.rle(df.rle)
  return(df)
}

I would like an NA to break a run, so that the values on either side of
it are treated as two separate runs.
To illustrate with a short example:

> dat<-data.frame(id=1,state=c(1,2,4,4,5,NA,5,5,1))

> fixSeq(dat)
Error in df.rle$values[change] <- shifted.sf[change] :
  NAs are not allowed in subscripted assignments

> fixSeq(na.omit(dat))
  id state state_shift
1  1     1           1
2  1     2           2
3  1     4           4
4  1     4           4
5  1     5           4
7  1     5           4
8  1     5           4
9  1     1           1


rather than the true output of 1 2 4 4 4 5 5 1.  The NA should make the
second pair of 5s a separate run rather than a continuation of the
previous state 4.  Is this best accomplished by assigning NA to a
value like -99?  Or do I have other options?
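
One option that avoids a sentinel value, sketched under the assumption that the rule is the one in fixSeq() (values of 4 or more inherit the preceding run's value when that is also 4 or more): rle() already puts each NA in a run of its own, so it is enough to make sure a comparison involving NA never counts as a change. fix_runs() is a made-up name and works on the bare vector.

```r
fix_runs <- function(state) {
  r <- rle(state)                      # each NA becomes its own run
  shift1 <- function(x) c(1, x[-length(x)])
  repeat {
    prev   <- shift1(r$values)
    change <- r$values >= 4 & prev >= 4 & prev != r$values
    change[is.na(change)] <- FALSE     # an NA on either side blocks the carry-over
    if (!any(change)) break
    r$values[change] <- prev[change]
  }
  inverse.rle(r)
}

state   <- c(1, 2, 4, 4, 5, NA, 5, 5, 1)
shifted <- fix_runs(state)
```

With the NA positions dropped afterwards, shifted reads 1 2 4 4 4 5 5 1, which is the output asked for above.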



[R] rle on large data . . . without a for loop!

2011-06-17 Thread Justin Haynes
I think I need to do something like this:

dat<-data.frame(id=rep(1:5,each=200),state=sample(1:3, 1000,
replace=T,prob=c(0.7,0.05,0.25)),V1=runif(1,10,1000),V2=rnorm(1000))
rle.dat<-rle(dat$state)
temp<-1
out<-data.frame(id=1:length(rle.dat$length))
for(i in 1:length(rle.dat$length)){
  temp2<-temp+rle.dat$length[[i]]-1  # last row of run i
  out$V1[i]<-mean(dat$V1[temp:temp2])
  out$V2[i]<-sum(dat$V2[temp:temp2])
  out$state[i]<-rle.dat$value[[i]]
  temp<-temp2+1                      # first row of the next run
}

to a very large dataset.  I want to apply a few summary functions to
some variables within a data.frame for given states. To complicate
things, I'd like to use plyr and split on the id variable before I do
any of this...

loop.func<-function(dat){
  rle.dat<-rle(dat$state)
  temp<-1
  out<-data.frame(id=1:length(rle.dat$length))
  for(i in 1:length(rle.dat$length)){
    temp2<-temp+rle.dat$length[[i]]-1  # last row of run i
    out$V1[i]<-mean(dat$V1[temp:temp2])
    out$V2[i]<-sum(dat$V2[temp:temp2])
    out$state[i]<-rle.dat$value[[i]]
    temp<-temp2+1                      # first row of the next run
  }
  return(out)
}
out<-ddply(dat,.(id),loop.func)

Mostly, I just don't understand how to use a list (especially in this
instance) in a plyr/apply statement...
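
A possible loop-free version, as a sketch: expand the run lengths into a run label for every row, then summarise by that label with tapply(). summarise_runs() is a made-up name, and the split by id below uses base R split()/lapply() in place of ddply(); the small data set is invented so the numbers can be checked by hand.

```r
# Hypothetical helper: one row per run of identical 'state' values
summarise_runs <- function(d) {
  r   <- rle(d$state)
  run <- rep(seq_along(r$lengths), r$lengths)   # run label for every row
  data.frame(
    run   = seq_along(r$lengths),
    state = r$values,
    n     = r$lengths,
    V1    = as.vector(tapply(d$V1, run, mean)),
    V2    = as.vector(tapply(d$V2, run, sum))
  )
}

d <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  state = c(1, 1, 2, 2, 2, 1),
  V1    = c(2, 4, 1, 2, 3, 10),
  V2    = c(1, 1, 1, 1, 1, 5)
)

# Runs are found within each id separately, then stacked
by_id <- do.call(rbind, lapply(split(d, d$id), summarise_runs))
```

With plyr loaded, ddply(d, .(id), summarise_runs) should give the same rows with an id column added.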


Thanks,

Justin



[R] gridExtra with cairodevie and ggplots

2011-06-14 Thread Justin Haynes
I apologise in advance for not providing code, but this seems like a
straight forward question...

I am making a few full page plots some of which are portrait and
some of which are landscape

I would like to open my cairo device once and put all the plots in the
same .pdf.  But since some
need to be rotated to fit the cairo device dimensions, is there a
simple parameter to arrangeGrob
(im using grid.arrange to generate the final plot) that will rotate
the entire output 90 degrees so all
my pages can be the same direction?


Thanks,
Justin



Re: [R] gridExtra with cairodevie and ggplots

2011-06-14 Thread Justin Haynes
That's perfect, thank you!

On Tue, Jun 14, 2011 at 2:10 PM, baptiste auguie
baptiste.aug...@googlemail.com wrote:
 Hi,

 You can draw arrangeGrob in a rotated viewport,

 library(gridExtra)
 library(ggplot2)
 ps = replicate(4, qplot(rnorm(10), rnorm(10)), simplify=F)
 g = gTree(children=gList(do.call(arrangeGrob, ps)), vp=viewport(angle=90))
 grid.draw(g)

 though you get some warnings about clipping for some reason.

 Perhaps more cleanly, you can define a print.arrange method,
 (shamelessly borrowed from ggplot2),

 print.arrange = function (x, newpage = is.null(vp), vp = NULL, ...)
 {
       if (newpage)
        grid.newpage()
    if (is.null(vp)) {
        grid.draw(x)
    }
    else {
        if (is.character(vp))
            seekViewport(vp)
        else pushViewport(vp)
        grid.draw(x)
        upViewport()
    }
 }

 print(do.call(arrangeGrob, ps), vp=viewport(angle=90))

 HTH,

 baptiste

 On 15 June 2011 08:39, Justin Haynes jto...@gmail.com wrote:
 I apologise in advance for not providing code, but this seems like a
 straight forward question...

 I am making a few full page plots some of which are portrait and
 some of which are landscape

 I would like to open my cairo device once and put all the plots in the
 same .pdf.  But since some
 need to be rotated to fit the cairo device dimensions, is there a
 simple parameter to arrangeGrob
 (im using grid.arrange to generate the final plot) that will rotate
 the entire output 90 degrees so all
 my pages can be the same direction?


 Thanks,
 Justin



[R] ragged data.frame? using plyr

2011-06-02 Thread Justin Haynes
I have a dataset that looks like:


set.seed(144)
sam-sample(1000,100)
dat-data.frame(id=letters[1:10],value=rnorm(1000),day=c(rep(1,100),rep(2,100),rep(3,100),rep(4,100),rep(5,100)))

I want to normalise it using the following function (unless you have
a better idea...):

adj.values-function(dframe){
  value_mean-mean(dframe$value)
  value_sd-sd(dframe$value)
  norm_value-(dframe$value-value_mean)/value_sd
  score_scale-100
  score_offset-1000
  scaled_value-norm_value*score_scale+score_offset
  names(scaled_value)-dframe$id
  return(scaled_value)
}

score_out-ddply(dat,.(day),adj.values)

Gives me my data.frame all nice and pretty and ready to do the following:

score_out.melt-melt(score_out,id='day')
names(score_out.melt)-c('day','id','score')

tblscore_mean-tapply(score_out.melt$score,INDEX=score_out.melt$id,mean)
tblscore_iqr-tapply(score_out.melt$score,INDEX=score_out.melt$id,IQR)

score_mean_iqr-data.frame(id=names(tblscore_iqr),mean=tblscore_mean,iqr=tblscore_iqr)

However, as it turns out, my data look more like:

dat <- dat[-sam,]

ldply(dlply(dat,.(id,day),adj.values),length)

So on different days I only have data for some of the id variables,
which leads to a ragged data.frame.

ddply(dat,.(id,day),adj.values)

Can I do something like

ldply(dlply(dat,.(id,day),adj.values), function(x){put in a NA for the
places where data is missing?})


To give you a sense of where this is going, I'm eventually going to
plot the mean of each id variable over the time period vs. its IQR
(again unless you have a better idea...).
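One way to sidestep the ragged shape, sketched here in base R rather than plyr: keep the data in long form, scale within each day with ave(), and summarise per id with tapply(). The helper name scale_scores is made up for illustration, and the day column is written with rep() but holds the same values as in the post. Missing id/day combinations then never need NA padding, because they simply contribute no rows.

```r
# Sketch: normalise within each day, then summarise per id (base R only).
set.seed(144)
sam <- sample(1000, 100)
dat <- data.frame(id = letters[1:10], value = rnorm(1000),
                  day = rep(rep(1:5, each = 100), 2))
dat <- dat[-sam, ]  # drop 100 random rows so ids are unevenly represented

# Same scaling as adj.values: z-score within day, times 100, plus 1000.
scale_scores <- function(v) (v - mean(v)) / sd(v) * 100 + 1000
dat$score <- ave(dat$value, dat$day, FUN = scale_scores)

# One row per id; tapply orders its result by the sorted id levels.
score_mean_iqr <- data.frame(
  id   = sort(unique(dat$id)),
  mean = as.vector(tapply(dat$score, dat$id, mean)),
  iqr  = as.vector(tapply(dat$score, dat$id, IQR))
)
score_mean_iqr
```

The resulting data.frame has the id, mean and iqr columns needed for the mean-versus-IQR plot.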


As always,

thanks for your help!

Justin



[R] count value changes in a column

2011-05-31 Thread Justin Haynes
Is there a way to look for value changes in a column?

set.seed(144)
df <- data.frame(state=sample(rep(1:5,200),1000))

Any of the five states is acceptable.  However, if, for example,
states 4 or 5 follow state 3, I want to overwrite them with 3.
Changes from 1 to any value and from 2 to any value are acceptable, as
are changes from any value to 1 or 2.

By way of an example:

the sequence 1 3 3 5 5 3 2 4 2 1 5 3 3 5

should read   1 3 3 3 3 3 2 4 2 1 5 5 5 5
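The rule in this example can be sketched without a loop, reading it as: a run starts at the first value above 2 and lasts until the next 1 or 2, and every value in the run is overwritten with the run's first value. The function name carry_first_high is hypothetical, and this is only one reading of the requirement.

```r
# Sketch: overwrite each run of states > 2 with the first state of that run.
carry_first_high <- function(state) {
  high   <- state > 2
  # a run starts where a high value follows a low value (or opens the vector)
  start  <- high & !c(FALSE, head(high, -1))
  run    <- cumsum(start)   # index of the most recent run start
  firsts <- state[start]    # first value of each run
  out <- state
  out[high] <- firsts[run[high]]
  out
}

x <- c(1, 3, 3, 5, 5, 3, 2, 4, 2, 1, 5, 3, 3, 5)
carry_first_high(x)
# 1 3 3 3 3 3 2 4 2 1 5 5 5 5
```

Applied to the data frame above, this would be df$state <- carry_first_high(df$state).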


Thanks for the help!

Justin



Re: [R] count value changes in a column

2011-05-31 Thread Justin Haynes
I apologize for the confusion but that solution will work with a twist.

I want to record only the first value of a state change that goes above 2.
so if the sequence is

344455544334 it should read all 3s

but 3442555414433 should read 33321


Hope that helps clarify, if not I can get there from your function Bill,

Thanks!

Justin



[R] ggplot geom_boxplot vertical margins

2011-05-18 Thread Justin Haynes
If you plot:

df <- data.frame(x=factor(1:100),y=rnorm(1000))
ggplot(df,aes(x=x,y=y))+geom_boxplot()

How do I remove those pesky margins on the sides of the plot area?  Or
maybe just reduce their size to something more like the spacing of the
boxes?


Thanks,

Justin



Re: [R] ggplot geom_boxplot vertical margins

2011-05-18 Thread Justin Haynes
Exactly!

Thanks, I couldn't find that anywhere!

On Wed, May 18, 2011 at 1:59 PM, Felipe Carrillo
mazatlanmex...@yahoo.com wrote:
 Is this what you want? You can control how much space you
 want to see on the sides of the plot:

 df <- data.frame(x=factor(1:100),y=rnorm(1000))
 ggplot(df,aes(x=x,y=y))+geom_boxplot() + scale_x_discrete(expand=c(0,0))



 Felipe D. Carrillo
 Supervisory Fishery Biologist
 Department of the Interior
 US Fish & Wildlife Service
 California, USA
 http://www.fws.gov/redbluff/rbdd_jsmp.aspx






[R] How do I break my addiction to for loops!?!?

2011-05-13 Thread Justin Haynes
I know I'm not supposed to use them... but they're just so easy! I
have trouble defining an appropriate function for plyr or apply!

data <- rnorm(144)
groups1 <- c('a','b','c','d')
groups2 <- c('aa','bb','cc','dd')
machines <- 1:12
df <- data.frame(machine=machines,group1=groups1,group2=groups2,U=data,V=2*data,W=data^2,X=1/data,Y=data+2,Z=2/data)

So... I am currently generating a table and a geom_boxplot and squish
em together with gridExtra.  But, for columns U,V and W I want to use
group1 as my split variable and columns X, Y and Z I will use group2.
I also need to make it as flexible as possible.

What I've got now is...

box.vars <- match(c('U','V','W'),colnames(df))
index.group <- match('group1',colnames(df))

group.types <- unique(df[,index.group])

for(j in 1:length(group.types)){
  for(i in 1:length(box.vars)){
    index.rows <- which(df[,index.group]==group.types[j] & df[,box.vars[i]]!=0)

    p <- ggplot(data=df,aes(x=factor(df$machine[index.rows]),y=df[index.rows,box.vars[i]]))
    p <- p+geom_boxplot()+labs(x='Machine ID',y=names(df[box.vars[i]]))
    p <- p+opts(axis.text.x=theme_text(angle=50,size=7))

    mins <- round(tapply(df[index.rows,box.vars[i]],df$machine[index.rows],min),digits=3)
    maxes <- round(tapply(df[index.rows,box.vars[i]],df$machine[index.rows],max),digits=3)
    medians <- round(tapply(df[index.rows,box.vars[i]],df$machine[index.rows],median),digits=3)
    table.out <- data.frame(min=mins,median=medians,max=maxes)
    # + misc. gridExtra lines
  }
}

Currently I hard-code box.vars and index.group, which is OK with
me, but the for loops should be in a fancy function.  Anyway, I'm sure
there's an elegant plyr or apply call that can do this for me... but as
I said before, I need a FA Group (for loops anonymous)...

Also, this winds up being a lot of calcs on a big data set.  So, if
you have magical ff, big.memory and/or doMC suggestions I'm all ears,
I just have very little understanding of how they're working.
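For the table half of the problem, the nested loops can be sketched as split() plus lapply() in base R. This is only an illustration: summarise_var, box_vars and tables are made-up names, the zero-value filter and the plotting and gridExtra steps are left out, and only the U, V and W columns with group1 are shown.

```r
# Sketch: per-group, per-variable summary tables without explicit for loops.
set.seed(144)
data <- rnorm(144)
df <- data.frame(machine = 1:12, group1 = c('a', 'b', 'c', 'd'),
                 U = data, V = 2 * data, W = data^2)

# min/median/max of one column for every machine in a subset
summarise_var <- function(d, var) {
  stats <- lapply(split(d[[var]], d$machine), function(v) {
    round(c(min = min(v), median = median(v), max = max(v)), digits = 3)
  })
  do.call(rbind, stats)  # one row per machine
}

box_vars <- c('U', 'V', 'W')

# outer level: one element per group; inner level: one table per variable
tables <- lapply(split(df, df$group1), function(d) {
  sapply(box_vars, function(v) summarise_var(d, v), simplify = FALSE)
})

tables$a$U  # summary table for variable U within group 'a'
```

The same structure should carry over to plyr (dlply over the group column), with the plot built inside the inner function next to the table.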

Thanks for your help,
Justin



[R] xtable without a loop alongside a ggplot

2011-05-04 Thread Justin Haynes
I would like to create a table of my points and identify which
'quadrant' of a plot they are in, with the 'origin' at the means.  The
kicker is I would like to display it right next to or below a ggplot
of the data.  Maybe xtable isn't the right thing to use, but it's the
only thing I can think of.  Any help is appreciated!

set.seed(144)
x=rnorm(100,mean=5,sd=1)
test <- data.frame(x=x,y=x^2)
test$right <- sapply(test$x,function(x) {mean.x <- mean(test$x);any(x>mean.x)})
test$up <- sapply(test$y,function(y) {mean.y <- mean(test$y);any(y>mean.y)})

for(i in 1:length(test$x)){
  if(test$right[i]==TRUE && test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper right quadrant'))
  if(test$right[i]==FALSE && test$up[i]==TRUE)
    print(paste(rownames(test[i,]),'is in the upper left quadrant'))
  if(test$right[i]==TRUE && test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower right quadrant'))
  if(test$right[i]==FALSE && test$up[i]==FALSE)
    print(paste(rownames(test[i,]),'is in the lower left quadrant'))
}

I know there's a better way than using a for loop, and I haven't the
foggiest how to use xtable.  As I said, the ultimate goal is to create
a plot with a table alongside it showing outliers and where they
appear, using the inout function from the splancs package and a
confidence ellipse from the ellipse package.
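The quadrant labelling itself can be sketched without a loop, since comparisons and ifelse() are vectorised. The column name quadrant is an assumption made for this illustration, and displaying the table next to the plot is not covered here.

```r
# Sketch: label each point's quadrant relative to the means, no loop needed.
set.seed(144)
x <- rnorm(100, mean = 5, sd = 1)
test <- data.frame(x = x, y = x^2)

vert  <- ifelse(test$y > mean(test$y), 'upper', 'lower')
horiz <- ifelse(test$x > mean(test$x), 'right', 'left')
test$quadrant <- paste(vert, horiz)

head(test)            # one label per point
table(test$quadrant)  # counts per quadrant
```

The resulting data.frame is already in a shape that a table-drawing function could take as input.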

Thank you for your help as usual!

Justin



[R] MASS fitdistr with plyr or data.table?

2011-04-27 Thread Justin Haynes
I am trying to extract the shape and scale parameters of a wind speed
distribution for different sites.  I can do this in a clunky way, but
I was hoping to find a way using data.table or plyr.  However, when I
try I am met with the following:

set.seed(144)
weib.dist <- rweibull(1,shape=3,scale=8)
weib.test <- data.table(cbind(1:10,weib.dist))
names(weib.test) <- c('site','wind_speed')

fitted <- weib.test[,fitdistr(wind_speed,'weibull'),by=site]

Error in class(ans[[length(byval) + jj]]) = class(testj[[jj]]) :
  invalid to set the class to matrix unless the dimension attribute is
of length 2 (was 0)
In addition: Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
...
10: In dweibull(x, shape, scale, log) : NaNs produced

(the warning messages are normal from what I can tell)

or using plyr:

set.seed(144)
weib.dist <- rweibull(1,shape=3,scale=8)
weib.test.too <- data.frame(cbind(1:10,weib.dist))
names(weib.test.too) <- c('site','wind_speed')

fitted <- ddply(weib.test.too,.(site),fitdistr,'weibull')

Error in .fun(piece, ...) : 'x' must be a non-empty numeric vector

those sound like similar errors to me, but I can't figure out how to
make them go away!
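A likely cause of the plyr error is that ddply hands each piece to the function as a whole data.frame, while fitdistr expects a numeric vector. A minimal sketch of a per-site fit, using MASS with base R split() and lapply() instead of plyr or data.table: the sample size n is an assumption (the value in the post is unclear), and fit_site is a made-up helper that pulls out the wind_speed vector and returns only the two estimates.

```r
# Sketch: fit a Weibull per site and collect shape/scale in one data.frame.
library(MASS)

set.seed(144)
n <- 10000  # assumed sample size, not taken from the post
wind <- data.frame(site = rep(1:10, length.out = n),
                   wind_speed = rweibull(n, shape = 3, scale = 8))

# Return only the named estimates so the results stack cleanly.
fit_site <- function(v) suppressWarnings(fitdistr(v, 'weibull'))$estimate

fits <- do.call(rbind, lapply(split(wind$wind_speed, wind$site), fit_site))
fits <- data.frame(site = rownames(fits), fits)
fits
```

The same wrapper idea should apply in plyr: pass an anonymous function that takes the piece and calls fitdistr on its wind_speed column, returning the estimates.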

to prove I'm not crazy:

fitdistr(weib.dist,'weibull')$estimate
   shape    scale
2.996815 8.009757
Warning messages:
1: In dweibull(x, shape, scale, log) : NaNs produced
2: In dweibull(x, shape, scale, log) : NaNs produced
3: In dweibull(x, shape, scale, log) : NaNs produced
4: In dweibull(x, shape, scale, log) : NaNs produced

Thanks

Justin



[R] MASS fitdistr call in plyr help!

2011-04-22 Thread Justin Haynes
I have a set of wind speeds read at different locations.  The data is
a data frame with two columns: site and wind speed.  I want to split
the data on site and call a function to find the shape and scale
parameters of a weibull distribution fit.

The end result is a plot with x-axis = shape and y-axis = scale.
Currently my code looks like:

  fit_wind_speed <- function(x){
    x <- replace(x,x<=0,0.0001)
    temp <- fitdistr(na.exclude(x[,1]),"weibull")
    l <- length(names(x))
    for(i in 1:l){
      temp[i] <- (fitdistr(na.exclude(x[,i]),"weibull"))
    }
    temp
  }

  wind_speed_wide_dataframe <- function(x){
    mini <- min(x$site)
    maxi <- max(x$site)
    ws.plot <- as.matrix(subset(x,site==mini,select=(wind_speed)))
    row.names(ws.plot) <- NULL
    for(i in (mini+1):maxi){
      temp <- as.matrix(subset(x,site==i,select=(wind_speed)))
      row.names(temp) <- NULL
      ws.plot <- add.col(ws.plot,temp)
    }
    as.data.frame(ws.plot)
  }

ws.plots <- wind_speed_wide_dataframe(dataset[,c(1,3)])
names(ws.plots) <- c(min(dataset$site):max(dataset$site))
fit <- fit_wind_speed(ws.plots)
names(fit) <- names(ws.plots)
l <- length(fit)
i <- 1:l
j <- 1:2
temp2 <- data.frame(1:l,2)
temp <- data.frame(names(fit),2)
for(i in 1:l){temp <- data.frame(fit[i])}
for(i in 1:l){temp[i] <- data.frame(fit[i])}
for(i in 1:l){temp2[i,j] <- temp[j,i]}
names(temp2) <- c("shape","scale")

I'd like to combine the two functions into one plyr call, but I can't
figure out how it would work!  If there is a better package than MASS
I'm all ears for that too.

Thanks,

Justin



[R] string interpolation

2011-03-21 Thread Justin Haynes
Is there a way to do this in R? I have data in the form:

57_input  57_output  58_input  58_output  etc.

Can I use a for loop (i in 57:n) that plots only the outputs?  I want
this to be robust, so I'm not specifying a column id but rather
something like C++ code,

%s_input, i

Is that doable in R?
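R has sprintf() (and paste0()) for building names like this, and columns can be looked up by name with [[ ]]. A small sketch with made-up data; the data.frame contents and the ids range are assumptions for illustration.

```r
# Sketch: build column names with sprintf() and loop over the outputs only.
dat <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)
names(dat) <- c('57_input', '57_output', '58_input', '58_output')

ids <- 57:58
output_cols <- sprintf('%d_output', ids)   # "57_output" "58_output"

for (col in output_cols) {
  y <- dat[[col]]
  # plot(y, main = col) would draw each output series here
  cat(col, 'has', length(y), 'values\n')
}

# Alternative that needs no id range: match on the name pattern.
grep('_output$', names(dat), value = TRUE)
```

The grep() form is the more robust of the two when ids are missing or not consecutive.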

Thanks,
justin



[R] linear regression in a data.frame using recast

2011-03-16 Thread Justin Haynes
I have a very large dataset with columns of id number, actual value,
predicted value.  This used to be a time series but I have dropped the
time component.  So I now have a data.frame where the id number is
repeated but each value in the actual and predicted columns are
unique.

I assume I need to use recast somehow but I'm at a loss... how can I
perform a simple linear regression (using lm()?) on my two variables
for each unique id number?

Additionally, I need to fix the y-intercept at zero.
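A sketch of a per-id regression through the origin, using split() and sapply() in base R rather than recast. The simulated data, the column names id, actual and predicted, and the choice of actual as the response are assumptions for illustration. In an lm() formula, 0 + (or - 1) removes the intercept.

```r
# Sketch: one zero-intercept regression per id, collecting the slopes.
set.seed(144)
dat <- data.frame(id = rep(1:3, each = 20),
                  predicted = runif(60, min = 1, max = 10))
dat$actual <- 2 * dat$predicted + rnorm(60, sd = 0.5)  # true slope is 2

slopes <- sapply(split(dat, dat$id), function(d) {
  fit <- lm(actual ~ 0 + predicted, data = d)  # 0 + forces intercept to zero
  coef(fit)[['predicted']]
})
slopes  # named by id
```

For a very large dataset the same per-id function could be handed to plyr or data.table; the formula stays the same.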


Thanks for your help,

Justin
