Re: [R] seek(), skip by bits (not by bytes) in binary file

2012-06-19 Thread Ben quant
This post got me thinking and this works (fast!) to get the first 10
integers that I want:

# I'm still testing this...
# it works once I know the values of 'junk' and 'size_to_use', which I already have

to.read = file(file_path_name, "rb")
seek(to.read, where = junk)
data1 = readBin(to.read, integer(), n = 10, size = size_to_use)

Seems kinda silly that I didn't think of this before...I looked into using
seek() before...

Anyway, thanks for helping me think it through.

PS - I still don't know how to use "the 3rd bit of the 71st byte" ...or was
that an example of how to think about the problem?
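
(For what it's worth, here is a minimal sketch of how I read Jeff's "3rd bit of
the 71st byte" example -- the connection and the positions are just placeholders,
and the bit ordering is whatever rawToBits() uses, i.e. least-significant first:)

seek(to.read, where = 70)                      # seek counts from 0, so 70 = the 71st byte
one_byte = readBin(to.read, "raw", n = 1, size = 1)
bits = rawToBits(one_byte)                     # eight 00/01 values for that byte
bits[3]                                        # the "3rd bit", under that ordering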

Thanks!
Ben


On Tue, Jun 19, 2012 at 11:07 AM, Jeff Newmiller
wrote:

> If the structure really changes day by day, then you have to decipher how
> it is constructed in order to find the correct bit to go to.
>
> If you think you already know which bit to go to, then the way you know is
> "the 3rd bit of the 71st byte", which means that the existing seek function
> should be sufficient to get that byte and pick apart the bits to get the
> ones you want.
>
> I recommend using the hexView package for this kind of task.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> -------
> Sent from my phone. Please excuse my brevity.
>
>
>
> Ben quant  wrote:
>
> >Other people at my firm who know a lot about binary files couldn't
> >figure
> >out the parts of the file that I am skipping over. Part of the issue is
> >that there are several different files (dbs extension files) like this
> >that
> >I have to process and the structures do change depending on the source
> >of
> >these files.
> >
> >In short, the problem is over my head and I was hoping to go right to
> >the
> >correct bit and read, which would make things much easier. I guess
> >not...
> >Thanks for your help though.
> >
> >Anyone else?
> >
> >thanks,
> >
> >ben
> >
> >On Tue, Jun 19, 2012 at 10:10 AM, jim holtman 
> >wrote:
> >
> >> I am not sure why reading through 'bit-by-bit' gets you to where you
> >> want to be.  I assume that the file has some structure, even though
> >it
> >> may be changing daily.  You mentioned the various types of data that
> >> it might contain; are they all in 'byte'-sized chunks?  If you really
> >> have data that begins in the middle of a byte and then extends over
> >> several bytes, you will have to write some functions that will pull
> >> out this data and then reconstruct it into an object (e.g., integer,
> >> numeric, ...) that R understands.  Can you provide some more
> >> definition of what the data actually looks like and how you would
> >find
> >> the "pattern" of the data.  Almost all systems read at the lowest
> >> level byte-sized chunks, and if you really have to get down to the
> >bit
> >> level to reconstruct the data, then you have to write the unpack/pack
> >> functions.  This can all be done once you understand the structure of
> >> the data.  So some examples would be useful if you want someone to
> >> propose a solution.
> >>
> >> On Tue, Jun 19, 2012 at 11:54 AM, Ben quant 
> >wrote:
> >> > Hello,
> >> >
> >> > Has a function been built that will skip to a certain bit in a
> >binary
> >> file?
> >> >
> >> > As of 2009 the answer was 'no':
> >> > http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
> >> > https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html
> >> >
> >> > If you feel I don't need to (like in the links above), please
> >provide
> >> some
> >> > help. (Note this is my first time working with binary files.)
> >> >
> >> > I'm still working on the script, but here is where I am right now.
> >The
> >> for
> >> > loop is being used because:
> >> >
> >> > 1) I have to get down to correct position then get the info I
> >want/need.
> >> > The stuff I am reading through (x) is not fully understood and it
> >is a
&

Re: [R] seek(), skip by bits (not by bytes) in binary file

2012-06-19 Thread Ben quant
Other people at my firm who know a lot about binary files couldn't figure
out the parts of the file that I am skipping over. Part of the issue is
that there are several different files (dbs extension files) like this that
I have to process and the structures do change depending on the source of
these files.

In short, the problem is over my head and I was hoping to go right to the
correct bit and read, which would make things much easier. I guess not...
Thanks for your help though.

Anyone else?

thanks,

ben

On Tue, Jun 19, 2012 at 10:10 AM, jim holtman  wrote:

> I am not sure why reading through 'bit-by-bit' gets you to where you
> want to be.  I assume that the file has some structure, even though it
> may be changing daily.  You mentioned the various types of data that
> it might contain; are they all in 'byte'-sized chunks?  If you really
> have data that begins in the middle of a byte and then extends over
> several bytes, you will have to write some functions that will pull
> out this data and then reconstruct it into an object (e.g., integer,
> numeric, ...) that R understands.  Can you provide some more
> definition of what the data actually looks like and how you would find
> the "pattern" of the data.  Almost all systems read at the lowest
> level byte-sized chunks, and if you really have to get down to the bit
> level to reconstruct the data, then you have to write the unpack/pack
> functions.  This can all be done once you understand the structure of
> the data.  So some examples would be useful if you want someone to
> propose a solution.
>
> On Tue, Jun 19, 2012 at 11:54 AM, Ben quant  wrote:
> > Hello,
> >
> > Has a function been built that will skip to a certain bit in a binary
> file?
> >
> > As of 2009 the answer was 'no':
> > http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
> > https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html
> >
> > If you feel I don't need to (like in the links above), please provide
> some
> > help. (Note this is my first time working with binary files.)
> >
> > I'm still working on the script, but here is where I am right now. The
> for
> > loop is being used because:
> >
> > 1) I have to get down to the correct position, then get the info I want/need.
> > The stuff I am reading through (x) is not fully understood and it is a
> mix
> > of various chars, floats, integers, etc. of various sizes etc. so I don't
> > know how many bytes to read in unless I read them bit by bit. (The
> > information and structure of the information changes daily so I'm
> skipping
> > over it.)
> > 2) If I skip all in one readBin() my 'n' value is often up to 20 times
> too
> > big (I get an error) and/or R won't let me "allocate a vector of
> size"
> > etc. So I split it up into chunks (divide by 20 etc.) and read each chunk,
> > then trash each part that is readBin()'d. Then in the last line I get the
> data
> > that I want (data1).
> >
> > Here is my working code:
> >
> > # I have to read 'junk' bits from the to.read file, which is a huge integer
> so
> > I divide it up and loop through to.read in parts (jb_part).
> >  divr = 20
> >  mod = junk %% divr
> >
> >  jb_part = as.integer(junk/divr)
> >  jb_part_mod = jb_part + mod # catch the remainder/modulus
> >
> >  to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb") # connect to
> the
> > binary file
> > # loop in chunks to where I want to be
> >  for(i in 1:(divr-1)){
> >x = readBin(to.read,"raw",n=jb_part,size=1)
> >x = NULL # trash the result b/c I don't want it
> >  }
> > # read a little more to include the remainder/modulus bits left over by
> > dividing by 20 above
> >  x = readBin(to.read,'raw',n=jb_part_mod,size=1)
> >  x = NULL # trash it
> >
> > # finally get the data that I want
> > data1 = readBin(to.read,double(),n=some_number,size=size_to_use)
> >
> > This works, but it is SLOW!  Any ideas on how to get down to the correct
> > bit a bit quicker (pun intended). :)
> >
> > Thanks for any help!
> >
> > Ben
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>



[R] seek(), skip by bits (not by bytes) in binary file

2012-06-19 Thread Ben quant
Hello,

Has a function been built that will skip to a certain bit in a binary file?

As of 2009 the answer was 'no':
http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html

If you feel I don't need to (like in the links above), please provide some
help. (Note this is my first time working with binary files.)

I'm still working on the script, but here is where I am right now. The for
loop is being used because:

1) I have to get down to the correct position, then get the info I want/need.
The stuff I am reading through (x) is not fully understood and it is a mix
of various chars, floats, integers, etc. of various sizes etc. so I don't
know how many bytes to read in unless I read them bit by bit. (The
information and structure of the information changes daily so I'm skipping
over it.)
2) If I skip all in one readBin() my 'n' value is often up to 20 times too
big (I get an error) and/or R won't let me "allocate a vector of size"
etc. So I split it up into chunks (divide by 20 etc.) and read each chunk,
then trash each part that is readBin()'d. Then in the last line I get the data
that I want (data1).

Here is my working code:

# I have to read 'junk' bits from the to.read file, which is a huge integer, so
I divide it up and loop through to.read in parts (jb_part).
  divr = 20
  mod = junk %% divr

  jb_part = as.integer(junk/divr)
  jb_part_mod = jb_part + mod # catch the remainder/modulus

  to.read = file(paste(dbs_path,"/",dbs_file,sep=""),"rb") # connect to the
binary file
# loop in chunks to where I want to be
  for(i in 1:(divr-1)){
x = readBin(to.read,"raw",n=jb_part,size=1)
x = NULL # trash the result b/c I don't want it
  }
# read a little more to include the remainder/modulus bits left over by
dividing by 20 above
  x = readBin(to.read,'raw',n=jb_part_mod,size=1)
  x = NULL # trash it

# finally get the data that I want
data1 = readBin(to.read,double(),n=some_number,size=size_to_use)

This works, but it is SLOW!  Any ideas on how to get down to the correct
bit a bit quicker (pun intended). :)
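
(As the follow-ups above show, the chunked loop turned out to be unnecessary: a
single seek() does the skipping almost instantly. A minimal sketch, assuming
'junk' is the number of bytes to skip past:)

to.read = file(file_path_name, "rb")
seek(to.read, where = junk)                            # jump straight past the junk
data1 = readBin(to.read, double(), n = some_number, size = size_to_use)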

Thanks for any help!

Ben



Re: [R] strings concatenation and organization (fast)

2012-06-15 Thread Ben quant
I'm checking out Phil's solution...so far so good.  Thanks! Yes, 25 not 5
rows, sorry about that.

Rui - I can't modify rep_vec...that's just sample data. I have to start
with rep_vec and go from there.

have a good weekend all...

Ben

On Fri, Jun 15, 2012 at 2:51 PM, Rui Barradas  wrote:

> Hello,
>
> Try
>
>
>
> vec = c("1","2","3","-","-","-","4",**"5","6","1","2","3","-","-","-**")
> nms = c("A","B","C","D")
> rep_vec <- rep(sapply(split(vec, cumsum(rep(c(1, 0, 0), 5))), paste,
> collapse=""), 4)
> mat <- matrix(rep_vec, nrow=5, byrow=TRUE, dimnames=list(NULL,nms))
> mat
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 15-06-2012 21:11, Ben quant wrote:
>
>> Hello,
>>
>> What is the fastest way to do this? It has to be done quite a few times.
>> Basically I have sets of 3 numbers (as characters) and sets of 3 dashes
>> and
>> I have to store them in named columns. The order of the sets and the
>> column
>> name they fall under is important. The actual numbers and the
>> pattern/order
>> of the sets should be considered random/unpredictable.
>>
>> Sample data:
>> vec = c("1","2","3","-","-","-","4","5","6","1","2","3","-","-","-")
>> rep_vec = rep(vec,times=20)
>> nms = c("A","B","C","D")
>>
>> I need to get this:
>>   A B C D
>> "123" "---" "456" "123"
>> "---" "123" "---" "456"
>> "123" "---" "123" "---"
>> "456" "123" "---" "123"
>> "---" "456" "123" "---"
>>
>> Note: a matrix of 4 columns and 5 rows of concatenated string sets.
>>
>> Thanks!!
>>
>> Ben
>>
>>
>>
>



[R] strings concatenation and organization (fast)

2012-06-15 Thread Ben quant
Hello,

What is the fastest way to do this? It has to be done quite a few times.
Basically I have sets of 3 numbers (as characters) and sets of 3 dashes and
I have to store them in named columns. The order of the sets and the column
name they fall under is important. The actual numbers and the pattern/order
of the sets should be considered random/unpredictable.

Sample data:
vec = c("1","2","3","-","-","-","4","5","6","1","2","3","-","-","-")
rep_vec = rep(vec,times=20)
nms = c("A","B","C","D")

I need to get this:
  A B C D
"123" "---" "456" "123"
"---" "123" "---" "456"
"123" "---" "123" "---"
"456" "123" "---" "123"
"---" "456" "123" "---"

Note: a matrix of 4 columns and 5 rows of concatenated string sets.
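
(For reference, one way to build the matrix from the sample data above -- this is
just a minimal sketch, not necessarily Phil's solution:)

sets = apply(matrix(rep_vec, nrow = 3), 2, paste, collapse = "")  # "123", "---", "456", ...
mat  = matrix(sets, ncol = length(nms), byrow = TRUE, dimnames = list(NULL, nms))
dim(mat)  # 25 rows, 4 columns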

Thanks!!

Ben



Re: [R] remove leading slash

2012-06-08 Thread Ben quant
Yes, I've been messing with that. I've also been using the hexView package.
Reading as characters first is just helping me figure out the structure of
this binary file. In this situation it really helped. For example:
   È \001 \002
 20012

This probably isn't how I'll do it in my final draft.

I'm now looking for a date or series of dates in the binary file... I'm
guessing the dates will be represented as 3 integers one for month, day,
and year.  Any help on strategy here would be great...  I'm reading a
file with a dbs extension if that helps.
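
(Purely a guess at a strategy, since I don't know the layout: if the date really
is month/day/year stored as three integers, something like the sketch below would
pull it out. 'date_offset' and the 2-byte integer size are made up:)

seek(to.read, where = date_offset)
mdy = readBin(to.read, integer(), n = 3, size = 2)
as.Date(sprintf("%04d-%02d-%02d", mdy[3], mdy[1], mdy[2]))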

Thanks!

ben

On Fri, Jun 8, 2012 at 12:44 PM, William Dunlap  wrote:

>  When reading binary files, it is usually best to use readBin's
>
> what=, size=, signed=, and endian= arguments to get what you want.
>
> Reading as characters and then converting them as you are doing
>
> is a very hard way to do things (and this particular conversion doesn't
> make much sense).
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> From: Ben quant [mailto:ccqu...@gmail.com]
> Sent: Friday, June 08, 2012 11:40 AM
> To: William Dunlap
> Cc: r-help@r-project.org
> Subject: Re: [R] remove leading slash
>
>
> Okay, Bill smelt something wrong, so I must revise.
>
> This works for large numbers:
>
> prds = sapply(sapply(cnt_str,charToRaw),as.integer)
>
> PS - this also solves an issue I've been having elsewhere...
> PPS- Bill - I'm reading binary files...and learning.
>
> thanks!
> ben
>
> 
>
> On Fri, Jun 8, 2012 at 12:16 PM, William Dunlap  wrote:
> 
>
> Can you tell us why you are interested in this mapping?
> I.e., how did the "\001" and "\102" arise and why do you
> want to convert them to the integers 1 and 102?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Ben quant
> > Sent: Friday, June 08, 2012 11:00 AM
> > To: Duncan Murdoch
> > Cc: r-help@r-project.org
> > Subject: Re: [R] remove leading slash
> >
> > Thanks for all your help.  I did it this way:
> >
> > > x = sapply(cnt_str,deparse)
> > > x
> >\002\001\002
> > "\"\\002\"" "\"\\001\"" "\"\\102\""
> > > as.numeric(substr(x,3,5))
> > [1]   2   1 102
> >
> > ...which is a bit of a hack, but gets me where I want to go.
> >
> > Thanks,
> > Ben
> >
> > On Fri, Jun 8, 2012 at 11:56 AM, Duncan Murdoch <
> murdoch.dun...@gmail.com>wrote:
> >
> > > On 08/06/2012 1:50 PM, Peter Langfelder wrote:
> > >
> > >> On Fri, Jun 8, 2012 at 10:25 AM, David
>
> > Winsemius>
>
> > >>  wrote:
> > >> >
> > >> >  On Jun 8, 2012, at 1:11 PM, Ben quant wrote:
> > >> >
> > >> >>  Hello,
> > >> >>
> > >> >>  How do I change this:
> > >> >>>
> > >> >>>  cnt_str
> > >> >>
> > >> >>  [1] "\002" "\001" "\102"
> > >> >>
> > >> >>  ...to this:
> > >> >>>
> > >> >>>  cnt_str
> > >> >>
> > >> >>  [1] "2" "1" "102"
> > >> >>
> > >> >>  Having trouble because of this:
> > >> >>>
> > >> >>>  nchar(cnt_str[1])
> > >> >>
> > >> >>  [1] 1
> > >> >
> > >> >
> > >> >  "\001" is ASCII cntrl-A, a single character.
> > >> >
> > >> >  ?Quotes   # not the first, second or third place I looked but I
> knew I
> > >> had
> > >> >  seen it before.
> > >>
> > >> If you still want to obtain the actual codes, you will be able to get
> > >> the number using utf8ToInt from package base or AsciiToInt from
> > >> package sfsmisc. By default, the integer codes will be printed in base
> > >> 10, though.
> > >>
> > >
> > > You could use
> > >
>
> > > > as.octmode(as.integer(charToRaw("\102")))
>
> > > [1] "102"
> > >

Re: [R] remove leading slash

2012-06-08 Thread Ben quant
Okay, Bill smelt something wrong, so I must revise.

This works for large numbers:

prds = sapply(sapply(cnt_str,charToRaw),as.integer)

PS - this also solves an issue I've been having elsewhere...
PPS- Bill - I'm reading binary files...and learning.
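
(A small worked example of what this returns -- note the values are decimal byte
codes, so the octal escape "\102" comes back as 66, i.e. "B":)

cnt_str = c("\002", "\001", "\102")
prds = sapply(sapply(cnt_str, charToRaw), as.integer)
prds  # 2, 1, 66 (named by the original strings)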

thanks!
ben


On Fri, Jun 8, 2012 at 12:16 PM, William Dunlap  wrote:

> Can you tell us why you are interested in this mapping?
> I.e., how did the "\001" and "\102" arise and why do you
> want to convert them to the integers 1 and 102?
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Ben quant
> > Sent: Friday, June 08, 2012 11:00 AM
> > To: Duncan Murdoch
> > Cc: r-help@r-project.org
> > Subject: Re: [R] remove leading slash
> >
> > Thanks for all your help.  I did it this way:
> >
> > > x = sapply(cnt_str,deparse)
> > > x
> >\002\001\002
> > "\"\\002\"" "\"\\001\"" "\"\\102\""
> > > as.numeric(substr(x,3,5))
> > [1]   2   1 102
> >
> > ...which is a bit of a hack, but gets me where I want to go.
> >
> > Thanks,
> > Ben
> >
> > On Fri, Jun 8, 2012 at 11:56 AM, Duncan Murdoch <
> murdoch.dun...@gmail.com>wrote:
> >
> > > On 08/06/2012 1:50 PM, Peter Langfelder wrote:
> > >
> > >> On Fri, Jun 8, 2012 at 10:25 AM, David
> > Winsemius>
> > >>  wrote:
> > >> >
> > >> >  On Jun 8, 2012, at 1:11 PM, Ben quant wrote:
> > >> >
> > >> >>  Hello,
> > >> >>
> > >> >>  How do I change this:
> > >> >>>
> > >> >>>  cnt_str
> > >> >>
> > >> >>  [1] "\002" "\001" "\102"
> > >> >>
> > >> >>  ...to this:
> > >> >>>
> > >> >>>  cnt_str
> > >> >>
> > >> >>  [1] "2" "1" "102"
> > >> >>
> > >> >>  Having trouble because of this:
> > >> >>>
> > >> >>>  nchar(cnt_str[1])
> > >> >>
> > >> >>  [1] 1
> > >> >
> > >> >
> > >> >  "\001" is ASCII cntrl-A, a single character.
> > >> >
> > >> >  ?Quotes   # not the first, second or third place I looked but I
> knew I
> > >> had
> > >> >  seen it before.
> > >>
> > >> If you still want to obtain the actual codes, you will be able to get
> > >> the number using utf8ToInt from package base or AsciiToInt from
> > >> package sfsmisc. By default, the integer codes will be printed in base
> > >> 10, though.
> > >>
> > >
> > > You could use
> > >
> > > > as.octmode(as.integer(charToRaw("\102")))
> > > [1] "102"
> > >
> > > if you really want the octal versions.  Doesn't work so well on "\1022"
> > > though (because that's two characters long).
> > >
> > > Duncan Murdoch
> > >
> > >
> > >> A roundabout way, assuming you are on a *nix system, would be to
> > >> dump() cnt_str into a file, say tmp.txt, then run in a shell (or using
> > >> system() ) something like
> > >>
> > >> sed --in-place 's/\\//g' tmp.txt
> > >>
> > >> to remove the slashes, then use
> > >>
> > >> cnt_str_new = read.table("tmp.txt")
> > >>
> > >> in R to get the codes back in. I'll let you iron out the details.
> > >>
> > >> Peter
> > >>
> > >>
> > >
> > >
> >
>



Re: [R] remove leading slash

2012-06-08 Thread Ben quant
Thanks for all your help.  I did it this way:

> x = sapply(cnt_str,deparse)
> x
   \002\001\002
"\"\\002\"" "\"\\001\"" "\"\\102\""
> as.numeric(substr(x,3,5))
[1]   2   1 102

...which is a bit of a hack, but gets me where I want to go.
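
(Side note on why it is a hack: the printed escapes are octal, so the "102" kept
here is the octal digits of decimal 66, the character "B". Compare:)

x = sapply(c("\002", "\001", "\102"), deparse)
as.numeric(substr(x, 3, 5))  # 2   1 102  -- octal digits read as a decimal number
utf8ToInt("\102")            # 66         -- the actual character code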

Thanks,
Ben

On Fri, Jun 8, 2012 at 11:56 AM, Duncan Murdoch wrote:

> On 08/06/2012 1:50 PM, Peter Langfelder wrote:
>
>> On Fri, Jun 8, 2012 at 10:25 AM, David 
>> Winsemius>
>>  wrote:
>> >
>> >  On Jun 8, 2012, at 1:11 PM, Ben quant wrote:
>> >
>> >>  Hello,
>> >>
>> >>  How do I change this:
>> >>>
>> >>>  cnt_str
>> >>
>> >>  [1] "\002" "\001" "\102"
>> >>
>> >>  ...to this:
>> >>>
>> >>>  cnt_str
>> >>
>> >>  [1] "2" "1" "102"
>> >>
>> >>  Having trouble because of this:
>> >>>
>> >>>  nchar(cnt_str[1])
>> >>
>> >>  [1] 1
>> >
>> >
>> >  "\001" is ASCII cntrl-A, a single character.
>> >
>> >  ?Quotes   # not the first, second or third place I looked but I knew I
>> had
>> >  seen it before.
>>
>> If you still want to obtain the actual codes, you will be able to get
>> the number using utf8ToInt from package base or AsciiToInt from
>> package sfsmisc. By default, the integer codes will be printed in base
>> 10, though.
>>
>
> You could use
>
> > as.octmode(as.integer(charToRaw("\102")))
> [1] "102"
>
> if you really want the octal versions.  Doesn't work so well on "\1022"
> though (because that's two characters long).
>
> Duncan Murdoch
>
>
>> A roundabout way, assuming you are on a *nix system, would be to
>> dump() cnt_str into a file, say tmp.txt, then run in a shell (or using
>> system() ) something like
>>
>> sed --in-place 's/\\//g' tmp.txt
>>
>> to remove the slashes, then use
>>
>> cnt_str_new = read.table("tmp.txt")
>>
>> in R to get the codes back in. I'll let you iron out the details.
>>
>> Peter
>>
>>
>
>



[R] remove leading slash

2012-06-08 Thread Ben quant
Hello,

How do I change this:
> cnt_str
[1] "\002" "\001" "\102"

...to this:
> cnt_str
[1] "2" "1" "102"

Having trouble because of this:
> nchar(cnt_str[1])
[1] 1

Thanks!

Ben



Re: [R] pass objects into "..." (dot dot dot)

2012-05-15 Thread Ben quant
Yes! Perfect! Thank you very much, Michael! I need to get my head around
that do.call(). I'll read up on it.
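
(A minimal sketch of the equivalence, so I remember it -- 'ivs' stands in for
however many Intervals objects exist at run time:)

library(intervals)
ivs = list(Intervals(c(1, 10)), Intervals(c(5, 10)), Intervals(c(4, 6)))
do.call("interval_intersection", ivs)
# same as interval_intersection(ivs[[1]], ivs[[2]], ivs[[3]]) -> [5, 6]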

Thanks again!
Ben

On Tue, May 15, 2012 at 11:48 AM, R. Michael Weylandt <
michael.weyla...@gmail.com> wrote:

> Are you perhaps looking for the do.call() construction?
>
> z = Intervals(c(1,10))
> y = Intervals(c(5,10))
> x = Intervals(c(4,6))
>
> do.call("interval_intersection", list(x,y,z))
>
> Michael
>
> On Tue, May 15, 2012 at 12:46 PM, Ben quant  wrote:
> > Hello,
> >
> > Thanks in advance for any help!
> >
> > How do I pass an unknown number of objects into the "..." (dot dot dot)
> > parameter? Put another way, is there some standard way to pass multiple
> > objects into "..." to "fool" the function into thinking the objects are
> > passed in separately/explicitly with common separation (like "x,y,z" when
> > x, y and z are objects to be passed into "...")?
> >
> > Details:
> >
> > I'm working with this parameter list and function:
> >
> > interval_intersection(x, ..., check_valid = TRUE)
> >
> > To illustrate...
> >
> > This works and I get the expected interval:
> >
> > library('intervals')
> > # create individual Intervals objects
> > z = Intervals(c(1,10))
> > y = Intervals(c(5,10))
> > x = Intervals(c(4,6))
> >> interval_intersection(x,y,z)
> > Object of class Intervals
> > 1 interval over R:
> > [5, 6]
> >
> > ...but at run time I don't know how many Intervals objects I will have
> so I
> > can't list them explicitly like this "x,y,z". So I build a matrix of
> > Intervals (per the package manual) and the function doesn't work:
> >
> >> xyz = matrix(c(4,5,1,6,10,10),nrow=3)
> >> xyz
> >      [,1] [,2]
> > [1,]    4    6
> > [2,]    5   10
> > [3,]    1   10
> >> xyz_interval = Intervals(xyz)
> >> interval_intersection(xyz_interval)
> > Object of class Intervals
> > 1 interval over R:
> > [1, 10]
> >
> > ...[1,10] is unexpected/wrong because I want the intersection of the
> three
> > intervals. So I conclude that I need to pass in the individual Intervals
> > objects, but how do I do that if I don't know how many I have at run
> time?
> > I tried putting them in a list, but that didn't work. I also tried using
> > paste(,sep=',') and get().
> >
> > Is there some standard way to pass multiple objects into "..." to "fool"
> > the function into thinking they are passed in separately/explicitly with
> > common separation?
> >
> > Thanks!
> > ben
> >
>



Re: [R] pass objects into "..." (dot dot dot)

2012-05-15 Thread Ben quant
Thank you for that. Sorry, I don't know how to use that to solve the issue.
I need to pass a handful (an unknown length) of objects into "...". I see
how you can get the count of what is in "...", but I'm not seeing how
knowing the length in "..." will help me.

ben

On Tue, May 15, 2012 at 10:53 AM, Steve Lianoglou <
mailinglist.honey...@gmail.com> wrote:

> Hi,
>
> On Tue, May 15, 2012 at 12:46 PM, Ben quant  wrote:
> > Hello,
> >
> > Thanks in advance for any help!
> >
> > How do I pass an unknown number of objects into the "..." (dot dot dot)
> > parameter? Put another way, is there some standard way to pass multiple
> > objects into "..." to "fool" the function into thinking the objects are
> > passed in separately/explicitly with common separation (like "x,y,z" when
> > x, y and z are objects to be passed into "...")?
>
> Calling `list(...)` will return a list as long as there are elements
> caught in `...`
>
> Does this help?
>
> R> howMany <- function(...) {
>  args <- list(...)
>  cat("There are", length(args), "items passed in here\n")
> }
>
> R> howMany(1, 2, 3, 4)
> There are 4 items passed in here
>
> R> howMany(10, list(1:10))
> There are 2 items passed in here
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



[R] pass objects into "..." (dot dot dot)

2012-05-15 Thread Ben quant
Hello,

Thanks in advance for any help!

How do I pass an unknown number of objects into the "..." (dot dot dot)
parameter? Put another way, is there some standard way to pass multiple
objects into "..." to "fool" the function into thinking the objects are
passed in separately/explicitly with common separation (like "x,y,z" when
x, y and z are objects to be passed into "...")?

Details:

I'm working with this parameter list and function:

interval_intersection(x, ..., check_valid = TRUE)

To illustrate...

This works and I get the expected interval:

library('intervals')
# create individual Intervals objects
z = Intervals(c(1,10))
y = Intervals(c(5,10))
x = Intervals(c(4,6))
> interval_intersection(x,y,z)
Object of class Intervals
1 interval over R:
[5, 6]

...but at run time I don't know how many Intervals objects I will have so I
can't list them explicitly like this "x,y,z". So I build a matrix of
Intervals (per the package manual) and the function doesn't work:

> xyz = matrix(c(4,5,1,6,10,10),nrow=3)
> xyz
     [,1] [,2]
[1,]    4    6
[2,]    5   10
[3,]    1   10
> xyz_interval = Intervals(xyz)
> interval_intersection(xyz_interval)
Object of class Intervals
1 interval over R:
[1, 10]

...[1,10] is unexpected/wrong because I want the intersection of the three
intervals. So I conclude that I need to pass in the individual Intervals
objects, but how do I do that if I don't know how many I have at run time?
I tried putting them in a list, but that didn't work. I also tried using
paste(,sep=',') and get().

Is there some standard way to pass multiple objects into "..." to "fool"
the function into thinking they are passed in separately/explicitly with
common separation?

Thanks!
ben



Re: [R] range segment exclusion using range endpoints

2012-05-14 Thread Ben quant
Thank you Steve!

This does everything I need (at this point):

(this excludes ranges y2 from range y1)

library('intervals')
y1 = Intervals(c(-100,100))
y2 = Intervals(rbind(
  c(-100.5,30),
  c(0.77,10),
  c(25,35),
  c(70,80.3),
  c(90,95)
  ))
interval_difference(y1,y2)
Object of class Intervals_full
3 intervals over R:
(35, 70)
(80.3, 90)
(95, 100]

PS - I'm pretty sure William's solution worked as well, but I opted for the
package solution, which is a bit more robust.

Thanks everyone!
Ben

On Mon, May 14, 2012 at 1:06 PM, Steve Lianoglou <
mailinglist.honey...@gmail.com> wrote:

> Hi all,
>
> Nice code samples presented all around.
>
> Just wanted to point out that I think the stuff found in the
> `intervals` package might also be helpful:
>
> http://cran.at.r-project.org/web/packages/intervals/index.html
>
> HTH,
> -steve
>
> On Mon, May 14, 2012 at 2:54 PM, Ben quant  wrote:
> > Yes, it is. I'm looking into understanding this now...
> >
> > thanks!
> > Ben
> >
> > On Mon, May 14, 2012 at 12:38 PM, William Dunlap 
> wrote:
> >
> >> To the list of function I sent, add another that converts a list of
> >> intervals
> >> into a Ranges object:
> >>  as.Ranges.list <- function (x, ...) {
> >>  stopifnot(nargs() == 1, all(vapply(x, length, 0) == 2))
> >>  # use c() instead of unlist() because c() doesn't mangle POSIXct
> and
> >> Date objects
> >>  x <- unname(do.call(c, x))
> >>  odd <- seq(from = 1, to = length(x), by = 2)
> >>  as.Ranges(bottoms = x[odd], tops = x[odd + 1])
> >>  }
> >> Then stop using get() and assign() all over the place and instead make
> >> lists of
> >> related intervals and convert them to Ranges objects:
> >>  > x <- as.Ranges(list(x_rng))
> >>  > s <- as.Ranges(list(s1_rng, s2_rng, s3_rng, s4_rng, s5_rng))
> >>  > x
>      bottoms tops
>  1      -100  100
>  > s
>      bottoms tops
>  1   -250.50 30.0
>  2      0.77 10.0
>  3     25.00 35.0
>  4     70.00 80.3
>  5     90.00 95.0
> >> and then compute the difference between the sets x and s (i.e., describe
> >> the points in x but not s as a union of intervals):
> >>  > setdiffRanges(x, s)
> >>bottoms tops
> >>  135.0   70
> >>  280.3   90
> >>  395.0  100
> >> and for a graphical check do
> >>  > plot(x, s, setdiffRanges(x, s))
> >> Are those the numbers you want?
> >>
> >> I find it easier to use standard functions and data structures for this
> >> than
> >> to adapt the cumsum/order idiom to different situations.
> >>
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com
> >>
> >>
> >> > -Original Message-
> >> > From: r-help-boun...@r-project.org [mailto:
> r-help-boun...@r-project.org]
> >> On Behalf
> >> > Of Ben quant
> >> > Sent: Monday, May 14, 2012 11:07 AM
> >> > To: jim holtman
> >> > Cc: r-help@r-project.org
> >> > Subject: Re: [R] range segment exclusion using range endpoints
> >> >
> >> > Turns out this solution doesn't work if the s range is outside the
> range
> >> of
> >> > the x range. I didn't include that in my examples, but it is
> something I
> >> > have to deal with quite often.
> >> >
> >> > For example s1_rng below causes an issue:
> >> >
> >> > x_rng = c(-100,100)
> >> > s1_rng = c(-250.5,30)
> >> > s2_rng = c(0.77,10)
> >> > s3_rng = c(25,35)
> >> > s4_rng = c(70,80.3)
> >> > s5_rng = c(90,95)
> >> >
> >> > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
> >> > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
> >> > for (i in sNames){
> >> >   queue <- rbind(queue
> >> >  , c(get(i)[1], 1)  # enter queue
> >> >  , c(get(i)[2], -1)  # exit queue
> >> >  )
> >> > }
> >> > queue <- queue[order(queue[, 1]), ]  # sort
> >> > queue <- cbind(queue, cumsum(queue[, 2]))  # of people in the queue
> >> > for (i in which(queue[, 3] == 1)){
> >> >   cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
> >> > }
> >> >
&

Re: [R] range segment exclusion using range endpoints

2012-05-14 Thread Ben quant
Yes, it is. I'm looking into understanding this now...

thanks!
Ben

On Mon, May 14, 2012 at 12:38 PM, William Dunlap  wrote:

> To the list of function I sent, add another that converts a list of
> intervals
> into a Ranges object:
>  as.Ranges.list <- function (x, ...) {
>  stopifnot(nargs() == 1, all(vapply(x, length, 0) == 2))
>  # use c() instead of unlist() because c() doesn't mangle POSIXct and
> Date objects
>  x <- unname(do.call(c, x))
>  odd <- seq(from = 1, to = length(x), by = 2)
>  as.Ranges(bottoms = x[odd], tops = x[odd + 1])
>  }
> Then stop using get() and assign() all over the place and instead make
> lists of
> related intervals and convert them to Ranges objects:
>  > x <- as.Ranges(list(x_rng))
>  > s <- as.Ranges(list(s1_rng, s2_rng, s3_rng, s4_rng, s5_rng))
>  > x
>      bottoms tops
>  1      -100  100
>  > s
>      bottoms tops
>  1   -250.50 30.0
>  2      0.77 10.0
>  3     25.00 35.0
>  4     70.00 80.3
>  5     90.00 95.0
> and then compute the difference between the sets x and s (i.e., describe
> the points in x but not s as a union of intervals):
>  > setdiffRanges(x, s)
>      bottoms tops
>  1      35.0   70
>  2      80.3   90
>  3      95.0  100
> and for a graphical check do
>  > plot(x, s, setdiffRanges(x, s))
> Are those the numbers you want?
>
> I find it easier to use standard functions and data structures for this
> than
> to adapt the cumsum/order idiom to different situations.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Ben quant
> > Sent: Monday, May 14, 2012 11:07 AM
> > To: jim holtman
> > Cc: r-help@r-project.org
> > Subject: Re: [R] range segment exclusion using range endpoints
> >
> > Turns out this solution doesn't work if the s range is outside the range
> of
> > the x range. I didn't include that in my examples, but it is something I
> > have to deal with quite often.
> >
> > For example s1_rng below causes an issue:
> >
> > x_rng = c(-100,100)
> > s1_rng = c(-250.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
> > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
> > for (i in sNames){
> >   queue <- rbind(queue
> >  , c(get(i)[1], 1)  # enter queue
> >  , c(get(i)[2], -1)  # exit queue
> >  )
> > }
> > queue <- queue[order(queue[, 1]), ]  # sort
> > queue <- cbind(queue, cumsum(queue[, 2]))  # of people in the queue
> > for (i in which(queue[, 3] == 1)){
> >   cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
> > }
> >
> > Regards,
> >
> > ben
> > On Sat, May 12, 2012 at 12:50 PM, jim holtman 
> wrote:
> >
> > > Here is an example of how you might do it.  It uses a technique of
> > > counting how many items are in a queue based on their arrival times;
> > > it can be used to also find areas of overlap.
> > >
> > > Note that it would be best to use a list for the 's' end points
> > >
> > > 
> > > > # note the next statement removes names of the format 's[0-9]+_rng'
> > > > # it would be best to create a list with the 's' endpoints, but this
> is
> > > > # what the OP specified
> > > >
> > > > rm(list = grep('s[0-9]+_rng', ls(), value = TRUE))  # Danger Will
> > > Robinson!!
> > > >
> > > > # ex 1
> > > > x_rng = c(-100,100)
> > > >
> > > > s1_rng = c(-25.5,30)
> > > > s2_rng = c(0.77,10)
> > > > s3_rng = c(25,35)
> > > > s4_rng = c(70,80.3)
> > > > s5_rng = c(90,95)
> > > >
> > > > # ex 2
> > > > # x_rng = c(-50.5,100)
> > > >
> > > > # s1_rng = c(-75.3,30)
> > > >
> > > > # ex 3
> > > > # x_rng = c(-75.3,30)
> > > >
> > > > # s1_rng = c(-50.5,100)
> > > >
> > > > # ex 4
> > > > # x_rng = c(-100,100)
> > > >
> > > > # s1_rng = c(-105,105)
> > > >
> > > > # find all the names -- USE A LIST NEXT TIME
> > > > sNames <- grep("s[0-9

Re: [R] range segment exclusion using range endpoints

2012-05-14 Thread Ben quant
Turns out this solution doesn't work if the s range is outside the range of
the x range. I didn't include that in my examples, but it is something I
have to deal with quite often.

For example s1_rng below causes an issue:

x_rng = c(-100,100)
s1_rng = c(-250.5,30)
s2_rng = c(0.77,10)
s3_rng = c(25,35)
s4_rng = c(70,80.3)
s5_rng = c(90,95)

sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
for (i in sNames){
  queue <- rbind(queue
 , c(get(i)[1], 1)  # enter queue
 , c(get(i)[2], -1)  # exit queue
 )
}
queue <- queue[order(queue[, 1]), ]  # sort
queue <- cbind(queue, cumsum(queue[, 2]))  # of people in the queue
for (i in which(queue[, 3] == 1)){
  cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
}
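
(One possible fix, untested sketch: clip each s range to the x range before
building the queue, so endpoints outside x can't unbalance the counts. In the
end I used the intervals package's interval_difference() instead -- see the
other reply in this thread.)

sList = lapply(sNames, function(nm) pmax(pmin(get(nm), x_rng[2]), x_rng[1]))
# ...then build 'queue' from sList[[i]] instead of get(i)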

Regards,

ben
On Sat, May 12, 2012 at 12:50 PM, jim holtman  wrote:

> Here is an example of how you might do it.  It uses a technique of
> counting how many items are in a queue based on their arrival times;
> it can be used to also find areas of overlap.
>
> Note that it would be best to use a list for the 's' end points
>
> 
> > # note the next statement removes names of the format 's[0-9]+_rng'
> > # it would be best to create a list with the 's' endpoints, but this is
> > # what the OP specified
> >
> > rm(list = grep('s[0-9]+_rng', ls(), value = TRUE))  # Danger Will
> Robinson!!
> >
> > # ex 1
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-25.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > # ex 2
> > # x_rng = c(-50.5,100)
> >
> > # s1_rng = c(-75.3,30)
> >
> > # ex 3
> > # x_rng = c(-75.3,30)
> >
> > # s1_rng = c(-50.5,100)
> >
> > # ex 4
> > # x_rng = c(-100,100)
> >
> > # s1_rng = c(-105,105)
> >
> > # find all the names -- USE A LIST NEXT TIME
> > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
> >
> > # initial matrix with the 'x' endpoints
> > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
> >
> > # add the 's' end points to the list
> > # this will be used to determine how many things are in a queue (or
> areas that
> > # overlap)
> > for (i in sNames){
> + queue <- rbind(queue
> + , c(get(i)[1], 1)  # enter queue
> + , c(get(i)[2], -1)  # exit queue
> + )
> + }
> > queue <- queue[order(queue[, 1]), ]  # sort
> > queue <- cbind(queue, cumsum(queue[, 2]))  # of people in the queue
> > print(queue)
>          [,1] [,2] [,3]
>  [1,] -100.00    1    1
>  [2,]  -25.50    1    2
>  [3,]    0.77    1    3
>  [4,]   10.00   -1    2
>  [5,]   25.00    1    3
>  [6,]   30.00   -1    2
>  [7,]   35.00   -1    1
>  [8,]   70.00    1    2
>  [9,]   80.30   -1    1
> [10,]   90.00    1    2
> [11,]   95.00   -1    1
> [12,]  100.00    1    2
> >
> > # print out values where the last column is 1
> > for (i in which(queue[, 3] == 1)){
> + cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
> + }
> start: -100   end: -25.5
> start: 35   end: 70
> start: 80.3   end: 90
> start: 95   end: 100
> >
> >
> =
>
> On Sat, May 12, 2012 at 1:54 PM, Ben quant  wrote:
> > Hello,
> >
> > I'm posting this again (with some small edits). I didn't get any replies
> > last time...hoping for some this time. :)
> >
> > Currently I'm only coming up with brute force solutions to this issue
> > (loops). I'm wondering if anyone has a better way to do this. Thank you
> for
> > your help in advance!
> >
> > The problem: I have endpoints of one x range (x_rng) and an unknown
> number
> > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to
> > remove the x ranges that overlap with the s ranges. The examples below
> > demonstrate what I mean.
> >
> > What is the best way to do this?
> >
> > Ex 1.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-25.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > I would get:
> > -100,-25.5
> > 35,70
> > 80.3,90
> > 95,100
> >
> > Ex 2.
> > For:
> > x_rng = c(-50.5,100)
> >

Re: [R] range segment exclusion using range endpoints

2012-05-14 Thread Ben quant
Great solution! Thanks!

Ben

On Sat, May 12, 2012 at 12:50 PM, jim holtman  wrote:

> Here is an example of how you might do it.  It uses a technique of
> counting how many items are in a queue based on their arrival times;
> it can be used to also find areas of overlap.
>
> Note that it would be best to use a list for the 's' end points
>
> 
> > # note the next statement removes names of the format 's[0-9]+_rng'
> > # it would be best to create a list with the 's' endpoints, but this is
> > # what the OP specified
> >
> > rm(list = grep('s[0-9]+_rng', ls(), value = TRUE))  # Danger Will
> Robinson!!
> >
> > # ex 1
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-25.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > # ex 2
> > # x_rng = c(-50.5,100)
> >
> > # s1_rng = c(-75.3,30)
> >
> > # ex 3
> > # x_rng = c(-75.3,30)
> >
> > # s1_rng = c(-50.5,100)
> >
> > # ex 4
> > # x_rng = c(-100,100)
> >
> > # s1_rng = c(-105,105)
> >
> > # find all the names -- USE A LIST NEXT TIME
> > sNames <- grep("s[0-9]+_rng", ls(), value = TRUE)
> >
> > # initial matrix with the 'x' endpoints
> > queue <- rbind(c(x_rng[1], 1), c(x_rng[2], 1))
> >
> > # add the 's' end points to the list
> > # this will be used to determine how many things are in a queue (or
> areas that
> > # overlap)
> > for (i in sNames){
> + queue <- rbind(queue
> + , c(get(i)[1], 1)  # enter queue
> + , c(get(i)[2], -1)  # exit queue
> + )
> + }
> > queue <- queue[order(queue[, 1]), ]  # sort
> > queue <- cbind(queue, cumsum(queue[, 2]))  # of people in the queue
> > print(queue)
>          [,1] [,2] [,3]
>  [1,] -100.00    1    1
>  [2,]  -25.50    1    2
>  [3,]    0.77    1    3
>  [4,]   10.00   -1    2
>  [5,]   25.00    1    3
>  [6,]   30.00   -1    2
>  [7,]   35.00   -1    1
>  [8,]   70.00    1    2
>  [9,]   80.30   -1    1
> [10,]   90.00    1    2
> [11,]   95.00   -1    1
> [12,]  100.00    1    2
> >
> > # print out values where the last column is 1
> > for (i in which(queue[, 3] == 1)){
> + cat("start:", queue[i, 1L], '  end:', queue[i + 1L, 1L], "\n")
> + }
> start: -100   end: -25.5
> start: 35   end: 70
> start: 80.3   end: 90
> start: 95   end: 100
> >
> >
> =
>
> On Sat, May 12, 2012 at 1:54 PM, Ben quant  wrote:
> > Hello,
> >
> > I'm posting this again (with some small edits). I didn't get any replies
> > last time...hoping for some this time. :)
> >
> > Currently I'm only coming up with brute force solutions to this issue
> > (loops). I'm wondering if anyone has a better way to do this. Thank you
> for
> > your help in advance!
> >
> > The problem: I have endpoints of one x range (x_rng) and an unknown
> number
> > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to
> > remove the x ranges that overlap with the s ranges. The examples below
> > demonstrate what I mean.
> >
> > What is the best way to do this?
> >
> > Ex 1.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-25.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > I would get:
> > -100,-25.5
> > 35,70
> > 80.3,90
> > 95,100
> >
> > Ex 2.
> > For:
> > x_rng = c(-50.5,100)
> >
> > s1_rng = c(-75.3,30)
> >
> > I would get:
> > 30,100
> >
> > Ex 3.
> > For:
> > x_rng = c(-75.3,30)
> >
> > s1_rng = c(-50.5,100)
> >
> > I would get:
> > -75.3,-50.5
> >
> > Ex 4.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-105,105)
> >
> > I would get something like:
> > NA,NA
> > or...
> > NA
> >
> > Ex 5.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-100,100)
> >
> > I would get something like:
> > -100,-100
> > 100,100
> > or just...
> > -100
> >  100
> >
> > PS - You may have noticed that in all of the examples I am including the
> s
> > range endpoints in the desired results, which I can deal with later in my
> > program so its not a problem...  I think leaving in the s range endpoints
> > simplifies the problem.
> >
> > Thanks!
> > Ben
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>



Re: [R] range segment exclusion using range endpoints

2012-05-14 Thread Ben quant
rence: return Ranges object describing points that are in x
> but not y
>x <- unionIntervals(x)
>y <- unionIntervals(y)
>nx <- nrow(x)
>ny <- nrow(y)
>u <- c(x[, 1], y[, 1], x[, 2], y[, 2])
>o <- order(u)
>u <- u[o]
>vx <- cumsum(jx <- rep(c(1, 0, -1, 0), c(nx, ny, nx, ny))[o])
>vy <- cumsum(jy <- rep(c(0, -1, 0, 1), c(nx, ny, nx, ny))[o])
>as.Ranges(u[vx == 1 & vy == 0], u[(vx == 1 & jy == -1) | (jx == -1 & vy
> == 0)])
> }
>
> intersectRanges <- function(x, y)
> {
># return Ranges object describing points that are in both x and y
>x <- unionIntervals(x)
>y <- unionIntervals(y)
>nx <- nrow(x)
>ny <- nrow(y)
>u <- c(x[, 1], y[, 1], x[, 2], y[, 2])
>o <- order(u)
>u <- u[o]
>vx <- cumsum(jx <- rep(c(1, 0, -1, 0), c(nx, ny, nx, ny))[o])
>vy <- cumsum(jy <- rep(c(0, 1, 0, -1), c(nx, ny, nx, ny))[o])
>as.Ranges(u[vx == 1 & vy == 1], u[(vx == 1 & jy == -1) | (jx == -1 & vy
> == 1)])
> }
>
> inRanges <- function(x, Ranges)
> {
>if (length(x) == 1) {
>any(x > Ranges[,1] & x <= Ranges[,2])
>} else {
>Ranges <- unionIntervals(Ranges)
>(findInterval(-x, rev(-as.vector(t(Ranges %% 2) == 1
>}
> }
>
> plot.Ranges <- function(x, ...)
> {
># mainly for debugging - no plotting controls, all ... must be Ranges
> objects.
>RangesList <- list(x=x, ...)
>labels <- vapply(as.list(substitute(list(x, ...)))[-1],
> function(x)deparse(x)[1], "")
>oldmar <- par(mar = replace(par("mar"), 2, max(nchar(labels)/2, 10)))
>on.exit(par(oldmar))
>xlim <- do.call("range", c(unlist(RangesList, recursive=FALSE),
> list(finite=TRUE)))
>ylim <-  c(0, length(RangesList)+1)
>plot(type="n", xlim, ylim, xlab="", ylab="", axes=FALSE)
>grid(ny=0)
>axis(side=1)
>axis(side=2, at=seq_along(RangesList), lab=labels, las=1, tck=0)
>box()
>incr <- 0.45 / max(vapply(RangesList, nrow, 0))
>xr <- par("usr")[1:2] # for intervals that extend to -Inf or Inf.
>for(i in seq_along(RangesList)) {
>r <- RangesList[[i]]
>if (nrow(r)>0) {
>y <- i + seq(0, by=incr, len=nrow(r))
>r <- r[order(r[,1]),,drop=FALSE]
>segments(pmax(r[,1], xr[1]), y, pmin(r[,2], xr[2]), y)
> }
>}
> }
>
>
>
> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf
> > Of Ben quant
> > Sent: Saturday, May 12, 2012 10:54 AM
> > To: r-help@r-project.org
> > Subject: [R] range segment exclusion using range endpoints
> >
> > Hello,
> >
> > I'm posting this again (with some small edits). I didn't get any replies
> > last time...hoping for some this time. :)
> >
> > Currently I'm only coming up with brute force solutions to this issue
> > (loops). I'm wondering if anyone has a better way to do this. Thank you
> for
> > your help in advance!
> >
> > The problem: I have endpoints of one x range (x_rng) and an unknown
> number
> > of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to
> > remove the x ranges that overlap with the s ranges. The examples below
> > demonstrate what I mean.
> >
> > What is the best way to do this?
> >
> > Ex 1.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-25.5,30)
> > s2_rng = c(0.77,10)
> > s3_rng = c(25,35)
> > s4_rng = c(70,80.3)
> > s5_rng = c(90,95)
> >
> > I would get:
> > -100,-25.5
> > 35,70
> > 80.3,90
> > 95,100
> >
> > Ex 2.
> > For:
> > x_rng = c(-50.5,100)
> >
> > s1_rng = c(-75.3,30)
> >
> > I would get:
> > 30,100
> >
> > Ex 3.
> > For:
> > x_rng = c(-75.3,30)
> >
> > s1_rng = c(-50.5,100)
> >
> > I would get:
> > -75.3,-50.5
> >
> > Ex 4.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-105,105)
> >
> > I would get something like:
> > NA,NA
> > or...
> > NA
> >
> > Ex 5.
> > For:
> > x_rng = c(-100,100)
> >
> > s1_rng = c(-100,100)
> >
> > I would get something like:
> > -100,-100
> > 100,100
> > or just...
> > -100
> >  100
> >
> > PS - You may have noticed that in all of the examples I am including the
> s
> > range endpoints in the desired results, which I can deal with later in my
> > program so its not a problem...  I think leaving in the s range endpoints
> > simplifies the problem.
> >
> > Thanks!
> > Ben
> >
>



[R] range segment exclusion using range endpoints

2012-05-12 Thread Ben quant
Hello,

I'm posting this again (with some small edits). I didn't get any replies
last time...hoping for some this time. :)

Currently I'm only coming up with brute force solutions to this issue
(loops). I'm wondering if anyone has a better way to do this. Thank you for
your help in advance!

The problem: I have endpoints of one x range (x_rng) and an unknown number
of s ranges (s[#]_rng) also defined by the range endpoints. I'd like to
remove the x ranges that overlap with the s ranges. The examples below
demonstrate what I mean.

What is the best way to do this?

Ex 1.
For:
x_rng = c(-100,100)

s1_rng = c(-25.5,30)
s2_rng = c(0.77,10)
s3_rng = c(25,35)
s4_rng = c(70,80.3)
s5_rng = c(90,95)

I would get:
-100,-25.5
35,70
80.3,90
95,100

Ex 2.
For:
x_rng = c(-50.5,100)

s1_rng = c(-75.3,30)

I would get:
30,100

Ex 3.
For:
x_rng = c(-75.3,30)

s1_rng = c(-50.5,100)

I would get:
-75.3,-50.5

Ex 4.
For:
x_rng = c(-100,100)

s1_rng = c(-105,105)

I would get something like:
NA,NA
or...
NA

Ex 5.
For:
x_rng = c(-100,100)

s1_rng = c(-100,100)

I would get something like:
-100,-100
100,100
or just...
-100
 100

PS - You may have noticed that in all of the examples I am including the s
range endpoints in the desired results, which I can deal with later in my
program so its not a problem...  I think leaving in the s range endpoints
simplifies the problem.
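
(For anyone reading the archive: the approach that ended up working for me is in
the replies above, using the intervals package. A minimal sketch for Ex 1, up to
open/closed endpoints:)

library(intervals)
x = Intervals(c(-100, 100))
s = Intervals(rbind(c(-25.5, 30), c(0.77, 10), c(25, 35), c(70, 80.3), c(90, 95)))
interval_difference(x, s)  # roughly (-100, -25.5), (35, 70), (80.3, 90), (95, 100)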

Thanks!
Ben



Re: [R] domain/number line/range reduction problem

2012-05-11 Thread Ben quant
Clarification/correction:

Ex 5 isn't consistent with the other examples.
To be consistent with them, the resulting ranges would be
something like:
xa_rng = c(-100,-100)
xb_rng = c(100,100)
or just...
xa_rng = -100
xb_rng = 100


However, my original Ex 5 would be a good solution if the s range endpoints
were not included in the results per my statement:
"...but in a perfect world the resulting ranges would not include the s
range endpoints and would include endpoints of the x range if they were not
eliminated by an s range."

Sorry, for any confusion.
Thanks!
Ben

On Fri, May 11, 2012 at 8:58 AM, Ben quant  wrote:

> Hello,
>
> Currently I'm only coming up with brute force solutions to this issue.
> Wondering if anyone knows of a better way to do this.
>
> The problem: I have endpoints of one x range (x_rng) and an unknown number
> of s ranges (s[#]_rng) also defined by endpoints. What I want are the parts
> of the x ranges that don't overlap the s ranges. The examples below
> demonstrate what I mean. I'm glossing over an obvious endpoint
> inclusion/exclusion issue here for simplicity, but in a perfect world the
> resulting ranges would not include the s range endpoints and would include
> endpoints of the x range if they were not eliminated by an s range.
>
> Is there some function(s) in R that would make this easy?
>
> Ex 1.
> For:
> x_rng = c(-100,100)
>
> s1_rng = c(-25.5,30)
> s2_rng = c(0.77,10)
> s3_rng = c(25,35)
> s4_rng = c(70,80.3)
> s5_rng = c(90,95)
>
> I would get:
> xa_rng = c(-100,-25.5)
> xb_rng = c(35,70)
> xc_rng = c(80.3,90)
> xd_rng = c(95,100)
>
> Ex 2.
> For:
> x_rng = c(-50.5,100)
>
> s1_rng = c(-75.3,30)
>
> I would get:
> xa_rng = c(30,100)
>
> Ex 3.
> For:
> x_rng = c(-75.3,30)
>
> s1_rng = c(-50.5,100)
>
> I would get:
> xa_rng = c(-75.3,-50.5)
>
> Ex 4.
> For:
> x_rng = c(-100,100)
>
> s1_rng = c(-105,105)
>
> I would get something like:
> xa_rng = c(NA,NA)
> or...
> xa_rng = NA
>
> Ex 5.
> For:
> x_rng = c(-100,100)
>
> s1_rng = c(-100,100)
>
> I would get something like:
> xa_rng = c(NA,NA)
> or...
> xa_rng = NA
>
>
>



[R] domain/number line/range reduction problem

2012-05-11 Thread Ben quant
Hello,

Currently I'm only coming up with brute force solutions to this issue.
Wondering if anyone knows of a better way to do this.

The problem: I have endpoints of one x range (x_rng) and an unknown number
of s ranges (s[#]_rng) also defined by endpoints. What I want are the parts
of the x ranges that don't overlap the s ranges. The examples below
demonstrate what I mean. I'm glossing over an obvious endpoint
inclusion/exclusion issue here for simplicity, but in a perfect world the
resulting ranges would not include the s range endpoints and would include
endpoints of the x range if they were not eliminated by an s range.

Is there some function(s) in R that would make this easy?

Ex 1.
For:
x_rng = c(-100,100)

s1_rng = c(-25.5,30)
s2_rng = c(0.77,10)
s3_rng = c(25,35)
s4_rng = c(70,80.3)
s5_rng = c(90,95)

I would get:
xa_rng = c(-100,-25.5)
xb_rng = c(35,70)
xc_rng = c(80.3,90)
xd_rng = c(95,100)

Ex 2.
For:
x_rng = c(-50.5,100)

s1_rng = c(-75.3,30)

I would get:
xa_rng = c(30,100)

Ex 3.
For:
x_rng = c(-75.3,30)

s1_rng = c(-50.5,100)

I would get:
xa_rng = c(-75.3,-50.5)

Ex 4.
For:
x_rng = c(-100,100)

s1_rng = c(-105,105)

I would get something like:
xa_rng = c(NA,NA)
or...
xa_rng = NA

Ex 5.
For:
x_rng = c(-100,100)

s1_rng = c(-100,100)

I would get something like:
xa_rng = c(NA,NA)
or...
xa_rng = NA



Re: [R] GAM, how to set qr=TRUE

2012-05-04 Thread Ben quant
Solution: have package mgcv loaded when you predict...not just for the fit.
:) Silly mistake...
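
In other words, in the script/session that does the predicting, something
roughly like this (the file name, fit object, and xx below are just
placeholders for however the fit and new data come back in):

library(mgcv)   # <-- the missing piece; without it predict() ends up in
                #     predict.glm(), which is where the qr.lm() error below comes from
fit <- readRDS("gam_fit.rds")
pred <- predict(fit, data.frame(x = xx), type = "response", se.fit = TRUE)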

Thanks Simon!

Ben

On Thu, May 3, 2012 at 3:56 PM, Ben quant  wrote:

> Hello,
>
> I don't understand what went wrong or how to fix this. How do I set
> qr=TRUE for gam?
>
> When I produce a fit using gam like this:
>
> fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control =
> list(keepData=T))
>
> ...then try to use predict:
> (see #1 below in the traceback() )
>
> > traceback()
> 6: stop("lm object does not have a proper 'qr' component.\n Rank zero or
> should not have used lm(.., qr=FALSE).") at #81
> 5: qr.lm(object) at #81
> 4: summary.glm(object, dispersion = dispersion) at #81
> 3: summary(object, dispersion = dispersion) at #81
> 2: predict.glm(fit, data.frame(x = xx), type = "response", se.fit = T,
>col = prediction_col, lty = prediction_ln) at #81
> 1: predict(fit, data.frame(x = xx), type = "response", se.fit = T,
>col = prediction_col, lty = prediction_ln) at #81
>
> ...I get this error:
>
> Error in qr.lm(object) : lm object does not have a proper 'qr' component.
>  Rank zero or should not have used lm(.., qr=FALSE).
>
> I read this post: http://tolstoy.newcastle.edu.au/R/devel/06/04/5133.html
>
> So I tried adding qr=T to the gam call but it didn't make any difference.
> This is how I did it:
>
> fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control =
> list(keepData=T),qr=T)
>
> It's all very strange because I've produced fits with this data many times
> before with no issues (and never having to do anything with the qr
> parameter). I don't understand why this is coming up or how to fix it.
>
> PS - I don't think this matters, but I am calling a script called
> FunctionGamFit.r like this:
> err = system(paste('"C:\\Program Files\\R\\R-2.14.1\\bin\\R.exe"', 'CMD
> BATCH FunctionGamFit.r'), wait = T)
> ...to produce the fit.
>
> Thanks for any help!
>
> ben
>



[R] GAM, how to set qr=TRUE

2012-05-03 Thread Ben quant
Hello,

I don't understand what went wrong or how to fix this. How do I set qr=TRUE
for gam?

When I produce a fit using gam like this:

fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control =
list(keepData=T))

...then try to use predict:
(see #1 below in the traceback() )

> traceback()
6: stop("lm object does not have a proper 'qr' component.\n Rank zero or
should not have used lm(.., qr=FALSE).") at #81
5: qr.lm(object) at #81
4: summary.glm(object, dispersion = dispersion) at #81
3: summary(object, dispersion = dispersion) at #81
2: predict.glm(fit, data.frame(x = xx), type = "response", se.fit = T,
   col = prediction_col, lty = prediction_ln) at #81
1: predict(fit, data.frame(x = xx), type = "response", se.fit = T,
   col = prediction_col, lty = prediction_ln) at #81

...I get this error:

Error in qr.lm(object) : lm object does not have a proper 'qr' component.
 Rank zero or should not have used lm(.., qr=FALSE).

I read this post: http://tolstoy.newcastle.edu.au/R/devel/06/04/5133.html

So I tried adding qr=T to the gam call but it didn't make any difference.
This is how I did it:

fit = gam(y~s(x),data=as.data.frame(l_yx),family=family,control =
list(keepData=T),qr=T)

It's all very strange because I've produced fits with this data many times
before with no issues (and never having to do anything with the qr
parameter). I don't understand why this is coming up or how to fix it.

PS - I don't think this matters, but I am calling a script called
FunctionGamFit.r like this:
err = system(paste('"C:\\Program Files\\R\\R-2.14.1\\bin\\R.exe"', 'CMD
BATCH FunctionGamFit.r'), wait = T)
...to produce the fit.

Thanks for any help!

ben



Re: [R] check if excel file is

2012-04-27 Thread Ben quant
To get around the issue below, I just wrapped it with try(), but I would still
like to know the answer to the question below.

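Roughly what I mean by "wrapped it with try()" (write.xlsx() is from the xlsx
package; the data frame and file name here are just placeholders):

res <- try(write.xlsx(stats_df, "stats_report.xlsx"), silent = TRUE)
if (inherits(res, "try-error")) {
  message("stats_report.xlsx appears to be open/locked; skipping the write")
}
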
Thanks!

ben

On Fri, Apr 27, 2012 at 10:13 AM, Ben quant  wrote:

> Forgot this: the solution doesn't have to come from the xlsx package...
>
> thanks
>
> ben
>
>
> On Fri, Apr 27, 2012 at 10:08 AM, Ben quant  wrote:
>
>> Hello again,
>>
>> I'd like to determine if an Excel file is open or writable. Can anyone
>> help me with that?
>>
>> I write some stats to an .xlsx Excel file using the xlsx package. I can't
>> write to the file unless its closed. How do I determine if the .xlsx file
>> is open or closed so I can write to it?
>>
>> I've looked at file.info and file.access and I couldn't get those to
>> work for me.
>>
>> Any help would be great!
>> ben
>>
>>
>>
>



Re: [R] check if excel file is

2012-04-27 Thread Ben quant
Forgot this: the solution doesn't have to come from the xlsx package...

thanks

ben

On Fri, Apr 27, 2012 at 10:08 AM, Ben quant  wrote:

> Hello again,
>
> I'd like to determine if an Excel file is open or writable. Can anyone
> help me with that?
>
> I write some stats to an .xlsx Excel file using the xlsx package. I can't
> write to the file unless its closed. How do I determine if the .xlsx file
> is open or closed so I can write to it?
>
> I've looked at file.info and file.access and I couldn't get those to work
> for me.
>
> Any help would be great!
> ben
>
>
>



[R] check if excel file is

2012-04-27 Thread Ben quant
Hello again,

I'd like to determine if an Excel file is open or writable. Can anyone help
me with that?

I write some stats to an .xlsx Excel file using the xlsx package. I can't
write to the file unless it's closed. How do I determine if the .xlsx file
is open or closed so I can write to it?

I've looked at file.info and file.access and I couldn't get those to work
for me.

Any help would be great!
ben



[R] get plot axis rounding method

2012-04-27 Thread Ben quant
Hello,

Does anyone know how to get the rounding method used for the axis tick
numbers/values in plot()?

I'm using mtext() to label the vertical and horizontal lines I draw with
abline(), and I'd like those values to be rounded the same way the axis tick
values are rounded.

In other words, I want numbers plotted with mtext() to be rounded in the
same fashion as the axis values given by default by plot().
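
A toy version of the situation, in case it helps (the numbers are made up, and
axTicks() is just my guess at where to look):

x <- rnorm(50) * 1000
plot(x)                               # the y-axis tick labels come out nicely rounded
v <- 123.456789
abline(h = v)
mtext(v, side = 2, at = v, las = 1)   # ...but this prints 123.456789
axTicks(2)                            # the tick positions plot() actually used;
                                      # their rounding is what I'd like to match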

thank you very much for your help!

ben



[R] lines crosses

2012-04-20 Thread Ben quant
Hello,

If the exact value does not exist in the vector, can I still get at the points
where the line crosses that value? Is there a simple way to do this that avoids
looping? It seems like there would be a simple R function for this...

Example:
vec <- c(5,4,3,2,3,4,5)
vec
[1] 5 4 3 2 3 4 5
intersect(vec,2.5)
numeric(0)

I want to get:
2.5 and 2.5

My real data is very large and I don't know the values of anything ahead of
time. The vec vector is produced by the gam function so it can be just
about any continuous line.
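
For what it's worth, this is the sort of thing I have in mind (simple linear
interpolation at each sign change, treating vec as a line over the index
1:length(vec); just a sketch, not necessarily the best way):

crossings <- function(vec, level) {
  d <- vec - level
  # segments (i, i+1) where the line crosses or touches the level
  i <- which(d[-length(d)] * d[-1] <= 0 & d[-length(d)] != d[-1])
  i + d[i] / (d[i] - d[i + 1])   # x positions of the crossings; y there is 'level'
}

vec <- c(5, 4, 3, 2, 3, 4, 5)
crossings(vec, 2.5)   # 3.5 and 4.5, i.e. the line hits 2.5 between those indices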

Thanks,

Ben



[R] quarter end dates between two date strings

2012-04-18 Thread Ben quant
Hello,

I have two date strings, say "1972-06-30" and "2012-01-31", and I'd like to
get every quarter-end date between those dates. Does anyone know how
to do this? Speed is important...

Here is a small sample:

Two dates:
"2007-01-31"

"2012-01-31"

And I'd like to get this:

[1] "2007-03-31" "2007-06-30" "2007-09-30" "2007-12-31" "2008-03-31"
"2008-06-30" "2008-09-30" "2008-12-31"
 [9] "2009-03-31" "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31"
"2010-06-30" "2010-09-30" "2010-12-31"
[17] "2011-03-31" "2011-06-30" "2011-09-30" "2011-12-31"
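
For reference, the sort of thing I have in mind (just a sketch; I have not
checked whether it is fast enough) is to step through the month ends and keep
every third one:

quarter_ends <- function(from, to) {
  from <- as.Date(from); to <- as.Date(to)
  # first-of-month sequence covering the window, with some slack at the end
  n_months <- 12 * (as.numeric(format(to, "%Y")) - as.numeric(format(from, "%Y")) + 2)
  firsts <- seq(as.Date(format(from, "%Y-%m-01")), by = "month", length.out = n_months)
  month_ends <- firsts[-1] - 1                                # last day of each month
  qe <- month_ends[as.numeric(format(month_ends, "%m")) %% 3 == 0]
  qe[qe > from & qe < to]                                     # strictly between the two dates
}

quarter_ends("2007-01-31", "2012-01-31")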


Thanks!

ben



[R] resetting console

2012-03-21 Thread Ben quant
Hello,

I'm still hoping my issue is preventable and not worthy of a bug/crash
report, hence my post is in 'help'. Anyway, I'd like to know how to reset
the console so it is clear of all residual effects caused by previous
scripts. Details:

I run a script once and it runs successfully (but very slowly). The script
uses fairly sizable data. No problem so far.

Then I run the same exact script again and the console eventually crashes.
Naturally, I thought it was the size of the data so I:

1) run script successfully (which includes a plot)

2) do this:
dev.off()
rm(list=ls(all=TRUE))
gc()

3) run script again

...and it still crashes. There isn't an R error or anything, I just get a
Microsoft error report request window and the console goes away.

However, if I:

1) run script successfully

2) shut down, and reopen console

3) run script again

...everything runs as expected and the console does not crash.

If the script runs successfully the first time and I'm clearing all
available memory (I think) what is 'remaining' that I need to reset in the
console (that restarting the console solves/clears out)?
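
To be concrete, the sort of fuller reset I am imagining is something like this
(graphics.off() and closeAllConnections() are guesses on my part, added to the
dev.off()/rm()/gc() I already do):

graphics.off()                    # close every graphics device, not just the current one
closeAllConnections()             # drop any file/text connections left open
rm(list = ls(all.names = TRUE))   # clear the workspace, including dot-named objects
gc()                              # and hand the memory back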

PS - Because the script runs successfully, I'm thinking the script itself
is not all that important. I'd prefer an answer that indicates generally
how to reset the console. Basically I'm loading some data, manipulating the
data including a log transformation, doing a gam fit (family="binomial"),
and finally I plot the data from the fit. Interestingly, if I set family to
"gaussian" or if I do not log transform the data, the console does not
crash. Or should I post a crash/bug?

Regards,

Ben



Re: [R] gam - Y axis probability scale with confidence/error lines

2012-03-14 Thread Ben quant
Thank you. The binomial()$linkinv() is good to know.

Ben

On Wed, Mar 14, 2012 at 12:23 PM, Patrick Breheny
wrote:

> Actually, I responded a bit too quickly last time, without really reading
> through your example carefully.  You're fitting a logistic regression model
> and plotting the results on the probability scale.  The better way to do
> what you propose is to obtain the confidence interval on the scale of the
> linear predictor and then transform to the probability scale, as in:
>
> x <- seq(0,1,by=.01)
> y <- rbinom(length(x),size=1,p=x)
> require(gam)
> fit <- gam(y~s(x),family=binomial)
> pred <- predict(fit,se.fit=TRUE)
> yy <- binomial()$linkinv(pred$fit)
> l <- binomial()$linkinv(pred$fit-1.96*pred$se.fit)
> u <- binomial()$linkinv(pred$fit+1.96*pred$se.fit)
> plot(x,yy,type="l")
> lines(x,l,lty=2)
> lines(x,u,lty=2)
>
>
> --
> Patrick Breheny
> Assistant Professor
> Department of Biostatistics
> Department of Statistics
> University of Kentucky
>
>
>
>
> On 03/14/2012 01:49 PM, Ben quant wrote:
>
>> That was embarrassingly easy. Thanks again Patrick! Just correcting a
>> little typo to his reply. this is probably what he meant:
>>
>> pred = predict(fit,data.frame(x=xx),type="response",se.fit=TRUE)
>> upper = pred$fit + 1.96 * pred$se.fit
>> lower = pred$fit - 1.96 * pred$se.fit
>>
>> # For people who are interested this is how you plot it line by line:
>>
>> plot(xx,pred$fit,type="l",xlab=fd$getFactorName(),ylab=ylab,ylim=
>> c(min(lower),max(upper)))
>> lines(xx,upper,type="l",lty='dashed')
>> lines(xx,lower,type="l",lty='dashed')
>>
>> In my opinion this is only important if the desired y axis is different
>> than what plot(fit) gives you for a gam fit (i.e fit <-
>> gam(...stuff...)) and you want to plot the confidence intervals.
>>
>> thanks again!
>>
>> Ben
>>
>> On Wed, Mar 14, 2012 at 10:39 AM, Patrick Breheny
>> > <mailto:patrick.breheny@uky.edu>>
>> wrote:
>>
>>The predict() function has an option 'se.fit' that returns what you
>>are asking for.  If you set this equal to TRUE in your code:
>>
>>pred <- predict(fit,data.frame(x=xx),type="response",se.fit=TRUE)
>>
>>
>>will return a list with two elements, 'fit' and 'se.fit'.  The
>>pointwise confidence intervals will then be
>>
>>pred$fit + 1.96*se.fit
>>pred$fit - 1.96*se.fit
>>
>>for 95% confidence intervals (replace 1.96 with the appropriate
>>quantile of the normal distribution for other confidence levels).
>>
>>You can then do whatever "stuff" you want to do with them, including
>>plot them.
>>
>>--Patrick
>>
>>
>>On 03/14/2012 10:48 AM, Ben quant wrote:
>>
>>Hello,
>>
>>How do I plot a gam fit object on probability (Y axis) vs raw
>>values (X
>>axis) axis and include the confidence plot lines?
>>
>>Details...
>>
>>I'm using the gam function like this:
>>l_yx[,2] = log(l_yx[,2] + .0004)
>>fit<- gam(y~s(x),data=as.data.frame(l_yx),family=binomial)
>>
>>
>>And I want to plot it so that probability is on the Y axis and
>>values are
>>on the X axis (i.e. I don't want log likelihood on the Y axis or
>>the log of
>>my values on my X axis):
>>
>>xx<- seq(min(l_yx[,2]),max(l_yx[,2]),len=101)
>>plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l",xaxt="n",
>>  xlab="Churn",ylab="P(Top Performer)")
>>at<- c(.001,.01,.1,1,10)  # <-- I'd also like to generalize
>># this rather than hard code the numbers
>>axis(1,at=log(at+ .0004),label=at)
>>
>>So far, using the code above, everything looks the way I want.
>>But that
>>does not give me any information on
>>variability/confidence/certainty.
>>
>>How do I get the dash plots from this:
>>plot(fit)
>>...on the same scales as above?
>>
>>Related question: how do I get the dashed values out of the fit
>>object so I
>>can do 'stuff' with it?
>>
>>Thanks,
&

Re: [R] gam - Y axis probability scale with confidence/error lines

2012-03-14 Thread Ben quant
That was embarrassingly easy. Thanks again Patrick! Just correcting a
little typo in his reply. This is probably what he meant:

pred = predict(fit,data.frame(x=xx),type="response",se.fit=TRUE)
upper = pred$fit + 1.96 * pred$se.fit
lower = pred$fit - 1.96 * pred$se.fit

# For people who are interested this is how you plot it line by line:

plot(xx,pred$fit,type="l",xlab=fd$getFactorName(),ylab=ylab,ylim=
c(min(lower),max(upper)))
lines(xx,upper,type="l",lty='dashed')
lines(xx,lower,type="l",lty='dashed')

In my opinion this is only important if the desired y axis is different from
what plot(fit) gives you for a gam fit (i.e., fit <- gam(...stuff...)) and you
want to plot the confidence intervals.

thanks again!

Ben

On Wed, Mar 14, 2012 at 10:39 AM, Patrick Breheny
wrote:

> The predict() function has an option 'se.fit' that returns what you are
> asking for.  If you set this equal to TRUE in your code:
>
> pred <- predict(fit,data.frame(x=xx),type="response",se.fit=TRUE)
>
> will return a list with two elements, 'fit' and 'se.fit'.  The pointwise
> confidence intervals will then be
>
> pred$fit + 1.96*se.fit
> pred$fit - 1.96*se.fit
>
> for 95% confidence intervals (replace 1.96 with the appropriate quantile
> of the normal distribution for other confidence levels).
>
> You can then do whatever "stuff" you want to do with them, including plot
> them.
>
> --Patrick
>
>
> On 03/14/2012 10:48 AM, Ben quant wrote:
>
>> Hello,
>>
>> How do I plot a gam fit object on probability (Y axis) vs raw values (X
>> axis) axis and include the confidence plot lines?
>>
>> Details...
>>
>> I'm using the gam function like this:
>> l_yx[,2] = log(l_yx[,2] + .0004)
>> fit<- gam(y~s(x),data=as.data.frame(l_yx),family=binomial)
>>
>> And I want to plot it so that probability is on the Y axis and values are
>> on the X axis (i.e. I don't want log likelihood on the Y axis or the log
>> of
>> my values on my X axis):
>>
>> xx<- seq(min(l_yx[,2]),max(l_yx[,2]),len=101)
>> plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l",xaxt="n",
>>   xlab="Churn",ylab="P(Top Performer)")
>> at<- c(.001,.01,.1,1,10)  # <-- I'd also like to generalize
>> # this rather than hard code the numbers
>> axis(1,at=log(at+ .0004),label=at)
>>
>> So far, using the code above, everything looks the way I want. But that
>> does not give me any information on variability/confidence/certainty.
>> How do I get the dash plots from this:
>> plot(fit)
>> ...on the same scales as above?
>>
>> Related question: how do I get the dashed values out of the fit object so I
>> can do 'stuff' with it?
>>
>> Thanks,
>>
>> Ben
>>
>> PS - thank you Patrick for your help previously.
>>
>>
>
>
> --
> Patrick Breheny
> Assistant Professor
> Department of Biostatistics
> Department of Statistics
> University of Kentucky
>



[R] gam - Y axis probability scale with confidence/error lines

2012-03-14 Thread Ben quant
Hello,

How do I plot a gam fit object on probability (Y axis) vs raw values (X
axis) axis and include the confidence plot lines?

Details...

I'm using the gam function like this:
l_yx[,2] = log(l_yx[,2] + .0004)
fit <- gam(y~s(x),data=as.data.frame(l_yx),family=binomial)

And I want to plot it so that probability is on the Y axis and values are
on the X axis (i.e. I don't want log likelihood on the Y axis or the log of
my values on my X axis):

xx <- seq(min(l_yx[,2]),max(l_yx[,2]),len=101)
plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l",xaxt="n",xlab="Churn",ylab="P(Top
Performer)")
at <- c(.001,.01,.1,1,10)  # <-- I'd also like to generalize
# this rather than hard code the numbers
axis(1,at=log(at+ .0004),label=at)

So far, using the code above, everything looks the way I want. But that
does not give me any information on variability/confidence/certainty.
How do I get the dash plots from this:
plot(fit)
...on the same scales as above?

Related question: how do I get the dashed values out of the fit object so I
can do 'stuff' with it?

Thanks,

Ben

PS - thank you Patrick for your help previously.



Re: [R] index values of one matrix to another of a different size

2012-03-12 Thread Ben quant
Joshua,

Just confirming quickly that your method using cmpfun and your f function
below was fastest using my real data.  Again, thank you for your help!

Ben

On Sat, Mar 10, 2012 at 1:21 PM, Joshua Wiley wrote:

> On Sat, Mar 10, 2012 at 12:11 PM, Ben quant  wrote:
> > Very interesting. You are doing some stuff here that I have never seen.
>
> and that I would not typically do or recommend (e.g., fussing with
> storage mode or manually setting the dimensions of an object), but
> that can be faster by sacrificing higher level functions flexibility
> for lower level, more direct control.
>
> > Thank you. I will test it on my real data on Monday and let you know
> what I
> > find. That cmpfun function looks very useful!
>
> It can reduce the overhead of repeated function calls.  I find the
> biggest speedups when it is used with some sort of loop.  Then again,
> many loops can be avoided entirely, which often yields even larger
> performance gains.
>
> >
> > Thanks,
>
> You're welcome.  You might also look at the data table package by
> Matthew Dowle.  It does some *very* fast indexing and subsetting and
> if those operations are serious slow down for you, you would likely
> benefit substantially from using it.  One final comment, since you are
> creating the matrix of indices; if you can create it in such a way
> that it already has the vector position rather than row/column form,
> you could eliminate the need for my f2() function altogether as you
> could use it to directly index your data, and then just add dimensions
> back afterward.
>
> Cheers,
>
> Josh
>
> > Ben
> >
> >
> > On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley 
> > wrote:
> >>
> >> Hi Ben,
> >>
> >> It seems likely that there are bigger bottle necks in your overall
> >> program/use---have you tried Rprof() to find where things really get
> >> slowed down?  In any case, f2() below takes about 70% of the time as
> >> your function in your test data, and 55-65% of the time for a bigger
> >> example I constructed.  Rui's function benefits substantially from
> >> byte compiling, but is still slower.  As a side benefit, f2() seems to
> >> use less memory than your current implementation.
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> %%
> >> ##sample data ##
> >> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
> >>  dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
> >>
> >> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
> >> storage.mode(indx) <- "integer"
> >>
> >>
> >> f <- function(x, i, di = dim(i), dx = dim(x)) {
> >>  out <- x[c(i + matrix(0:(dx[1L] - 1L) * dx[1L], nrow = di[1L], ncol
> >> = di[2L], TRUE))]
> >>  dim(out) <- di
> >>  return(out)
> >> }
> >>
> >>
> >> fun <- function(valdata, inxdata){
> >>nr <- nrow(inxdata)
> >>nc <- ncol(inxdata)
> >>mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >>i1 <- 1
> >>i2 <- nr
> >>for(j in 1:nc){
> >>mat[i1:i2, 1] <- inxdata[, j]
> >>mat[i1:i2, 2] <- rep(j, nr)
> >>i1 <- i1 + nr
> >>i2 <- i2 + nr
> >>}
> >>matrix(valdata[mat], ncol=nc)
> >> }
> >>
> >> require(compiler)
> >> f2 <- cmpfun(f)
> >> fun2 <- cmpfun(fun)
> >>
> >> system.time(for (i in 1:1) f(vals, indx))
> >> system.time(for (i in 1:1) f2(vals, indx))
> >> system.time(for (i in 1:1) fun(vals, indx))
> >> system.time(for (i in 1:1) fun2(vals, indx))
> >> system.time(for (i in 1:1)
> >>
> >>
> matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx)))
> >>
> >> ## now let's make a bigger test set
> >> set.seed(1)
> >> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
> >> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3,
> TRUE))
> >>
> >> dim(vals2)
> >> dim(indx2)
> >>
> >> ## the best contenders from round 1
> >> gold <-
> >>
> matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2))
&g

Re: [R] index values of one matrix to another of a different size

2012-03-10 Thread Ben quant
Very interesting. You are doing some stuff here that I have never seen.
Thank you. I will test it on my real data on Monday and let you know what I
find. That cmpfun function looks very useful!

Thanks,
Ben

On Sat, Mar 10, 2012 at 10:26 AM, Joshua Wiley wrote:

> Hi Ben,
>
> It seems likely that there are bigger bottle necks in your overall
> program/use---have you tried Rprof() to find where things really get
> slowed down?  In any case, f2() below takes about 70% of the time as
> your function in your test data, and 55-65% of the time for a bigger
> example I constructed.  Rui's function benefits substantially from
> byte compiling, but is still slower.  As a side benefit, f2() seems to
> use less memory than your current implementation.
>
> Cheers,
>
> Josh
>
> %%
> ##sample data ##
> vals <- matrix(LETTERS[1:9], nrow = 3, ncol = 3,
>  dimnames = list(c('row1','row2','row3'), c('col1','col2','col3')))
>
> indx <- matrix(c(1,1,3,3,2,2,2,3,1,2,2,1), nrow=4, ncol=3)
> storage.mode(indx) <- "integer"
>
>
> f <- function(x, i, di = dim(i), dx = dim(x)) {
>  out <- x[c(i + matrix(0:(dx[1L] - 1L) * dx[1L], nrow = di[1L], ncol
> = di[2L], TRUE))]
>  dim(out) <- di
>  return(out)
> }
>
>
> fun <- function(valdata, inxdata){
>nr <- nrow(inxdata)
>nc <- ncol(inxdata)
>mat <- matrix(NA, nrow=nr*nc, ncol=2)
>i1 <- 1
>i2 <- nr
>for(j in 1:nc){
>mat[i1:i2, 1] <- inxdata[, j]
>mat[i1:i2, 2] <- rep(j, nr)
>i1 <- i1 + nr
>i2 <- i2 + nr
>}
>matrix(valdata[mat], ncol=nc)
> }
>
> require(compiler)
> f2 <- cmpfun(f)
> fun2 <- cmpfun(fun)
>
> system.time(for (i in 1:1) f(vals, indx))
> system.time(for (i in 1:1) f2(vals, indx))
> system.time(for (i in 1:1) fun(vals, indx))
> system.time(for (i in 1:1) fun2(vals, indx))
> system.time(for (i in 1:1)
>
> matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx)))
>
> ## now let's make a bigger test set
> set.seed(1)
> vals2 <- matrix(sample(LETTERS, 10^7, TRUE), nrow = 10^4)
> indx2 <- sapply(1:ncol(vals2), FUN = function(x) sample(10^4, 10^3, TRUE))
>
> dim(vals2)
> dim(indx2)
>
> ## the best contenders from round 1
> gold <-
> matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2))
> test1 <- f2(vals2, indx2)
> all.equal(gold, test1)
>
> system.time(for (i in 1:20) f2(vals2, indx2))
> system.time(for (i in 1:20)
>
> matrix(vals2[cbind(c(indx2),rep(1:ncol(indx2),each=nrow(indx2)))],nrow=nrow(indx2),ncol=ncol(indx2)))
>
> %%
>
> On Sat, Mar 10, 2012 at 7:48 AM, Ben quant  wrote:
> > Thanks for the info. Unfortunately its a little bit slower after one
> apples
> > to apples test using my big data. Mine: 0.28 seconds. Yours. 0.73
> seconds.
> > Not a big deal, but significant when I have to do this 300 to 500 times.
> >
> > regards,
> >
> > ben
> >
> > On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas  wrote:
> >
> >> Hello,
> >>
> >> I don't know if it's the fastest but it's more natural to have an index
> >> matrix with two columns only,
> >> one for each coordinate. And it's fast.
> >>
> >> fun <- function(valdata, inxdata){
> >>nr <- nrow(inxdata)
> >>nc <- ncol(inxdata)
> >>mat <- matrix(NA, nrow=nr*nc, ncol=2)
> >>i1 <- 1
> >>i2 <- nr
> >>for(j in 1:nc){
> >>mat[i1:i2, 1] <- inxdata[, j]
> >>mat[i1:i2, 2] <- rep(j, nr)
> >>i1 <- i1 + nr
> >>i2 <- i2 + nr
> >>}
> >>matrix(valdata[mat], ncol=nc)
> >> }
> >>
> >> fun(vals, indx)
> >>
> >> Rui Barradas
> >>
> >>
> >> --
> >> View this message in context:
> >>
> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> __
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> &g

Re: [R] index values of one matrix to another of a different size

2012-03-10 Thread Ben quant
Thanks for the info. Unfortunately it's a little bit slower after one
apples-to-apples test using my big data. Mine: 0.28 seconds. Yours: 0.73 seconds.
Not a big deal, but significant when I have to do this 300 to 500 times.

regards,

ben

On Fri, Mar 9, 2012 at 1:23 PM, Rui Barradas  wrote:

> Hello,
>
> I don't know if it's the fastest but it's more natural to have an index
> matrix with two columns only,
> one for each coordinate. And it's fast.
>
> fun <- function(valdata, inxdata){
>nr <- nrow(inxdata)
>nc <- ncol(inxdata)
>mat <- matrix(NA, nrow=nr*nc, ncol=2)
>i1 <- 1
>i2 <- nr
>for(j in 1:nc){
>mat[i1:i2, 1] <- inxdata[, j]
>mat[i1:i2, 2] <- rep(j, nr)
>i1 <- i1 + nr
>i2 <- i2 + nr
>}
>matrix(valdata[mat], ncol=nc)
> }
>
> fun(vals, indx)
>
> Rui Barradas
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Re-index-values-of-one-matrix-to-another-of-a-different-size-tp4458666p4460575.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] index instead of loop?

2012-03-09 Thread Ben quant
Here is my latest. I kind of changed the problem (for speed). In real life
I have over 300 uadata-type matrices, each having over 20 rows and over
11,000 columns, but the rddata report-date matrix below is valid for all 300
of them. What I am doing now: I create a matrix of row indices which will
either lag the row values or not, based on the report dates (rddata). Then I
apply that matrix of row indices to each uadata item (300 times) to get a
matrix of correctly row-adjusted data at the dimensions and periodicity I
want (weekly in this case). The key is that I only do the 'adjustment' once
(which is comparatively slow) and then apply those results to each data
matrix (fast!).

I'm open to ideas. I put this together quickly so hopefully all is well.

#sample data
zdates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rddata =
matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",

"20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
   "20070426","--","--","--","--","--","--","20090319",
   "--","--","--","--","--","--","--","--"),
 nrow=8,ncol=4)
dimnames(rddata) = list(zdates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
uadata = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,

2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
  NaN, NaN,-98.426,190.304,180.894,183.220,172.520, 144.138,
  NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
nrow=8,ncol=4)
dimnames(uadata) = list(zdates,nms)

#### I do this once ####

fix = function(x)
{
  year = substring(x, 1, 4)
  mo = substring(x, 5, 6)
  day = substring(x, 7, 8)
  ifelse(year=="--", "--", paste(year, mo, day, sep = "-"))

}
rd = apply(rddata, 2, fix)
dimnames(rd) = dimnames(rddata)   # keep rddata's row/column names

wd1 <- seq(from =as.Date(min(zdates)), to = Sys.Date(), by = "day")
wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
wd = sapply(wd1, as.character)

mat = matrix(NA,nrow=length(wd),ncol=ncol(uadata))
rownames(mat) = wd
nms = as.Date(rownames(uadata))

for(i in 1:length(wd)){

  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_row_idx_lag = rd_row_idx - 1
  rd_row_idx_lag2 = rd_row_idx - 2
  rd_col_idx = which(as.Date(rd[rd_row_idx,], format="%Y-%m-%d")  < d)
  rd_col_idx_lag = which(as.Date(rd[rd_row_idx_lag,], format="%Y-%m-%d")  <
d)
  rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx_lag2,], format="%Y-%m-%d")
< d)

  ## if(length(rd_col_idx_lag2) && (rd_row_idx - 2) > 0){
  if(rd_row_idx_lag2 > 0){
# mat[i,rd_col_idx_lag2] = ua[rd_row_idx_lag2,rd_col_idx_lag2]
mat[i,rd_col_idx_lag2] = rd_row_idx_lag2
  }
  #if(length(rd_col_idx_lag)){
  mat[i,rd_col_idx_lag] = rd_row_idx_lag
  #}
  #if( length(rd_col_idx)){
  mat[i,rd_col_idx] = rd_row_idx
  #}
  }

indx = mat
vals = uadata
## I do this 300 times

x =
matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx))

Regards,

ben

On Thu, Mar 8, 2012 at 11:40 AM, Rui Barradas  wrote:

> Hello,
>
> > Humm If I understand what you are saying, you are correct. I get
> > 144.138 for 2009-03-20 for column C. Maybe I posted the wrong code?  If
> > so,
> > sorry.
>
> I think I have the fastest so far solution, and it checks with your
> corrected,last one.
>
> I've made just a change: to transform it into a function I renamed the
> parameters
> (only for use inside the function) 'zdates', without the period, 'rddata'
> and 'uadata'.
>
> 'fun1' is yours, 'fun2', mine. Here it goes.
>
>
> fun1 <- function(zdates, rddata, uadata){
> fix = function(x)
>{
>  year = substring(x, 1, 4)
>  mo = substring(x, 5, 6)
>  day = substring(x, 7, 8)
>  ifelse(year=="--", "--", paste(year, mo, day, sep = "-"))
>
>}
> rd = apply(rddata, 2, fix)
>dimnames(rd) = dimnames(rd)
>
>wd1 <- seq(from =as.Date(min(zdates)), to = Sys.Date(), by = "day")
> #wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
>wd = sapply(wd1, as.character)
> mat = matrix(NA,nrow=length(wd),ncol=ncol(uadata))
>rownames(mat) = wd
>nms = as.Date(rownames(uadata))
>
>for(i in 1:length(wd)){
>  d = as.Date(wd[i])
>  diff = abs(nms - d)
>  rd_row_idx = max(which(diff == min(diff)))
>  rd_col_idx = which(as.Date(rd[rd_row_idx,], format="%Y-%m-%d")  < d)
>  rd_col_idx_lag = which(as.Date(rd[rd_row_idx - 1,], format="%Y-%m-%d")
> < d)
>  rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx - 2,],
> format="%Y-%m-%d")  < d)
>
>  if(length(rd_col_idx_lag2) && (rd_row_idx

Re: [R] index values of one matrix to another of a different size

2012-03-08 Thread Ben quant
> Hello,
>
> Is this the fastest way to use indices from one matrix to reference rows
> in another smaller matrix? I am dealing with very big data (lots of columns
> and I have to do this lots of times).
>
> ##sample data ##
> vals = matrix(LETTERS[1:9], nrow=3,ncol=3)
> colnames(vals) = c('col1','col2','col3')
> rownames(vals) = c('row1','row2','row3')
> > vals
> col1 col2 col3
> row1 "A"  "D"  "G"
> row2 "B"  "E"  "H"
> row3 "C"  "F"  "I"
>
> # this is a matrix of row references to vals above. The values all stay in
> the same column but shift in row via the indices.
>  indx = matrix(c(1,1,3,3,2,2,2,3,1,2,2,1),nrow=4,ncol=3)
> > indx
>      [,1] [,2] [,3]
> [1,]    1    2    1
> [2,]    1    2    2
> [3,]    3    2    2
> [4,]    3    3    1
> ### end sample data 
>
> # my solution
> >
> matrix(vals[cbind(c(indx),rep(1:ncol(indx),each=nrow(indx)))],nrow=nrow(indx),ncol=ncol(indx))
> [,1] [,2] [,3]
> [1,] "A"  "E"  "G"
> [2,] "A"  "E"  "H"
> [3,] "C"  "E"  "H"
> [4,] "C"  "F"  "G"
>
> Thanks,
>
> Ben
>
> PS - Rui - I thought you may want to see this since I think this will be a
> faster way to deal with the issue you were working with me on...although I
> don't show how I build the matrix of indices, I think you get the idea.
>
>



Re: [R] index instead of loop?

2012-03-08 Thread Ben quant
Hmm, if I understand what you are saying, you are correct. I get
144.138 for 2009-03-20 for column C. Maybe I posted the wrong code?  If so,
sorry.  Let me know if you disagree. I still plan to come back to this and
optimize it more, so if you see anything that would make it faster that
would be great. Of course, the for loop is my focus for optimization. Due
to some issues in the real data I had to add the lag and lag2 stuff in (I
don't think I had that before). In my real data the values don't really
belong in the z.dates they are aligned with, but to avoid lots of empty
values in the flat matrix (ua) they were forced in. I can push them into
their "real" dates by looking at a deeper lag. I'm thinking that all the
"which" stuff in the for loop can be nested so that it runs faster. Also
the as.Date, abs() and max(which()) etc. stuff seems like it could be handled
better/faster or outside the loop.

If you are interested in helping further, I can post a link to some 'real'
data.

Here is what I am using now and it seems to work.  Sorry, my code is still
very fluid:

z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 =
matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",

"20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
   "20070426","--","--","--","--","--","--","20090319",
   "--","--","--","--","--","--","--","--"),
 nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,

2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
  NaN, NaN,-98.426,190.304,180.894,183.220,172.520, 144.138,
  NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
nrow=8,ncol=4)
dimnames(ua) = list(z.dates,nms)

z.dates = rownames(ua)
## by rows
##  FASTEST

start_t_all = Sys.time()
fix = function(x)
{
  year = substring(x, 1, 4)
  mo = substring(x, 5, 6)
  day = substring(x, 7, 8)
  ifelse(year=="--", "--", paste(year, mo, day, sep = "-"))

}
rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)   # keep rd1's row/column names

wd1 <- seq(from =as.Date(min(z.dates)), to = Sys.Date(), by = "day")
wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
wd = sapply(wd1, as.character)

mat = matrix(NA,nrow=length(wd),ncol=ncol(ua))
rownames(mat) = wd
nms = as.Date(rownames(ua))

for(i in 1:length(wd)){
  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_col_idx = which(as.Date(rd[rd_row_idx,], format="%Y-%m-%d")  < d)
  rd_col_idx_lag = which(as.Date(rd[rd_row_idx - 1,], format="%Y-%m-%d")  <
d)
  rd_col_idx_lag2 = which(as.Date(rd[rd_row_idx - 2,], format="%Y-%m-%d")
< d)

  if(length(rd_col_idx_lag2) && (rd_row_idx - 2) > 0){

mat[i,rd_col_idx_lag2] = ua[rd_row_idx - 2,rd_col_idx_lag2]
  }
  if(length(rd_col_idx_lag)){
mat[i,rd_col_idx_lag] = ua[rd_row_idx - 1,rd_col_idx_lag]
  }
  if( length(rd_col_idx)){
mat[i,rd_col_idx] = ua[rd_row_idx,rd_col_idx]
  }
}
colnames(mat)=colnames(ua)
print(Sys.time()-start_t_all)


Let me know if you disagree,

Ben

On Wed, Mar 7, 2012 at 5:57 PM, Rui Barradas  wrote:

> Hello again.
>
>
> Ben quant wrote
> >
> > Hello,
> >
> > In case anyone is interested in a faster solution for lots of columns.
> > This
> > solution is slower if you only have a few columns.  If anyone has
> anything
> > faster, I would be interested in seeing it.
> >
> > ### some mockup data
> > z.dates =
> >
> c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")
> >
> > nms = c("A","B","C","D") # add more columns to see how the code below is
> > fsater
> > # these are the report dates that are the real days the data was
> > available,
> > 

[R] extract same columns and rows in two matrices

2012-03-07 Thread Ben quant
Hello,

I have two matrices. They both have different row names and column names,
but they have some common row names and column names. The row names and
column names that are the same are what I am interested in. I also want the
columns in the two matrices aligned the same. In the end, I need to do
rd[1,1] and ua[1,1], for example, and be accessing the same column and row
for both matrices.  Thank you very much for all your help.

I can do it, but I am pretty sure there is a better/faster way:

#### make some sample data
ua = matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
colnames(ua) = c('a','b','c')
rownames(ua)= c('ra','rb')
rd1 = matrix(c(7,8,9,10,11,12,13,14,15,16,17,18),nrow=3,ncol=4)
colnames(rd1) = c('c','b','a','d')
rownames(rd1)= c('rc','rb','ra')

> rd1
c  b  a  d
rc 7 10 13 16
rb 8 11 14 17
ra 9 12 15 18
> ua
a b c
ra 1 3 5
rb 2 4 6

# get common columns and rows and order them the same,
# this works but is slow'ish
rd1_cn = colnames(rd1)
ua_cn = colnames(ua)
common_t = merge(rd1_cn,ua_cn,by.x=1,by.y=1)
common_t = as.character(common_t[,1])
rd1 = rd1[,common_t]
ua = ua[,common_t]
rd1_d = rownames(rd1)
ua_d = rownames(ua)
common_d = merge(rd1_d,ua_d,by.x=1,by.y=1)
common_d = as.character(common_d[,1])
rd = rd1[common_d,]
ua = ua[common_d,]


#### this is what I want
> rd
a  b c
ra 15 12 9
rb 14 11 8
> ua
a b c
ra 1 3 5
rb 2 4 6
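
For what it's worth, this is the kind of shorter/faster version I am hoping
exists (a sketch using intersect(); taking ua's names first so the result
comes out in the same order as shown above):

common_cols <- intersect(colnames(ua), colnames(rd1))
common_rows <- intersect(rownames(ua), rownames(rd1))
rd <- rd1[common_rows, common_cols]
ua <- ua[common_rows, common_cols]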


Thanks!

ben



Re: [R] index instead of loop?

2012-03-06 Thread Ben quant
Hello,

In case anyone is interested, here is a faster solution for lots of columns.
This solution is slower if you only have a few columns.  If anyone has anything
faster, I would be interested in seeing it.

### some mockup data
z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D") # add more columns to see how the code below is
# faster
# these are the report dates that are the real days the data was available,
# so show the data the day after this date ('after' is a design decision)
rd1 = matrix(c("20070514","20070814","20071115",   "20080213",
"20080514",  "20080814",  "20081114",  "20090217",
   "20070410","20070709","20071009",   "20080109",
"20080407",  "20080708",  "20081007",  "20090112",
   "20070426","--","--","--","--","--","--","20090319",
   "--","--","--","--","--","--","--","--"),
 nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,

2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
  NaN, NaN,-98.426,190.304,180.894,183.220,172.520, 144.138,
  NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
nrow=8,ncol=4)
dimnames(ua) = list(z.dates,nms)

#### the fastest code I have found:

start_t_all = Sys.time()
fix = function(x)
{
  year = substring(x, 1, 4)
  mo = substring(x, 5, 6)
  day = substring(x, 7, 8)
  ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
}

rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)   # keep rd1's row/column names

wd1 <- seq(from =as.Date(min(z.dates)), to = Sys.Date(), by = "day")
#wd1 = wd1[weekdays(wd1) == "Friday"] # uncomment to go weekly
wd = sapply(wd1, as.character)

mat = matrix(NA,nrow=length(wd),ncol=ncol(ua))
rownames(mat) = wd
nms = as.Date(rownames(ua))

for(i in 1:length(wd)){
  d = as.Date(wd[i])
  diff = abs(nms - d)
  rd_row_idx = max(which(diff == min(diff)))
  rd_col_idx = which(rd[rd_row_idx,] < d)

  if((rd_row_idx - 1) > 0){
mat[i,] = ua[rd_row_idx - 1,]
  }
  if( length(rd_col_idx)){
mat[i,rd_col_idx] = ua[rd_row_idx,rd_col_idx]
  }
}
colnames(mat)=colnames(ua)
print(Sys.time()-start_t_all)

Regards,

Ben

On Tue, Mar 6, 2012 at 8:22 AM, Rui Barradas  wrote:

> Hello,
>
> > Just looking at this, but it looks like ix doesn't exist:
> >sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
> > fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
> >
> >  Trying to sort it out now.
>
> Right, sorry.
> I've changed the name from 'ix' to 'inxlist' to make it more readable just
> before posting.
> And since the object 'ix' still existed in the R global environment it
> didn't throw an error...
>
> Your correction in the post that followed is what I meant.
>
> Correction (full loop, tested):
>
> for(tkr in 1:ncol(ua)){
>x  <- c(rd1[, tkr], as.Date("-12-31"))
> ix <- lapply(1:nr, function(i)
> which(x[i] <= dt1 & dt1 < x[i + 1]))
> sapply(1:length(ix), function(i)
> if(length(ix[[i]])) fin1[ix[[i]], tkr + 1] <<- ua[i, tkr])
> }
>
> Rui Barradas
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4450186.html
> Sent from the R help mailing list archive at Nabble.com.
>


Re: [R] index instead of loop?

2012-03-06 Thread Ben quant
Unfortunately, your solution does not scale well.  (Tough for you to
test this without my real data.)  If ua is my data and rd1 are my report
dates (same as the code below) and I use more columns, it appears that your
solution slows considerably. Remember I have ~11k columns in my real data,
so scalability is critical.

Here are the processing times using real data:

Use 4 columns:
ua = ua[,1:4]
rd1 = rd1[,1:4]
mine: 2.4 sec's
yours: 1.39 sec's   Note: yours is faster with 4 columns (like the mockup
data I provided.)

Use 150 columns:
ua = ua[,1:150]
rd1 = rd1[,1:150]
mine: 5 sec's
yours: 9 sec's

Use 300 columns:
ua = ua[,1:300]
rd1 = rd1[,1:300]
mine: 9.5 sec's
yours: 1 min


# data
Here is the mockup date and code used: (Anyone looking to test the
scalability may want to add more columns.)

Mockup date:
z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115",   "20080213",
"20080514",  "20080814",  "20081114",  "20090217",
   "20070410","20070709","20071009",   "20080109",
"20080407",  "20080708",  "20081007",  "20090112",
   "20070426","--","--","--","--","--","--","20090319",
   "--","--","--","--","--","--","--","--"),
 nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)


#### My code:

start_t_all = Sys.time()
nms = colnames(ua)

fix = function(x)
{
  year = substring(x, 1, 4);
  mo = substring(x, 5, 6);
  day = substring(x, 7, 8);
  ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
}

rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)   # keep rd1's row/column names

dt1 <- seq(from =as.Date(z.dates[1]), to =
as.Date(z.dates[length(z.dates)]), by =
  "day")
dt = sapply(dt1, as.character)

fin = dt
ck_rows = length(dt)
bad = character(0)

for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac,rdc)
  colnames(ua_rd) = c(tkr,'rt_date')
  xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])
  ind = cumsum(!is.na(xx)) + 1
  y <- values[ind]
  if(ck_rows == length(y)){
fin  = data.frame(fin,y)
  }else{
bad = c(bad,tkr)
  }
}
if(length(bad)){
  nms = nms[bad != nms]
}
colnames(fin) = c('daily_dates',nms)

print("over all time for loop")
print(Sys.time()-start_t_all)


 ### Your code:


z.dates = rownames(ua)

start_t_all = Sys.time()
fdate <- function(x, format="%Y%m%d"){
  DF <- data.frame(x)
  for(i in colnames(DF)){
DF[, i] <- as.Date(DF[, i], format=format)
class(DF[, i]) <- "Date"
  }
  DF
}

rd1 <- fdate(rd1)
# This is yours, use it.
dt1 <- seq(from =as.Date(z.dates[1]), to =
as.Date(z.dates[length(z.dates)]), by ="day")
# Set up the result, no time expensive 'cbind' inside a loop
fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
fin1[, 1] <- dt1
nr <- nrow(rd1)

# And vectorize
for(tkr in 1:ncol(ua)){
  x  <- c(rd1[, tkr], as.Date("-12-31"))
  # inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  ix <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  sapply(1:length(ix), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr +
1] <<- ua[i, tkr])
}
colnames(fin1) <- c("daily_dates", colnames(ua))
print(Sys.time()-start_t_all)


Thanks for your efforts though,

ben

On Tue, Mar 6, 2012 at 7:39 AM, Ben quant  wrote:

> I think this is what you meant:
>
>
> z.dates =
> c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")
>
> nms = c("A","B","C","D")
> # these are the report dates that are the real days the data was available
> rd1 = matrix(c("20070514","20070814","20071115",   "20080213",
> "20080514",  "20080814",  "20081114",  "20090217",
>"20070410","20070709","20071009",   "20080109",
> "20080407",  &quo

Re: [R] index instead of loop?

2012-03-06 Thread Ben quant
I think this is what you meant:

z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115",   "20080213",
"20080514",  "20080814",  "20081114",  "20090217",
   "20070410","20070709","20071009",   "20080109",
"20080407",  "20080708",  "20081007",  "20090112",
   "20070426","--","--","--","--","--","--","20090319",
   "--","--","--","--","--","--","--","--"),
 nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
# rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,

2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
  NaN, NaN,-98.426,190.304,180.894,183.220,172.520, 144.138,
  NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
nrow=8,ncol=4)
dimnames(ua) = list(z.dates,nms)

##


fdate <- function(x, format="%Y%m%d"){
  DF <- data.frame(x)
  for(i in colnames(DF)){
DF[, i] <- as.Date(DF[, i], format=format)
class(DF[, i]) <- "Date"
  }
  DF
}

rd1 <- fdate(rd1)
# This is yours, use it.
dt1 <- seq(from =as.Date(z.dates[1]), to = as.Date("2009-03-25"), by =
  "day")
# Set up the result, no time expensive 'cbind' inside a loop
fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
fin1[, 1] <- dt1
nr <- nrow(rd1)

# And vectorize
for(tkr in 1:ncol(ua)){
  x  <- c(rd1[, tkr], as.Date("-12-31"))
 # inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  ix <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i + 1]))
  sapply(1:length(ix), function(i) if(length(ix[[i]])) fin1[ix[[i]], tkr +
1] <<- ua[i, tkr])
}
colnames(fin1) <- c("daily_dates", colnames(ua))

# Check results
str(fin1)
head(fin1)
tail(fin1)

On Tue, Mar 6, 2012 at 7:34 AM, Ben quant  wrote:

> Just looking at this, but it looks like ix doesn't exist:
>
>sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
> fin1[ix[[i]], tkr
> + 1] <<- ua[i, tkr])
>
>  Trying to sort it out now.
>
> Ben
>
>
> On Mon, Mar 5, 2012 at 7:48 PM, Rui Barradas  wrote:
>
>> Hello,
>>
>> >
>> > Mar 05, 2012; 8:53pm — by Ben quant Ben quant
>> > Hello,
>> >
>> > Does anyone know of a way I can speed this up?
>> >
>>
>> Maybe, let's see.
>>
>> >
>> > # change anything below.
>> >
>>
>> # Yes.
>> # First, start by using dates, not characters
>>
>> fdate <- function(x, format="%Y%m%d"){
>>DF <- data.frame(x)
>>for(i in colnames(DF)){
>>DF[, i] <- as.Date(DF[, i], format=format)
>>class(DF[, i]) <- "Date"
>>}
>>DF
>> }
>>
>> rd1 <- fdate(rd1)
>> # This is yours, use it.
>> dt1 <- seq(from =as.Date(z.dates[1]), to = as.Date("2009-03-25"), by =
>> "day")
>> # Set up the result, no time expensive 'cbind' inside a loop
>> fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
>> fin1[, 1] <- dt1
>> nr <- nrow(rd1)
>>
>> # And vectorize
>> for(tkr in 1:ncol(ua)){
>>x  <- c(rd1[, tkr], as.Date("-12-31"))
>>inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i
>> + 1]))
>>sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
>> fin1[ix[[i]], tkr
>> + 1] <<- ua[i, tkr])
>> }
>> colnames(fin1) <- c("daily_dates", colnames(ua))
>>
>> # Check results
>> str(fin)
>> str(fin1)
>> head(fin)
>> head(fin1)
>> tail(fin)
>> tail(fin1)
>>
>>
>> Note that 'fin' has factors, 'fin1' numerics.
>> I haven't timed it but I believe it should be faster.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4448567.html
>> Sent from the R help mailing list archive at Nabble.com.
>>


Re: [R] index instead of loop?

2012-03-06 Thread Ben quant
Just looking at this, but it looks like ix doesn't exist:
   sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
fin1[ix[[i]], tkr
+ 1] <<- ua[i, tkr])

 Trying to sort it out now.

Ben

On Mon, Mar 5, 2012 at 7:48 PM, Rui Barradas  wrote:

> Hello,
>
> >
> > Mar 05, 2012; 8:53pm — by Ben quant Ben quant
> > Hello,
> >
> > Does anyone know of a way I can speed this up?
> >
>
> Maybe, let's see.
>
> >
> > # change anything below.
> >
>
> # Yes.
> # First, start by using dates, not characters
>
> fdate <- function(x, format="%Y%m%d"){
>DF <- data.frame(x)
>for(i in colnames(DF)){
>DF[, i] <- as.Date(DF[, i], format=format)
>class(DF[, i]) <- "Date"
>}
>DF
> }
>
> rd1 <- fdate(rd1)
> # This is yours, use it.
> dt1 <- seq(from =as.Date(z.dates[1]), to = as.Date("2009-03-25"), by =
> "day")
> # Set up the result, no time expensive 'cbind' inside a loop
> fin1 <- data.frame(matrix(NA, nrow=length(dt1), ncol=ncol(ua) + 1))
> fin1[, 1] <- dt1
> nr <- nrow(rd1)
>
> # And vectorize
> for(tkr in 1:ncol(ua)){
>x  <- c(rd1[, tkr], as.Date("-12-31"))
>inxlist <- lapply(1:nr, function(i) which(x[i] <= dt1 & dt1 < x[i +
> 1]))
>sapply(1:length(inxlist), function(i) if(length(ix[[i]]))
> fin1[ix[[i]], tkr
> + 1] <<- ua[i, tkr])
> }
> colnames(fin1) <- c("daily_dates", colnames(ua))
>
> # Check results
> str(fin)
> str(fin1)
> head(fin)
> head(fin1)
> tail(fin)
> tail(fin1)
>
>
> Note that 'fin' has factors, 'fin1' numerics.
> I haven't timed it but I believe it should be faster.
>
> Hope this helps,
>
> Rui Barradas
>
>
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/index-instead-of-loop-tp4447672p4448567.html
> Sent from the R help mailing list archive at Nabble.com.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] index instead of loop?

2012-03-05 Thread Ben quant
Hello,

Does anyone know of a way I can speed this up? Basically I'm attempting to
get the data item on the same row as the report date for each report date
available. In reality, I have over 11k columns, not just A, B, C, and D, and
I have to do that over 100 times. My solution is slow, but it works. The
loop is slow because of merge.

# create sample data
z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8,ncol=4)
dimnames(ua) = list(z.dates,nms)

# change anything below.

# My first attempt at this
fix = function(x)
{
  year = substring(x, 1, 4);
  mo = substring(x, 5, 6);
  day = substring(x, 7, 8);
  ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
}

rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)

dt1 <- seq(from =as.Date(z.dates[1]), to = as.Date("2009-03-25"), by =
"day")
dt = sapply(dt1, as.character)

fin = dt
ck_rows = length(dt)
bad = character(0)
start_t_all = Sys.time()
for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac,rdc)
  colnames(ua_rd) = c(tkr,'rt_date')
  xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])  # non-NA values, with a leading NA for dates before the first report
  ind = cumsum(!is.na(xx)) + 1     # index of the most recent non-NA value at each date
  y <- values[ind]                 # carry each value forward until the next one appears
  if(ck_rows == length(y)){
fin  = data.frame(fin,y)
  }else{
bad = c(bad,tkr)
  }
}

colnames(fin) = c('daily_dates',nms)

print("over all time for loop")
print(Sys.time()-start_t_all)

print(fin)


Thanks,

Ben

PS - the real/over-all issue is below, but it is probably too involved to
follow.

On Sat, Mar 3, 2012 at 2:30 PM, Ben quant  wrote:

> Hello,
>
> Thank you for your help/advice!
>
> The issue here is speed/efficiency. I can do what I want, but its really
> slow.
>
> The goal is to have the ability to do calculations on my data and have it
> adjusted for look-ahead. I see two ways to do this:
> (I'm open to more ideas. My terminology: Unadjusted = values not adjusted
> for look-ahead bias; adjusted = values adjusted for look-ahead bias.)
>
> 1) I could a) do calculations on unadjusted values then b) adjust the
> resulting values for look-ahead bias. Here is what I mean:
>  a) I could say the following using time series of val1: [(val1 - val1 4
> periods ago) / val1 4 periods ago] = resultval. ("Periods" correspond to
> the z.dates in my example below.)
> b) Then I would adjust the resultval for look-ahead based on val1's
> associated report date.
> Note: I don't think this will be the fastest.
>
> 2) I could do the same calculation [(val1 - val1 4 periods ago) / val1 4
> periods ago] = resultval, but my calculation function would get the 'right'
> values that would have no look-ahead bias. I'm not sure how I would do
> this, but maybe a query starting with the date that I want, indexed to
> appropriate report date indexed to the correct value to return. But how do
> I do this in R? I think I would have to put this in our database and do a
> query. The data comes to me in RData format. I could put it all in our
> database via PostgreSQL, which we already use.
> Note: I think this will be fastest.
>
> Anyway, my first attempt at this was to solve part b of #1 above. Here is
> how my data looks and my first attempt at solving part b of idea #1 above.
> It only takes 0.14 seconds for my mock data, but that is way too slow. The
> major

[R] Matrix Package, sparseMatrix, more NaN's than zeros

2012-03-04 Thread Ben quant
Hello,

I have a lot of data and it has a lot of NaN values. I want to compress the
data so I don't have memory issues later.

Using the Matrix package, sparseMatrix function, and some fiddling around,
I have successfully reduced the 'size' of my data (as measured by
object.size()). However, NaN values are found all over in my data and zeros
are important, but zeros are found very infrequently in my data. So I turn
NaN's into zeros and zeros into very small numbers. I don't like changing
the zeros into small numbers, because that is not the truth. I know this is
a judgement call on my part based on the impact non-zero zeros will have on
my analysis.

My question is: Do I have any other option? Is there a better solution for
this issue?

Here is a small example:

# make sample data
library(Matrix)  # Matrix(), cBind() and the sparse classes come from the Matrix package
M <- Matrix(10 + 1:28, 4, 7)
M2 <- cBind(-1, M)
M2[, c(2,4:6)] <- 0
M2[1:2,2] <- M2[c(3,4),]<- M2[,c(3,4,5)]<- NaN

M3 = M2

# my 'fiddling' to make sparseMatrix save space
M3[M3==0] = 1e-08 # turn zeros into small values
M3[is.nan(M3)] = 0 # turn NaN's into zeros

# saving space
sM <- as(M3, "sparseMatrix")

#Note that this is just a sample of what I am doing. This reduces the
object.size() if you have a lot more data. In this simple example it
actually increases the object.size() because the data is so small.
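
For scale, here is a rough sketch of that size comparison on made-up data
(the numbers are only illustrative, not from my real data):

library(Matrix)
set.seed(1)
big <- matrix(NaN, 5000, 200)                    # mostly NaN, like the real data
big[sample(length(big), 20000)] <- rnorm(20000)  # ~2% observed values
big[sample(length(big), 50)] <- 0                # a few true zeros
big2 <- big
big2[big2 == 0] <- 1e-08                         # keep true zeros distinguishable
big2[is.nan(big2)] <- 0                          # NaN becomes the implicit sparse value
object.size(big)                                 # dense storage
object.size(Matrix(big2, sparse = TRUE))         # far smaller once NaN plays the role of zero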

What I know about Matrix:
http://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] removing data look-ahead, something faster.

2012-03-03 Thread Ben quant
Hello,

Thank you for your help/advice!

The issue here is speed/efficiency. I can do what I want, but its really
slow.

The goal is to have the ability to do calculations on my data and have it
adjusted for look-ahead. I see two ways to do this:
(I'm open to more ideas. My terminology: Unadjusted = values not adjusted
for look-ahead bias; adjusted = values adjusted for look-ahead bias.)

1) I could a) do calculations on unadjusted values then b) adjust the
resulting values for look-ahead bias. Here is what I mean:
 a) I could say the following using time series of val1: [(val1 - val1 4
periods ago) / val1 4 periods ago] = resultval. ("Periods" correspond to
the z.dates in my example below.)
b) Then I would adjust the resultval for look-ahead based on val1's
associated report date.
Note: I don't think this will be the fastest.

2) I could do the same calculation [(val1 - val1 4 periods ago) / val1 4
periods ago] = resultval, but my calculation function would get the 'right'
values that would have no look-ahead bias. I'm not sure how I would do
this, but maybe a query starting with the date that I want, indexed to
appropriate report date indexed to the correct value to return. But how do
I do this in R? I think I would have to put this in our database and do a
query. The data comes to me in RData format. I could put it all in our
database via PostgreSQL, which we already use.
Note: I think this will be fastest.
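
As a concrete illustration of step (a) in approach 1, here is a rough sketch
of mine (using the quarterly matrix 'ua' from the mock data below):

lag_k <- function(v, k = 4) c(rep(NA, k), head(v, -k))   # value k periods ago
ua_lag <- apply(ua, 2, lag_k)
resultval <- (ua - ua_lag) / ua_lag   # still unadjusted; step (b) then shifts it by report date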

Anyway, my first attempt at this was to solve part b of #1 above. Here is
how my data looks and my first attempt at solving part b of idea #1 above.
It only takes 0.14 seconds for my mock data, but that is way too slow. The
major things slowing it down A) the loop, B) the merge statement.

# mock data: this is how it comes to me (raw)
# in practice I have over 10,000 columns

# the starting 'periods' for my data
z.dates =
c("2007-03-31","2007-06-30","2007-09-30","2007-12-31","2008-03-31","2008-06-30","2008-09-30","2008-12-31")

nms = c("A","B","C","D")
# these are the report dates that are the real days the data was available
rd1 = matrix(c("20070514","20070814","20071115","20080213","20080514","20080814","20081114","20090217",
               "20070410","20070709","20071009","20080109","20080407","20080708","20081007","20090112",
               "20070426","--","--","--","--","--","--","20090319",
               "--","--","--","--","--","--","--","--"),
             nrow=8,ncol=4)
dimnames(rd1) = list(z.dates,nms)

# this is the unadjusted raw data, that always has the same dimensions,
rownames, and colnames as the report dates
ua = matrix(c(640.35,636.16,655.91,657.41,682.06,702.90,736.15,667.65,
              2625.050,2625.050,2645.000,2302.000,1972.000,1805.000,1547.000,1025.000,
              NaN,NaN,-98.426,190.304,180.894,183.220,172.520,144.138,
              NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN),
            nrow=8,ncol=4)
dimnames(ua) = list(z.dates,nms)

# change anything below. I can't change anything above this line.

# My first attempt at this was to solve part b of #1 above.
fix = function(x)
{
  year = substring(x, 1, 4);
  mo = substring(x, 5, 6);
  day = substring(x, 7, 8);
  ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
}

rd = apply(rd1, 2, fix)
dimnames(rd) = dimnames(rd1)

dt1 <- seq(from =as.Date(z.dates[1]), to = as.Date("2009-03-25"), by =
"day")
dt = sapply(dt1, as.character)

fin = dt
ck_rows = length(dt)
bad = character(0)
start_t_all = Sys.time()
for(cn in 1:ncol(ua)){
  uac = ua[,cn]
  tkr = colnames(ua)[cn]
  rdc = rd[,cn]
  ua_rd = cbind(uac,rdc)
  colnames(ua_rd) = c(tkr,'rt_date')
  xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)
  xx = as.character(xx1[,2])
  values <- c(NA, xx[!is.na(xx)])
  ind = cumsum(!is.na(xx)) + 1
  y <- values[ind]
  if(ck_rows == length(y)){
fin  = data.frame(fin,y)
  }else{
bad = c(bad,tkr)
  }
}

colnames(fin) = c('daily_dates',nms)

# after this I would slice and dice the data into weekly, monthly, etc.
periodicity as needed, but this leaves it in daily format which is as
granular as I will get.

print("over all time for loop")
print(Sys.time()-start_t_all)

Regards,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread Ben quant
I'll have to give this a try this weekend. Thank you!

ben

On Fri, Mar 2, 2012 at 12:07 PM, jim holtman  wrote:

> One way to speed up the merge is not to use merge.  You can use 'match' to
> find matching indices and then manually.
>
> Does this do what you want:
>
> > ua <- read.table(text = '  AName  rt_date
> + 2007-03-31 "14066.580078125" "2007-04-01"
> + 2007-06-30 "14717"   "2007-04-03"
> + 2007-09-30 "15528"   "2007-10-25"
> + 2007-12-31 "17609"   "2008-04-06"
> + 2008-03-31 "17168"   "2008-04-24"
> + 2008-06-30 "17681"   "2008-04-09"', header = TRUE, as.is = TRUE)
> >
> > dt <- c( "2007-03-31" ,"2007-04-01" ,"2007-04-02", "2007-04-03"
> ,"2007-04-04",
> + "2007-04-05" ,"2007-04-06" ,"2007-04-07",
> + "2007-04-08", "2007-04-09")
> >
> > # find matching values in ua
> > indx <- match(dt, ua$rt_date)
> >
> > # create new result matrix
> > xx1 <- cbind(dt, ua[indx,])
> > rownames(xx1) <- NULL  # delete funny names
> > xx1
>            dt    AName    rt_date
> 1  2007-03-31   NA   
> 2  2007-04-01 14066.58 2007-04-01
> 3  2007-04-02   NA   
> 4  2007-04-03 14717.00 2007-04-03
> 5  2007-04-04   NA   
> 6  2007-04-05   NA   
> 7  2007-04-06   NA   
> 8  2007-04-07   NA   
> 9  2007-04-08   NA   
> 10 2007-04-09   NA   
> >
>
>
> On Fri, Mar 2, 2012 at 5:24 AM, Ben quant  wrote:
>
>> Hello,
>>
>> I have a nasty loop that I have to do 11877 times. The only thing that
>> slows it down really is this merge:
>>
>> xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)
>>
>> Any ideas on how to speed it up? The output can't change materially (it
>> works), but I'd like it to go faster. I'm looking at getting around the
>> loop (not shown), but I'm trying to speed up the merge first. I'll post
>> regarding the loop if nothing comes of this post.
>>
>> Here is some information on what type of stuff is going into the merge:
>>
>> > class(ua_rd)
>> [1] "matrix"
>> > dim(ua_rd)
>> [1] 20  2
>> > head(ua_rd)
>>   AName  rt_date
>> 2007-03-31 "14066.580078125" "2007-04-26"
>> 2007-06-30 "14717"   "2007-07-19"
>> 2007-09-30 "15528"   "2007-10-25"
>> 2007-12-31 "17609"   "2008-01-24"
>> 2008-03-31 "17168"   "2008-04-24"
>> 2008-06-30 "17681"   "2008-07-17"
>> > class(dt)
>> [1] "character"
>> > length(dt)
>> [1] 1799
>> > dt[1:10]
>>  [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04"
>> "2007-04-05" "2007-04-06" "2007-04-07"
>>  [9] "2007-04-08" "2007-04-09"
>>
>> thanks,
>>
>> Ben
>>
>>[[alternative HTML version deleted]]
>>
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speed up merge

2012-03-02 Thread Ben quant
I'm not sure. I'm still looking into it. Its pretty involved, so I asked
the simplest answer first (the merge question).

I'll reply back with a mock-up/sample that is testable under a more
appropriate subject line. Probably this weekend.

Regards,

Ben


On Fri, Mar 2, 2012 at 4:37 AM, Hans Ekbrand  wrote:

> On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
> > Hello,
> >
> > I have a nasty loop that I have to do 11877 times.
>
> Are you completely sure about that? I often find my self avoiding
> loops-by-row by constructing vectors of which rows that fullfil a
> condition, and then creating new vectors out of that vector. If you
> elaborate on the problem, perhaps we could find a way to avoid the
> loops altogether?
>
> Mostly as a note to self, I wrote
> http://code.cjb.net/vectors-instead-of-loop.html, it might be
> understood by others too, but I'm not sure.
>
> --
> Hans Ekbrand (http://sociologi.cjb.net) 
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] speed up merge

2012-03-02 Thread Ben quant
Hello,

I have a nasty loop that I have to do 11877 times. The only thing that
slows it down really is this merge:

xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

Any ideas on how to speed it up? The output can't change materially (it
works), but I'd like it to go faster. I'm looking at getting around the
loop (not shown), but I'm trying to speed up the merge first. I'll post
regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

> class(ua_rd)
[1] "matrix"
> dim(ua_rd)
[1] 20  2
> head(ua_rd)
   AName  rt_date
2007-03-31 "14066.580078125" "2007-04-26"
2007-06-30 "14717"   "2007-07-19"
2007-09-30 "15528"   "2007-10-25"
2007-12-31 "17609"   "2008-01-24"
2008-03-31 "17168"   "2008-04-24"
2008-06-30 "17681"   "2008-07-17"
> class(dt)
[1] "character"
> length(dt)
[1] 1799
> dt[1:10]
 [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04"
"2007-04-05" "2007-04-06" "2007-04-07"
 [9] "2007-04-08" "2007-04-09"

thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] data frame of strings formatted

2012-03-01 Thread Ben quant
Thanks a ton! That is great.

ben

On Thu, Mar 1, 2012 at 9:29 PM, Peter Langfelder  wrote:

> On Thu, Mar 1, 2012 at 8:05 PM, Ben quant  wrote:
> > Hello,
> >
> > I have another question
> >
> > I have a data frame that looks like this:
> > a  b
> > 2007-03-31 "20070514" "20070410"
> > 2007-06-30 "20070814" "20070709"
> > 2007-09-30 "20071115" "20071009"
> > 2007-12-31 "20080213" "20080109"
> > 2008-03-31 "20080514" "20080407"
> > 2008-06-30 "20080814" "--"
> > 2008-09-30 "20081114" "20081007"
> > 2008-12-31 "20090217" "20090112"
> > 2009-03-31 "--" "20090407"
> > 2009-06-30 "20090817" "20090708"
> > 2009-09-30 "20091113" "--"
> > 2009-12-31 "20100212" "20100111"
> > 2010-03-31 "20100517" "20100412"
> > 2010-06-30 "20100816" "20100712"
> > 2010-09-30 "20101112" "20101007"
> > 2010-12-31 "20110214" "20110110"
> > 2011-03-31 "20110513" "20110411"
> > 2011-06-30 "20110815" "20110711"
> > 2011-09-30 "2015" "20111011"
> >
> > (actually it has about 10,000 columns)
> >
> > I'd like all of the strings to be formatted like 2011-11-15, 2011-10-11,
> > etc. as a data frame of the same dimensions and all of the dimnames
> > intact. They don't have to be of date format. "--" can be NA or left the
> > same. It does have to be fast though...
>
> There may be a ready-made function for this, but if not, substring and
> paste are your friends. Look them up.
>
> Here's how I would do it:
>
> fix = function(x)
> {
>  year = substring(x, 1, 4);
>  mo = substring(x, 5, 6);
>  day = substring(x, 7, 8);
>  ifelse(year=="--", "NA", paste(year, mo, day, sep = "-"))
> }
>
> fixed = apply(YourDataFrame, 2, fix)
> dimnames(fixed) = dimnames(YourDataFrame)
>
> Since you don't provide an example I can't test it exhaustively but it
> seems to work for me.
>
> Peter
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] data frame of strings formatted

2012-03-01 Thread Ben quant
Hello,

I have another question

I have a data frame that looks like this:
 a  b
2007-03-31 "20070514" "20070410"
2007-06-30 "20070814" "20070709"
2007-09-30 "20071115" "20071009"
2007-12-31 "20080213" "20080109"
2008-03-31 "20080514" "20080407"
2008-06-30 "20080814" "--"
2008-09-30 "20081114" "20081007"
2008-12-31 "20090217" "20090112"
2009-03-31 "--" "20090407"
2009-06-30 "20090817" "20090708"
2009-09-30 "20091113" "--"
2009-12-31 "20100212" "20100111"
2010-03-31 "20100517" "20100412"
2010-06-30 "20100816" "20100712"
2010-09-30 "20101112" "20101007"
2010-12-31 "20110214" "20110110"
2011-03-31 "20110513" "20110411"
2011-06-30 "20110815" "20110711"
2011-09-30 "2015" "20111011"

(actually it has about 10,000 columns)

I'd like all of the strings to be formatted like 2011-11-15, 2011-10-11,
etc. as a data frame of the same dimensions and all of the dimnames
intact. They don't have to be of date format. "--" can be NA or left the
same. It does have to be fast though...

Thanks!

ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fill data forward in data frame.

2012-03-01 Thread Ben quant
That is great! Thank you very much.

Ben

On Thu, Mar 1, 2012 at 2:57 PM, Petr Savicky  wrote:

> On Thu, Mar 01, 2012 at 02:31:01PM -0700, Ben quant wrote:
> > Hello,
> >
> > My direct desire is a good (fast) way to fill values forward until there
> is
> > another value then fill that value forward in the data xx (at the bottom
> of
> > this email).  For example, from row 1 to row 45 should be NA (no change),
> > but from row 46 to row 136 the value should be 12649, and from row 137 to
> the
> > next value should be 13039.00.  The last line of code is all you need for
> > this part.
> >
> > If you are so inclined, my goal is this: I want to create a weekly time
> > series out of some data based on the report date. The report date is 'rd'
> > below, and is the correct date for the time series. My idea (in part seen
> > below) is to align rd and ua via the incorrect date (the time series
> date),
> > then merge that using the report date (rd) and a daily series (so I
> capture
> > all of the dates) of dates (dt). That gets the data in the right start
> > period. I've done all of this so far below and it looks fine. Then I plan
> > to roll all of those values forward to the next value (see question
> above),
> > then I'll do something like this:
> >
> > xx[weekdays(xx[,1]) == "Friday",]
> >
> > ...to get a weekly series of Friday values. I'm thinking someone probably
> > has a faster way of doing this. I have to do this many times, so speed is
> > important. Thanks!
> >
> > Here is what I have done so far:
> >
> > dt <- seq(from =as.Date("2009-06-01"), to = Sys.Date(), by = "day")
> >
> > > nms
> > [1] "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30"
> > "2010-09-30" "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30"
> > [11] "2011-12-31"
> >
> > > rd
> > 2009-06-30   2009-09-30   2009-12-31   2010-03-31   2010-06-30
> > 2010-09-30   2010-12-31   2011-03-31   2011-06-30   2011-09-30
> > "2009-07-16" "2009-10-15" "2010-01-19" "2010-04-19" "2010-07-19"
> > "2010-10-18" "2011-01-18" "2011-04-19" "2011-07-18" "2011-10-17"
> > 2011-12-31
> > "2012-01-19"
> >
> > > ua
> > 2009-06-30 2009-09-30 2009-12-31 2010-03-31 2010-06-30 2010-09-30
> > 2010-12-31 2011-03-31 2011-06-30 2011-09-30 2011-12-31
> > 12649.00   13039.00   13425.00   13731.00   14014.00   14389.00
> > 14833.00   15095.00   15481.43   15846.43   16186.43
> >
> > > x = merge(ua,rd,by='row.names')
> > > names(x) = c('z.date','val','rt_date')
> > > xx = merge(dt,x,by.y= 'rt_date',by.x=1,all.x=T)
> > > xx
> > x  z.date   val
> > 1   2009-06-01   NA
> > 2   2009-06-02   NA
> > 3   2009-06-03   NA
> > 4   2009-06-04   NA
> > 5   2009-06-05   NA
> >
> > ...ect
> >
> > 36  2009-07-06   NA
> > 37  2009-07-07   NA
> > 38  2009-07-08   NA
> > 39  2009-07-09   NA
> > 40  2009-07-10   NA
> > 41  2009-07-11   NA
> > 42  2009-07-12   NA
> > 43  2009-07-13   NA
> > 44  2009-07-14   NA
> > 45  2009-07-15   NA
> > 46  2009-07-16 2009-06-30 12649
> > 47  2009-07-17   NA
> > 48  2009-07-18   NA
> > 49  2009-07-19   NA
> > 50  2009-07-20   NA
> > 51  2009-07-21   NA
> > 52  2009-07-22   NA
> > 53  2009-07-23   NA
> > 54  2009-07-24   NA
> > 55  2009-07-25   NA
> > 56  2009-07-26   NA
> > 57  2009-07-27   NA
> > 58  2009-07-28   NA
> >
> > ...ect
> >
> > 129  2009-10-07  NA
> > 130  2009-10-08  NA
> > 131  2009-10-09  NA
> > 132  2009-10-10  NA
> > 133  2009-10-11  NA
> > 134  2009-10-12  NA
> > 135  2009-10-13  NA
> > 136  2009-10-14  NA
> > 137  2009-10-15 2009-09-30 13039.00
> > 138  2009-10-16  NA
> > 139  2009-10-17  NA
> > 140  2009-10-18  NA
> > 141  2009-10-19  NA
> > 142  2009-10-20  NA
> >

[R] fill data forward in data frame.

2012-03-01 Thread Ben quant
Hello,

My direct desire is a good (fast) way to fill values forward until there is
another value then fill that value forward in the data xx (at the bottom of
this email).  For example, from row 1 to row 45 should be NA (no change),
but from row 46 to row 136 the value should be 12649, and from row 137 to the
next value should be 13039.00.  The last line of code is all you need for
this part.
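
(A rough sketch of one common way to do that fill, not taken from any reply,
assuming 'xx' as built below with 'val' as the column to carry forward;
zoo::na.locf does the same job if the zoo package is available.)

locf <- function(v) {
  filled <- c(NA, v[!is.na(v)])   # non-NA values, with a leading NA for the initial gap
  filled[cumsum(!is.na(v)) + 1]   # index of the most recent non-NA value at each row
}
xx$val <- locf(xx$val)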

If you are so inclined, my goal is this: I want to create a weekly time
series out of some data based on the report date. The report date is 'rd'
below, and is the correct date for the time series. My idea (in part seen
below) is to align rd and ua via the incorrect date (the time series date),
then merge that using the report date (rd) and a daily series (so I capture
all of the dates) of dates (dt). That gets the data in the right start
period. I've done all of this so far below and it looks fine. Then I plan
to roll all of those values forward to the next value (see question above),
then I'll do something like this:

xx[weekdays(xx[,1]) == "Friday",]

...to get a weekly series of Friday values. I'm thinking someone probably
has a faster way of doing this. I have to do this many times, so speed is
important. Thanks!

Here is what I have done so far:

dt <- seq(from =as.Date("2009-06-01"), to = Sys.Date(), by = "day")

> nms
[1] "2009-06-30" "2009-09-30" "2009-12-31" "2010-03-31" "2010-06-30"
"2010-09-30" "2010-12-31" "2011-03-31" "2011-06-30" "2011-09-30"
[11] "2011-12-31"

> rd
2009-06-30   2009-09-30   2009-12-31   2010-03-31   2010-06-30
2010-09-30   2010-12-31   2011-03-31   2011-06-30   2011-09-30
"2009-07-16" "2009-10-15" "2010-01-19" "2010-04-19" "2010-07-19"
"2010-10-18" "2011-01-18" "2011-04-19" "2011-07-18" "2011-10-17"
2011-12-31
"2012-01-19"

> ua
2009-06-30 2009-09-30 2009-12-31 2010-03-31 2010-06-30 2010-09-30
2010-12-31 2011-03-31 2011-06-30 2011-09-30 2011-12-31
12649.00   13039.00   13425.00   13731.00   14014.00   14389.00
14833.00   15095.00   15481.43   15846.43   16186.43

> x = merge(ua,rd,by='row.names')
> names(x) = c('z.date','val','rt_date')
> xx = merge(dt,x,by.y= 'rt_date',by.x=1,all.x=T)
> xx
x  z.date   val
1   2009-06-01   NA
2   2009-06-02   NA
3   2009-06-03   NA
4   2009-06-04   NA
5   2009-06-05   NA

...ect

36  2009-07-06   NA
37  2009-07-07   NA
38  2009-07-08   NA
39  2009-07-09   NA
40  2009-07-10   NA
41  2009-07-11   NA
42  2009-07-12   NA
43  2009-07-13   NA
44  2009-07-14   NA
45  2009-07-15   NA
46  2009-07-16 2009-06-30 12649
47  2009-07-17   NA
48  2009-07-18   NA
49  2009-07-19   NA
50  2009-07-20   NA
51  2009-07-21   NA
52  2009-07-22   NA
53  2009-07-23   NA
54  2009-07-24   NA
55  2009-07-25   NA
56  2009-07-26   NA
57  2009-07-27   NA
58  2009-07-28   NA

...ect

129  2009-10-07  NA
130  2009-10-08  NA
131  2009-10-09  NA
132  2009-10-10  NA
133  2009-10-11  NA
134  2009-10-12  NA
135  2009-10-13  NA
136  2009-10-14  NA
137  2009-10-15 2009-09-30 13039.00
138  2009-10-16  NA
139  2009-10-17  NA
140  2009-10-18  NA
141  2009-10-19  NA
142  2009-10-20  NA
143  2009-10-21  NA

...ect

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fridays date to date

2012-03-01 Thread Ben quant
Great thanks!

ben

On Thu, Mar 1, 2012 at 1:30 PM, Marc Schwartz  wrote:

> On Mar 1, 2012, at 2:02 PM, Ben quant wrote:
>
> > Hello,
> >
> > How do I get the dates of all Fridays between two dates?
> >
> > thanks,
> >
> > Ben
>
>
> Days <- seq(from = as.Date("2012-03-01"),
>to = as.Date("2012-07-31"),
>by = "day")
>
> > str(Days)
>  Date[1:153], format: "2012-03-01" "2012-03-02" "2012-03-03" "2012-03-04"
> ...
>
> # See ?weekdays
>
> > Days[weekdays(Days) == "Friday"]
>  [1] "2012-03-02" "2012-03-09" "2012-03-16" "2012-03-23" "2012-03-30"
>  [6] "2012-04-06" "2012-04-13" "2012-04-20" "2012-04-27" "2012-05-04"
> [11] "2012-05-11" "2012-05-18" "2012-05-25" "2012-06-01" "2012-06-08"
> [16] "2012-06-15" "2012-06-22" "2012-06-29" "2012-07-06" "2012-07-13"
> [21] "2012-07-20" "2012-07-27"
>
> HTH,
>
> Marc Schwartz
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] fridays date to date

2012-03-01 Thread Ben quant
Hello,

How do I get the dates of all Fridays between two dates?

thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] group calculations with other columns for the ride

2012-02-28 Thread Ben quant
Excellent! I wonder why I haven't seen aggregate before.
Thanks!

ben

On Tue, Feb 28, 2012 at 4:51 PM, ilai  wrote:

>  aggregate(val~lvls+nm,data=x,FUN='median')
>
>
>
> On Tue, Feb 28, 2012 at 4:43 PM, Ben quant  wrote:
> > Hello,
> >
> > I can get the median for each factor, but I'd like another column to go
> > with each factor. The nm column is a long name for the lvls column. So
> > unique() works, except that the order can get messed up.
> >
> > Example:
> > x =
> >
> data.frame(val=1:10,lvls=c('cat2',rep("cat1",4),rep("cat2",4),'cat1'),nm=c('longname2',rep("longname1",4),rep("longname2",4),'longname1'))
> >  x
> >    val lvls        nm
> > 1    1 cat2 longname2
> > 2    2 cat1 longname1
> > 3    3 cat1 longname1
> > 4    4 cat1 longname1
> > 5    5 cat1 longname1
> > 6    6 cat2 longname2
> > 7    7 cat2 longname2
> > 8    8 cat2 longname2
> > 9    9 cat2 longname2
> > 10  10 cat1 longname1
> >
> > unique doesn't work in data.frame:
> >  mdn = do.call(rbind,lapply(split(x[,1], x[,2]), median))
> >  data.frame(mdn,ln=as.character(unique(x[,3])))
> >      mdn        ln
> > cat1   4 longname2
> > cat2   7 longname1
> >
> > I want:
> >      mdn        ln
> > cat1   4 longname1
> > cat2   7 longname2
> >
> > Thank you very much!
> >
> > PS - looking for simple'ish solutions. I know I can do it with loops and
> > merges, but is there an option I am not using here?
> >
> > Ben
> >
> >[[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] group calculations with other columns for the ride

2012-02-28 Thread Ben quant
Hello,

I can get the median for each factor, but I'd like another column to go
with each factor. The nm column is a long name for the lvls column. So
unique() works, except that the order can get messed up.

Example:
x =
data.frame(val=1:10,lvls=c('cat2',rep("cat1",4),rep("cat2",4),'cat1'),nm=c('longname2',rep("longname1",4),rep("longname2",4),'longname1'))
 x
   val lvls        nm
1    1 cat2 longname2
2    2 cat1 longname1
3    3 cat1 longname1
4    4 cat1 longname1
5    5 cat1 longname1
6    6 cat2 longname2
7    7 cat2 longname2
8    8 cat2 longname2
9    9 cat2 longname2
10  10 cat1 longname1

unique doesn't work in data.frame:
 mdn = do.call(rbind,lapply(split(x[,1], x[,2]), median))
 data.frame(mdn,ln=as.character(unique(x[,3])))
     mdn        ln
cat1   4 longname2
cat2   7 longname1

I want:
     mdn        ln
cat1   4 longname1
cat2   7 longname2

Thank you very much!

PS - looking for simple'ish solutions. I know I can do it with loops and
merges, but is there an option I am not using here?

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rank with uniform count for each rank

2012-02-22 Thread Ben quant
Thank you everyone! We already use the Hmisc package so I'll likely use
cut2.

Ben

On Wed, Feb 22, 2012 at 2:22 PM, David Winsemius wrote:

>
> On Feb 22, 2012, at 4:01 PM, Ben quant wrote:
>
>  Hello,
>>
>> What is the best way to get ranks for a vector of values, limit the range
>> of rank values and create equal count in each group? I call this uniform
>> ranking...uniform count/number in each group.
>>
>> Here is an example using three groups:
>>
>> Say I have values:
>> x = c(3, 2, -3, 1, 0, 5, 10, 30, -1, 4)
>> names(x) = letters[1:10]
>>
>>> x
>>>
>> a  b  c  d  e  f   g   h   i   j
>> 3  2 -3  1  0  5 10 30 -1  4
>> I would like:
>> a  b  c  d  e  f  g  h  i  j
>> 2  2  1  2  1  3 3  3  1 3
>>
>> Same thing as above, maybe easier to see:
>> c   i  e  d  b  a   j   f  g   h
>> -3 -1  0  1  2  3  4  5 10 30
>> I would get:
>> c  i e d b a  j f  g h
>> 1 1 1 2 2 2 3 3 3 3
>>
>> Note that there are 4 values with a rank of 3 because I can't get even
>> numbers (10/3 = 3.333).
>>
>> Been to ?sort, ?order, ?quantile, ?cut, and ?split.
>>
>
> You may need to look more carefully at the definitions and adjustments to
> `cut` and `quantile` but this does roughly what you asked:
>
> n=3
> as.numeric( cut(x, breaks=quantile(x, prob=(0:n)/n) , include.lowest=TRUE)
> )
> [1] 1 1 1 1 2 2 2 3 3 3
>
> It a fairly common task and Harrell's cut2 function has a g= parameter
> (for number of groups)  that I generally use:
>
> library(Hmisc)
> >  cut2(x, g=3)
>  [1] [-3, 2) [-3, 2) [-3, 2) [-3, 2) [ 2, 5) [ 2, 5) [ 2, 5) [ 5,30] [ 5,30] [ 5,30]
> Levels: [-3, 2) [ 2, 5) [ 5,30]
> >  as.numeric( cut2(x, g=3))
>  [1] 1 1 1 1 2 2 2 3 3 3
>
>
>
>
>> Thanks,
>>
>> Ben
>>
>>[[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rank with uniform count for each rank

2012-02-22 Thread Ben quant
Hello,

What is the best way to get ranks for a vector of values, limit the range
of rank values and create equal count in each group? I call this uniform
ranking...uniform count/number in each group.

Here is an example using three groups:

Say I have values:
x = c(3, 2, -3, 1, 0, 5, 10, 30, -1, 4)
names(x) = letters[1:10]
> x
a  b  c  d  e  f   g   h   i   j
3  2 -3  1  0  5 10 30 -1  4
I would like:
a  b  c  d  e  f  g  h  i  j
2  2  1  2  1  3 3  3  1 3

Same thing as above, maybe easier to see:
 c   i  e  d  b  a   j   f  g   h
-3 -1  0  1  2  3  4  5 10 30
I would get:
c  i e d b a  j f  g h
1 1 1 2 2 2 3 3 3 3

Note that there are 4 values with a rank of 3 because I can't get even
numbers (10/3 = 3.333).

Been to ?sort, ?order, ?quantile, ?cut, and ?split.
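
For what it's worth, here is a rough base-R sketch of my own that reproduces
the grouping above (the remainder goes to the top group, as in the example):

n <- 3
grp <- ceiling(rank(x, ties.method = "first") / (length(x) / n))
setNames(grp, names(x))
# a b c d e f g h i j
# 2 2 1 2 1 3 3 3 1 3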

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proto: make a parameter persist

2012-02-21 Thread Ben quant
Thank you very much! I'll follow-up with more questions as I dabble...if I
have any.

Thank you

ben


On Tue, Feb 21, 2012 at 7:01 AM, Gabor Grothendieck  wrote:

> On Tue, Feb 21, 2012 at 12:15 AM, Ben quant  wrote:
> > Thanks again for your so far on proto. I have another question.
> >
> > What is the best way to "do stuff" based on data prior to calling a
> > function? I tried the code below without expr (and including commas after
> > data member assignments), but it errors out. I'd like to make decisions
> > based on inputs during proto object construction and prep my data_members
> > for use in the functions that (would) follow. I think I could probably
> get
> > around this with a small function with other functions within the same
> proto
> > object, but I'd rather not repeat that in each function...if that makes
> > sense. See my last line of code below:
> >
> > makeProto = proto( expr={
> >   data_member1=NULL
> >   data_member2=5
> >   data_member3=NULL
> >   if(!is.null(data_member1)){
> > with(.,data_member3 = data_member1 + data_member2)
> >   }
> > })
> > oo = makeProto$proto()
> > oo$data_member1 # NULL
> > oo$data_member2 # 5
> > oo$data_member3 # NULL
> > oo2 = makeProto$proto(data_member1 = 7)
> > oo2$data_member1 # 7
> > oo2$data_member2 # 5
> > oo2$data_member3 # I want this to be 12 (12 = 7 + 5), but I get NULL
> >
> > Its late for me so hopefully this makes sense...
> >
>
> There are multiple issues here:
>
> 1. The expr is executed at the time you define the proto object -- its
> not a method.  Once the proto object is defined the only thing that is
> left is the result of the computation so you can't spawn a child and
> then figure that this code will be rerun as if its a constructor.  You
> need to define a constructor method to do that.
>
> 2. You can't use dot as if it were a special notation -- its not.  A
> single dot is just the name of an ordinary variable and is not
> anything special that proto knows about.  In the examples where dot is
> used its used as the first formal argument to various methods but this
> was the choice of the method writer and not something required by
> proto.  We could have used self or this or any variable name.
>
> 3. Note that the code in expr=... is already evaluated in the
> environment of the proto object so you don't need with.
>
> 4. I personally find it clearer to reserve = for argument assignment
> and use <- for ordinary assignment but that is mostly a style issue
> and its up to you:
>
> 5. The discussion of traits in the proto vignette illustrates
> constructors -- be sure to read that.  Traits are not a special
> construct built into proto but rather its just a way in which you can
> use proto.   That is one of the advantages of the prototype model of
> OO -- you don't need to have special language constructs for many
> situations where ordinary OO needs such constructs since they are all
> subsumed under one more general set of primitives.
>
> Here we define the trait MakeProto (again, traits are not a special
> language feature of proto but are just a way of using it):
>
> MakeProto <- proto(
>   new = function(., ...) {
>  .$proto(expr = if ( !is.null(d1) ) d3 <- d1 + d2, ...)
>   },
>   d1 = NULL,
>   d2 = 5,
>   d3 = NULL
> )
>
> oo <- MakeProto$new()
> oo$d1 # NULL
> oo$d2 # 5
> oo$d3 # NULL
>
> oo2 <- MakeProto$new(d1 = 7)
> oo2$d1 # 7
> oo2$d2 # 5
> oo2$d3 # 12
>
> In the above oo$d1, oo$d2, oo$d3 are actually located in MakeProto and
> delegated to oo so that when one writes oo$d2 it looks into MakeProto
> since it cannot find d2 in oo.  oo2$d2 is also not in oo2 but
> delegated from MakeProto; however, oo2$d1 and oo2$d3 are located in
> oo2 itself.  That is due to the way we set it up and we could have set
> it up differently.  Try str(MakeProto); str(oo); str(oo2) to  see
> this.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proto: make a parameter persist

2012-02-20 Thread Ben quant
Thanks again for your so far on proto. I have another question.

What is the best way to "do stuff" based on data prior to calling a
function? I tried the code below without expr (and including commas after
data member assignments), but it errors out. I'd like to make decisions
based on inputs during proto object construction and prep my data_members
for use in the functions that (would) follow. I think I could probably get
around this with a small function with other functions within the same
proto object, but I'd rather not repeat that in each function...if that
makes sense. See my last line of code below:

makeProto = proto( expr={
  data_member1=NULL
  data_member2=5
  data_member3=NULL
  if(!is.null(data_member1)){
with(.,data_member3 = data_member1 + data_member2)
  }
})
oo = makeProto$proto()
oo$data_member1 # NULL
oo$data_member2 # 5
oo$data_member3 # NULL
oo2 = makeProto$proto(data_member1 = 7)
oo2$data_member1 # 7
oo2$data_member2 # 5
oo2$data_member3 # I want this to be 12 (12 = 7 + 5), but I get NULL

Its late for me so hopefully this makes sense...

Thanks!

ben

On Fri, Feb 17, 2012 at 11:38 PM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Sat, Feb 18, 2012 at 12:44 AM, Ben quant  wrote:
> > The code below works as expected but:
> > Using the proto package, is this the best way to 1) make a parameter
> > persist if the parameter is passed
> > in with a value, 2) allow for calling the bias() function without a
> > parameter assignment, 3) have
> > the x2 value initialize as 5? Thanks for your feedback. Giving the
> > proto package a test beat and
> > establishing some templates for myself.
> >
> >> oo <- proto(expr = {
> >>     x = c(10, 20, 15, 19, 17)
> >>     x2 = 5 # so x2 initializes as 5, but can be overwritten with param assignment
> >>     bias <- function(., x2 = .$x2) { # x2=.$x2 so no default param is needed
> >>         .$x2 = x2 # so x2 persists in the env
> >>         .$x <- .$x + x2
> >>     }
> >> })
> >> o = oo$proto()
> >> o$x  # [1] 10 20 15 19 17
> >> o$x2 # [1] 5
> >> o$bias(x2 = 100)
> >> o$x2 # [1] 100
> >> o$x  # [1] 110 120 115 119 117
> >
>
> This is not very different from what you have already but here it is
> for comparison.  Note that the with(...) line has the same meaning as
> .$x <- .$x + .$x2 :
>
> oo <- proto(
>   x = c(10, 20, 15, 19, 17),
>   x2 = 5,
>   bias = function(., x2) {
>  if (!missing(x2)) .$x2 <- x2
>  with(., x <- x + x2)
>   }
> )
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] proto: make a parameter persist

2012-02-20 Thread Ben quant
I like it better. Thanks!

Ben

On Fri, Feb 17, 2012 at 11:38 PM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Sat, Feb 18, 2012 at 12:44 AM, Ben quant  wrote:
> > The code below works as expected but:
> > Using the proto package, is this the best way to 1) make a parameter
> > persist if the parameter is passed
> > in with a value, 2) allow for calling the bias() function without a
> > parameter assignment, 3) have
> > the x2 value initialize as 5? Thanks for your feedback. Giving the
> > proto package a test beat and
> > establishing some templates for myself.
> >
> >> oo <- proto(expr = {
> >>     x = c(10, 20, 15, 19, 17)
> >>     x2 = 5 # so x2 initializes as 5, but can be overwritten with param assignment
> >>     bias <- function(., x2 = .$x2) { # x2=.$x2 so no default param is needed
> >>         .$x2 = x2 # so x2 persists in the env
> >>         .$x <- .$x + x2
> >>     }
> >> })
> >> o = oo$proto()
> >> o$x  # [1] 10 20 15 19 17
> >> o$x2 # [1] 5
> >> o$bias(x2 = 100)
> >> o$x2 # [1] 100
> >> o$x  # [1] 110 120 115 119 117
> >
>
> This is not very different from what you have already but here it is
> for comparison.  Note that the with(...) line has the same meaning as
> .$x <- .$x + .$x2 :
>
> oo <- proto(
>   x = c(10, 20, 15, 19, 17),
>   x2 = 5,
>   bias = function(., x2) {
>  if (!missing(x2)) .$x2 <- x2
>  with(., x <- x + x2)
>   }
> )
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] proto: make a parameter persist

2012-02-17 Thread Ben quant
The code below works as expected but:
Using the proto package, is this the best way to 1) make a parameter
persist if the parameter is passed
in with a value, 2) allow for calling the bias() function without a
parameter assignment, 3) have
the x2 value initialize as 5? Thanks for your feedback. Giving the
proto package a test beat and
establishing some templates for myself.

> oo <- proto(expr = {
>     x = c(10, 20, 15, 19, 17)
>     x2 = 5 # so x2 initializes as 5, but can be overwritten with param assignment
>     bias <- function(., x2 = .$x2) { # x2=.$x2 so no default param is needed
>         .$x2 = x2 # so x2 persists in the env
>         .$x <- .$x + x2
>     }
> })
> o = oo$proto()
> o$x  # [1] 10 20 15 19 17
> o$x2 # [1] 5
> o$bias(x2 = 100)
> o$x2 # [1] 100
> o$x  # [1] 110 120 115 119 117

Regards,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sequencing environments

2012-02-17 Thread Ben quant
Thanks Gabor/Duncan,

I might give that proto package a try. The R.oo package is more intuitive
for someone coming from a traditional OO background, but compared to proto,
it looks like it requires a lot more typing to create the same amount of
functionality. I've used R.oo for a number of months now and it works
great.  The other option is to just use get() and assign(), like I
suggested in my original post, which seems to be the simplest, but more
typing than proto.

Thanks for the info! Have a good weekend...

ben

On Wed, Feb 15, 2012 at 11:09 PM, Gabor Grothendieck <
ggrothendi...@gmail.com> wrote:

> On Wed, Feb 15, 2012 at 11:58 PM, Ben quant  wrote:
> > Thank you Duncan. Interesting. I find it strange that you can't get a
> list
> > of the environments. But I'll deal with it...
> >
> > Anyway, I'm about to start a new R dev project for my company. I'm
> thinking
> > about architecture, organization, and gotchas. I went through much of the
> > documentation you sent me. Thanks!. I came up with what I think is the
> best
> > way to implement environments (which I am using like I would use a class
> in
> > a traditional OO language) that can be reused in various programs.
> >
> > I'm thinking of creating different scripts like this:
> > #this is saved as script name EnvTest.R
> > myEnvir = new.env()
> > var1 = 2 + 2
> > assign("myx",var1,envir=myEnvir)
> >
> > Then I will write programs like this that will use the environments and
> the
> > objects/functions they contain:
> > source("EnvTest.R")
> > prgmVar1 = get("myx",pos=myEnvir)
> > ## do stuff with env objects
> > print(prgmVar1)
> >
> > Do you think this is the best way to use environments to avoid naming
> > conflicts, take advantage of separation of data, organize scripting
> > logically, etc. (the benefits of traditional OO classes)? Eventually,
> I'll
> > use this on a Linux machine in the cloud using:
> > https://github.com/armstrtw/rzmq
> > https://github.com/armstrtw/AWS.tools
> > https://github.com/armstrtw/deathstar
> > http://code.google.com/p/segue/
>
>
> Reference classes, the R.oo package and the proto package provide OO
> implementations based on environments.
>
> Being particular familiar with the proto package
> (http://r-proto.googlecode.com), I will discuss it.  The graph.proto
> function in that package will draw a graphViz graph of your proto
> objects (environments).  Using p and x in place of myEnv and myx your
> example is as follows.
>
> library(proto)
> p <- proto(x = 2+2)
> p$x  # 4
>
> # add a method, incr
> p$incr <- function(.) .$x <- .$x + 1
> p$incr() # increment x
> p$x # 5
>
> # create a child
> # it overrides x; inherits incr from p
> ch <- p$proto(x = 100)
> ch$incr()
> ch$x # 101
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sequencing environments

2012-02-15 Thread Ben quant
Thank you Duncan. Interesting. I find it strange that you can't get a list
of the environments. But I'll deal with it...

Anyway, I'm about to start a new R dev project for my company. I'm thinking
about architecture, organization, and gotchas. I went through much of the
documentation you sent me. Thanks!. I came up with what I think is the best
way to implement environments (which I am using like I would use a class in
a traditional OO language) that can be reused in various programs.

I'm thinking of creating different scripts like this:
#this is saved as script name EnvTest.R
myEnvir = new.env()
var1 = 2 + 2
assign("myx",var1,envir=myEnvir)

Then I will write programs like this that will use the environments and the
objects/functions they contain:
source("EnvTest.R")
prgmVar1 = get("myx",pos=myEnvir)
## do stuff with env objects
print(prgmVar1)

Do you think this is the best way to use environments to avoid naming
conflicts, take advantage of separation of data, organize scripting
logically, etc. (the benefits of traditional OO classes)? Eventually, I'll
use this on a Linux machine in the cloud using:
https://github.com/armstrtw/rzmq
https://github.com/armstrtw/AWS.tools
https://github.com/armstrtw/deathstar
http://code.google.com/p/segue/

...do you (or anyone else) see any gotchas here? Any suggestions, help,
things to watch for are welcome...

Note: I am aware of the (surprising?) scoping rules.

Thanks so much for your help.

Ben

On Tue, Feb 14, 2012 at 5:04 AM, Duncan Murdoch wrote:

> On 12-02-14 12:34 AM, Ben quant wrote:
>
>> Hello,
>>
>> I can get at environments if I know their names, but what if want to look
>> at what environments currently exist at some point in a script? In other
>> words, if I don't know what environments exist and I don't know their
>> sequence/hierarchy, how do I display a visual representation of the
>> environments and how they relate to one another?
>>
>
> Environments are objects and most of them are maintained in the same
> places as other objects (including some obscure places, such as in
> structures maintained only in external package code), so it's not easy to
> generate a complete list.
>
>
>
>> I'm looking at getting away from the package R.oo and using R in normal
>> state, but I need a way to "check in on" the status and organization of my
>> environments.
>>
>> I've done considerable research on R's environments, but its a challenging
>> thing to google and come up with meaningful results.
>>
>
> I would suggest reading the technical documentation:  the R Language
> manual, the R Internals manual, and some of the papers on the "Technical
> papers" page.
>
> Duncan Murdoch
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sequencing environments

2012-02-13 Thread Ben quant
Hello,

I can get at environments if I know their names, but what if want to look
at what environments currently exist at some point in a script? In other
words, if I don't know what environments exist and I don't know their
sequence/hierarchy, how do I display a visual representation of the
environments and how they relate to one another?

I'm looking at getting away from the package R.oo and using R in normal
state, but I need a way to "check in on" the status and organization of my
environments.

I've done considerable research on R's environments, but its a challenging
thing to google and come up with meaningful results.

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] folders of path - platform independent (repost)

2011-12-29 Thread Ben quant
Oops. I guess I stopped reading about the fsep param when I saw PATH and
R_LIB because I'm not interested in those. I didn't get to the part I was
interested in. Thanks!

Ben

On Wed, Dec 28, 2011 at 5:33 PM, David Winsemius wrote:

>
> On Dec 28, 2011, at 5:57 PM, Ben quant wrote:
>
>  One quick follow-up on reversing your example. Is there an easy way to get
>> the file.path separator for the platform?  file.path("","") seems to be
>> the only way to do it.
>>
>
> I don't get it. Did you look at ?file.path  ? It's default call shows fsep=
>
> > .Platform$file.sep
> [1] "/"
>
> ?.Platform
>
> --
> David.
>
>
>> So if filename is a valid file path, this will return the folders, drive,
>> and file name in vector form regardless of the platform:
>> folders = strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
>> This will undo the above regardless of the platform:
>> paste(folders, collapse=file.path("",""))
>>
>
>
>
>> Thanks again for your help Duncan!
>>
>> Ben
>>
>>
>>  On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch <
>>> murdoch.dun...@gmail.com> wrote:
>>>
>>>  On 11-12-28 4:30 PM, Ben quant wrote:
>>>>
>>>>  Hello, (sorry re-posting due to typo)
>>>>>
>>>>> I'm attempting to get the folders of a path in a robust way (platform
>>>>> independent, format independent). It has to run on Windows and Linux
>>>>> and
>>>>> tolerate different formats.
>>>>>
>>>>> For these: (The paths don't actually exist in Linux but you get the
>>>>> idea.)
>>>>>
>>>>> Windows:
>>>>> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf"
>>>>> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf"
>>>>> Linux:
>>>>> file_full_path = "~/Program Files/R/R-2.13.1/NEWS.pdf"
>>>>> file_full_path = "/home/username/Program Files/R/R-2.13.1/NEWS.pdf"
>>>>>
>>>>> I would get for Windows: "C", "Program Files", "R",
>>>>> "R-2.13.1","NEWS.pdf"
>>>>> I would get for Linux: "home","username", "Program Files", "R",
>>>>> "R-2.13.1","NEWS.pdf"
>>>>> (The drive and/or home/username aren't necessary, but would be nice to
>>>>> have. Also, that file name isn't necessary, but would be nice.)
>>>>>
>>>>> Thank you for your help,
>>>>>
>>>>>
>>>>>  If you use the normalizePath() function with winslash="/", then all
>>>> current platforms will return a path using "/" as the separator, so you
>>>> could do something like
>>>>
>>>> strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
>>>>
>>>>
>>>> You need to be careful with normalizePath:  at least on Windows, it will
>>>> not necessarily do what you wanted if the filename doesn't exist.
>>>>
>>>> Duncan Murdoch
>>>>
>>>>
>>>
>>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>
>



Re: [R] folders of path - platform independent (repost)

2011-12-28 Thread Ben quant
One quick follow-up on reversing your example. Is there an easy way to get
the file.path separator for the platform?  file.path("","") seems to be
the only way to do it.

So if filename is a valid file path, this will return the folders, drive,
and file name in vector form regardless of the platform:
folders = strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
This will undo the above regardless of the platform:
paste(folders, collapse=file.path("",""))
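
For instance, the round trip seems to come out right like this (a sketch only;
the path is made up and, per Duncan's caveat, normalizePath may warn if it
doesn't exist):

filename = "C:/Program Files/R/R-2.13.1/NEWS.pdf"           # made-up example path
folders = strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
folders
paste(folders, collapse=.Platform$file.sep)                 # rebuilt with the platform separator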

Thanks again for your help Duncan!

Ben


> On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch 
> wrote:
>
>> On 11-12-28 4:30 PM, Ben quant wrote:
>>
>>> Hello, (sorry re-posting due to typo)
>>>
>>> I'm attempting to get the folders of a path in a robust way (platform
>>> independent, format independent). It has to run on Windows and Linux and
>>> tolerate different formats.
>>>
>>> For these: (The paths don't actually exist in Linux but you get the
>>> idea.)
>>>
>>> Windows:
>>> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf"
>>> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf"
>>> Linux:
>>> file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf"
>>> file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf"
>>>
>>> I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf"
>>> I would get for Linux: "home","username", "Program Files", "R",
>>> "R-2.13.1","NEWS.pdf"
>>> (The drive and/or home/username aren't necessary, but would be nice to
>>> have. Also, that file name isn't necessary, but would be nice.)
>>>
>>> Thank you for your help,
>>>
>>>
>> If you use the normalizePath() function with winslash="/", then all
>> current platforms will return a path using "/" as the separator, so you
>> could do something like
>>
>> strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
>>
>> You need to be careful with normalizePath:  at least on Windows, it will
>> not necessarily do what you wanted if the filename doesn't exist.
>>
>> Duncan Murdoch
>>
>
>



Re: [R] folders of path - platform independent (repost)

2011-12-28 Thread Ben quant
Excellent!

Thanks,

ben

On Wed, Dec 28, 2011 at 2:37 PM, Duncan Murdoch wrote:

> On 11-12-28 4:30 PM, Ben quant wrote:
>
>> Hello, (sorry re-posting due to typo)
>>
>> I'm attempting to get the folders of a path in a robust way (platform
>> independent, format independent). It has to run on Windows and Linux and
>> tolerate different formats.
>>
>> For these: (The paths don't actually exist in Linux but you get the idea.)
>>
>> Windows:
>> file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf"
>> file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf"
>> Linux:
>> file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf"
>> file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf"
>>
>> I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf"
>> I would get for Linux: "home","username", "Program Files", "R",
>> "R-2.13.1","NEWS.pdf"
>> (The drive and/or home/username aren't necessary, but would be nice to
>> have. Also, that file name isn't necessary, but would be nice.)
>>
>> Thank you for your help,
>>
>>
> If you use the normalizePath() function with winslash="/", then all
> current platforms will return a path using "/" as the separator, so you
> could do something like
>
> strsplit(normalizePath(filename, winslash="/"), "/")[[1]]
>
> You need to be careful with normalizePath:  at least on Windows, it will
> not necessarily do what you wanted if the filename doesn't exist.
>
> Duncan Murdoch
>



[R] folders of path - platform independent (repost)

2011-12-28 Thread Ben quant
Hello, (sorry re-posting due to typo)

I'm attempting to get the folders of a path in a robust way (platform
independent, format independent). It has to run on Windows and Linux and
tolerate different formats.

For these: (The paths don't actually exist in Linux but you get the idea.)

Windows:
file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf"
file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf"
Linux:
file_full_path = "~/Program FilesR/R-2.13.1/NEWS.pdf"
file_full_path = "/home/username/Program FilesR/R-2.13.1/NEWS.pdf"

I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf"
I would get for Linux: "home","username", "Program Files", "R",
"R-2.13.1","NEWS.pdf"
(The drive and/or home/username aren't necessary, but would be nice to
have. Also, that file name isn't necessary, but would be nice.)

Thank you for your help,

Ben



[R] folders of path - platform independent

2011-12-28 Thread Ben quant
Hello,

I'm attempting to get the folders of a path in a robust way (platform
independent, format independent). It has to run on Windows and Linux and
tolerate different formats.

For these: (The paths don't actually exist in Linux but you get the idea.)

Windows:
file_full_path = "C://Program Files//R//R-2.13.1//NEWS.pdf"
file_full_path = "C:\Program Files\R\R-2.13.1\NEWS.pdf"
Linux:
file_full_path = "~/Program FilesR\R-2.13.1\NEWS.pdf"
file_full_path = "/home/username/Program FilesR\R-2.13.1\NEWS.pdf"

I would get for Windows: "C", "Program Files", "R", "R-2.13.1","NEWS.pdf"
I would get for Linux: "home","username", "Program Files", "R",
"R-2.13.1","NEWS.pdf"
(The drive and/or home/username aren't necessary, but would be nice to
have. Also, that file name isn't necessary, but would be nice.)

Thank you for your help,

Ben



Re: [R] R on the cloud - Windows to Linux

2011-12-20 Thread Ben quant
: expected primary-expression before '>' token
interface.cpp:202: error: expected `)' before ';' token
interface.cpp:204: error: 'msg' was not declared in this scope
interface.cpp:210: error: 'msg' was not declared in this scope
interface.cpp: In function 'SEXPREC* receiveInt(SEXPREC*)':
interface.cpp:227: error: 'zmq' has not been declared
interface.cpp:227: error: expected `;' before 'msg'
interface.cpp:228: error: 'zmq' has not been declared
interface.cpp:228: error: 'socket' was not declared in this scope
interface.cpp:228: error: expected type-specifier before 'zmq'
interface.cpp:228: error: expected `>' before 'zmq'
interface.cpp:228: error: expected `(' before 'zmq'
interface.cpp:228: error: 'zmq' has not been declared
interface.cpp:228: error: expected primary-expression before '>' token
interface.cpp:228: error: expected `)' before ';' token
interface.cpp:230: error: 'msg' was not declared in this scope
interface.cpp:235: error: 'msg' was not declared in this scope
interface.cpp:240: error: 'msg' was not declared in this scope
interface.cpp: In function 'SEXPREC* receiveDouble(SEXPREC*)':
interface.cpp:250: error: 'zmq' has not been declared
interface.cpp:250: error: expected `;' before 'msg'
interface.cpp:251: error: 'zmq' has not been declared
interface.cpp:251: error: 'socket' was not declared in this scope
interface.cpp:251: error: expected type-specifier before 'zmq'
interface.cpp:251: error: expected `>' before 'zmq'
interface.cpp:251: error: expected `(' before 'zmq'
interface.cpp:251: error: 'zmq' has not been declared
interface.cpp:251: error: expected primary-expression before '>' token
interface.cpp:251: error: expected `)' before ';' token
interface.cpp:253: error: 'msg' was not declared in this scope
interface.cpp:258: error: 'msg' was not declared in this scope
interface.cpp:263: error: 'msg' was not declared in this scope

make: *** [interface.o] Error 1

ERROR: compilation failed for package 'rzmq'
* removing '/home/bnachtrieb/R/x86_64-redhat-linux-gnu-library/2.13/rzmq'

The downloaded packages are in
  '/tmp/RtmpoTdDMm/downloaded_packages'

Warning message:
In install.packages("rzmq", dependencies = TRUE) :
  installation of package 'rzmq' had non-zero exit status

>



Thank you for your help!


Ben

On Wed, Dec 7, 2011 at 7:00 PM, Whit Armstrong wrote:

> subscribe to R-hpc.
>
> and check out these:
> https://github.com/armstrtw/rzmq
> https://github.com/armstrtw/AWS.tools
> https://github.com/armstrtw/deathstar
>
> and this:
> http://code.google.com/p/segue/
>
> If you're willing to work, you can probably get deathstar to work
> using a local windows box and remote linux nodes.
>
> -Whit
>
>
> On Wed, Dec 7, 2011 at 6:02 PM, Ben quant  wrote:
> > Hello,
> >
> > I'm working with the gam function and due to the amount of data I am
> > working with it is taking a long time to run. I looked at the tips to get
> > it to run faster, but none have acceptable side effects. That is the real
> > problem.
> >
> > I have accepted that gam will run a long time. I will be running gam many
> > times for many different models. To make gam useable I am looking at
> > splitting the work up and putting all of it on an Amazon EC2 cloud. I
> have
> > a Windows machine and I'm (planning on) running Linux EC2 instances via
> > Amazon.
> >
> > I have R running on one EC2 instance now. Now I'm looking to:
> >
> > 1) division of processing
> > 2) creating/terminating instances via R
> > 3) porting code and data to the cloud
> > 4) producing plots on the cloud and getting them back on my (Windows)
> > computer for review
> > 5) do all of the above programmically (over night)
> >
> > I am new'ish to R, brand new to the cloud, and I am new to Linux (but I
> > have access to a Linux expert at my company). I'm looking for 1) guidance
> > so I am headed in the best direction from the start, 2) any gotchas I can
> > learn from, 3) package suggestions.
> >
> > Thank you very much for your assistance!
> >
> > Regards,
> >
> > Ben
> >
>



Re: [R] gam, what is the function(s)

2011-12-09 Thread Ben quant
Thank you Simon. I already ordered your book.

Regards,
Ben

On Fri, Dec 9, 2011 at 10:49 AM, Simon Wood  wrote:

> See help("mgcv-FAQ"), item 2.
>
> best,
> Simon
>
>
> On 09/12/11 15:05, Ben quant wrote:
>
>> Hello,
>>
>> I'd like to understand 'what' is predicting the response for library(mgcv)
>> gam?
>>
>> For example:
>>
>> library(mgcv)
>> fit <- gam(y~s(x), data=as.data.frame(l_yx), family=binomial)
>> xx <- seq(min(l_yx[,2]), max(l_yx[,2]), len=101)
>> plot(xx, predict(fit, data.frame(x=xx), type="response"), type="l")
>>
>> I want to see the generalized function(s) used to predict the response
>> that
>> is plotted above. In other words, f(x) = {[what?]}. I'm new to gam and
>> relatively new to R. I did read ?gam, but I didn't see what I wanted.
>>
>> Thanks,
>>
>> Ben
>>
>>
>>
>
> --
> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
> +44 (0)1225 386603   http://people.bath.ac.uk/sw283
>
>
>



[R] gam, what is the function(s)

2011-12-09 Thread Ben quant
Hello,

I'd like to understand 'what' is predicting the response for library(mgcv)
gam?

For example:

library(mgcv)
fit <- gam(y~s(x),data=as.data.frame(l_yx),family=binomial)
xx <- seq(min(l_yx[,2]),max(l_yx[,2]),len=101)
plot(xx,predict(fit,data.frame(x=xx),type="response"),type="l")

I want to see the generalized function(s) used to predict the response that
is plotted above. In other words, f(x) = {[what?]}. I'm new to gam and
relatively new to R. I did read ?gam, but I didn't see what I wanted.
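
In case it helps anyone answering: I think what I am asking for is the linear
predictor matrix, i.e. something like this sketch (reusing fit and xx from
above; type="lpmatrix" is the option described in ?predict.gam):

Xp = predict(fit, data.frame(x=xx), type="lpmatrix")   # evaluated basis functions plus intercept
eta = Xp %*% coef(fit)                                 # f(x) on the logit (linear predictor) scale
all.equal(as.vector(plogis(eta)),
          as.vector(predict(fit, data.frame(x=xx), type="response")))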

Thanks,

Ben



Re: [R] R on the cloud - Windows to Linux

2011-12-08 Thread Ben quant
Due to my lack of experience with R and the cloud I am leery about
attempting any patch dev for Windows compatibility. I think it would be
cool to contribute at some point, but I think I am still too new.

Anyway, I'm looking into using my company's linux server via Putty and using
that as my local machine (as you suggest).

Thanks!

Ben

On Thu, Dec 8, 2011 at 9:44 AM, Whit Armstrong wrote:

> > I don't know where to start because, it looks like rzmq is not available
> for
> > Windows and it looks like AWS.tools and deathstar depend on rzmq, so by
>
> Hence my reference to work.  patches welcome.
>
> > Will using a
> > local Windows box continue to be an issue as I progress with R and EC2?
> I've
> > run into several hurdles already, including some that are not associated
> > with the cloud.
>
> My opinion only, but if you want to use big data and hpc, then use linux.
>
> If you move your data into s3, you can simply boot up a micro linux
> instance in the cloud and do your development there (I think usage of
> a micro instance is free w/ a new AWS account).
>
> If you have local linux servers available, then even better.
>
> -Whit
>



Re: [R] R on the cloud - Windows to Linux

2011-12-08 Thread Ben quant
Thank you!

I subscribed to R-hpc, thanks. I replied and I'm waiting for list approval.

I am willing to work, but I'm not sure what to do to get these to work.  I
literally started using the cloud yesterday and R a couple months ago.

I don't know where to start because, it looks like rzmq is not available
for Windows and it looks like AWS.tools and deathstar depend on rzmq, so by
dependency they seem unavailable to me since I have a local Windows box. Or
do I have this wrong? I want to work, but where do I start? Will using a
local Windows box continue to be an issue as I progress with R and EC2?
I've run into several hurdles already, including some that are not
associated with the cloud.

Thank you for your help!

Ben




On Wed, Dec 7, 2011 at 7:00 PM, Whit Armstrong wrote:

> subscribe to R-hpc.
>
> and check out these:
> https://github.com/armstrtw/rzmq
> https://github.com/armstrtw/AWS.tools
> https://github.com/armstrtw/deathstar
>
> and this:
> http://code.google.com/p/segue/
>
> If you're willing to work, you can probably get deathstar to work
> using a local windows box and remote linux nodes.
>
> -Whit
>
>
> On Wed, Dec 7, 2011 at 6:02 PM, Ben quant  wrote:
> > Hello,
> >
> > I'm working with the gam function and due to the amount of data I am
> > working with it is taking a long time to run. I looked at the tips to get
> > it to run faster, but none have acceptable side effects. That is the real
> > problem.
> >
> > I have accepted that gam will run a long time. I will be running gam many
> > times for many different models. To make gam useable I am looking at
> > splitting the work up and putting all of it on an Amazon EC2 cloud. I
> have
> > a Windows machine and I'm (planning on) running Linux EC2 instances via
> > Amazon.
> >
> > I have R running on one EC2 instance now. Now I'm looking to:
> >
> > 1) division of processing
> > 2) creating/terminating instances via R
> > 3) porting code and data to the cloud
> > 4) producing plots on the cloud and getting them back on my (Windows)
> > computer for review
> > 5) do all of the above programmically (over night)
> >
> > I am new'ish to R, brand new to the cloud, and I am new to Linux (but I
> > have access to a Linux expert at my company). I'm looking for 1) guidance
> > so I am headed in the best direction from the start, 2) any gotchas I can
> > learn from, 3) package suggestions.
> >
> > Thank you very much for your assistance!
> >
> > Regards,
> >
> > Ben
> >
>



[R] R on the cloud - Windows to Linux

2011-12-07 Thread Ben quant
Hello,

I'm working with the gam function and due to the amount of data I am
working with it is taking a long time to run. I looked at the tips to get
it to run faster, but none have acceptable side effects. That is the real
problem.

I have accepted that gam will run a long time. I will be running gam many
times for many different models. To make gam useable I am looking at
splitting the work up and putting all of it on an Amazon EC2 cloud. I have
a Windows machine and I'm (planning on) running Linux EC2 instances via
Amazon.

I have R running on one EC2 instance now. Now I'm looking to:

1) division of processing
2) creating/terminating instances via R
3) porting code and data to the cloud
4) producing plots on the cloud and getting them back on my (Windows)
computer for review
5) do all of the above programmatically (overnight)
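
On 1), the kind of division I have in mind is roughly this (a sketch only:
'mydata' and the formula list are made up, and package 'parallel' needs R >= 2.14):

library(parallel)
library(mgcv)
formulas = list(y ~ s(x1), y ~ s(x2), y ~ s(x1) + s(x2))   # hypothetical models
cl = makeCluster(4)                         # one worker per core on the EC2 instance
clusterEvalQ(cl, library(mgcv))
clusterExport(cl, "mydata")                 # assumes mydata exists in the global workspace
fits = parLapply(cl, formulas, function(f) gam(f, data=mydata, family=binomial))
stopCluster(cl)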

I am new'ish to R, brand new to the cloud, and I am new to Linux (but I
have access to a Linux expert at my company). I'm looking for 1) guidance
so I am headed in the best direction from the start, 2) any gotchas I can
learn from, 3) package suggestions.

Thank you very much for your assistance!

Regards,

Ben



Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Thank you so much for your help.

The data I am using is the last file called l_yx.RData at this link (the
second file contains the plots from earlier):
http://scientia.crescat.net/static/ben/

Seems like the warning went away with pmin(x,1) but now the OR is over
15k.  If I multiply my x's by 1000 I get a much more realistic OR. So I
guess this brings me to a much different question: aren't OR's comparable
between factors/data? In this case they don't seem to be. However, with
different data the OR's only change a very small amount (+8.0e-4) when I
multiply the x's by 1000. I don't understand.

Anyways, here is a run with the raw data and a run with your suggestion
(pmin(x,1)) that removed the error:

> l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link="logit"))

> l_logit

Call:  glm(formula = y ~ x, family = binomial(link = "logit"), data =
as.data.frame(l_yx))

Coefficients:
(Intercept)            x
     -2.293        8.059

Degrees of Freedom: 690302 Total (i.e. Null);  690301 Residual
Null Deviance:  448800
Residual Deviance: 447100   AIC: 447100

> l_exp_coef = exp(l_logit$coefficients)[2]

> l_exp_coef
   x
3161.781

> dim(l_yx)
[1] 690303  2

> l_yx = cbind(l_yx[,1],pmin(l_yx[,2],1))

> dim(l_yx)
[1] 690303  2

> colnames(l_yx) = c('y','x')

> mean(l_yx[,2])
[1] 0.01117248

> range(l_yx[,2])
[1] 0 1

> head(l_yx[,2])
[1] 0.00302316 0.07932130 0.00000000 0.01779657 0.16083735 0.00000000

> unique(l_yx[,1])
[1] 0 1

> l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link="logit"))

> l_logit

Call:  glm(formula = y ~ x, family = binomial(link = "logit"), data =
as.data.frame(l_yx))

Coefficients:
(Intercept)            x
     -2.312        9.662

Degrees of Freedom: 690302 Total (i.e. Null);  690301 Residual
Null Deviance:  448800
Residual Deviance: 446800   AIC: 446800

> l_exp_coef = exp(l_logit$coefficients)[2]

> l_exp_coef
   x
15709.52
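
(Double-checking the scaling question against the coefficients above: the OR is
"per one unit of x", so rescaling x rescales it.)

exp(8.059)          # ~ 3162: per-unit OR from the first fit
exp(9.662)          # ~ 15710: per-unit OR from the second fit
exp(9.662 / 1000)   # ~ 1.0097: what the per-unit OR becomes if x is multiplied by 1000
exp(9.662 * 0.1)    # ~ 2.63: OR for a 0.1-unit change, a more natural step for this x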


Thanks,

Ben



On Thu, Dec 1, 2011 at 4:32 PM, peter dalgaard  wrote:

>
> On Dec 1, 2011, at 23:43 , Ben quant wrote:
>
> > I'm not proposing this as a permanent solution, just investigating the
> warning. I zeroed out the three outliers and received no warning. Can
> someone tell me why I am getting no warning now?
>
> It's easier to explain why you got the warning before. If the OR for a one
> unit change is 3000, the OR for a 14 unit change is on the order of 10^48
> and that causes over/underflow in the conversion to probabilities.
>
> I'm still baffled at how you can get that model fitted to your data,
> though. One thing is that you can have situations where there are fitted
> probabilities of one corresponding to data that are all one and/or fitted
> zeros where data are zero, but you seem to have cases where you have both
> zeros and ones at both ends of the range of x. Fitting a zero to a one or
> vice versa would make the likelihood zero, so you'd expect that the
> algorithm would find a better set of parameters rather quickly. Perhaps the
> extremely large number of observations that you have has something to do
> with it?
>
> You'll get the warning if the fitted zeros or ones occur at any point of
> the iterative procedure. Maybe it isn't actually true for the final model,
> but that wouldn't seem consistent with the OR that you cited.
>
> Anyways, your real problem lies with the distribution of the x values. I'd
> want to try transforming it to something more sane. Taking logarithms is
> the obvious idea, but you'd need to find out what to do about the zeros --
> perhaps log(x + 1e-4) ? Or maybe just cut the outliers down to size with
> pmin(x,1).
>
> >
> > I did this 3 times to get rid of the 3 outliers:
> > mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
> > l_yx[mx_dims] = 0
> >
> > Now this does not produce an warning:
> > l_logit = glm(y~x, data=as.data.frame(l_yx),
> family=binomial(link="logit"))
> >
> > Can someone tell me why this occurred?
> >
> > Also, again, here are the screen shots of my data that I tried to send
> earlier (two screen shots, two pages):
> > http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
> >
> > Thank you for your help,
> >
> > Ben
> >
> > On Thu, Dec 1, 2011 at 3:25 PM, Ben quant  wrote:
> > Oops! Please ignore my last post. I mistakenly gave you different data I
> was testing with. This is the correct data:
> >
> > Here you go:
> >
> > > attach(as.data.frame(l_yx))
> > >  range(x[y==0])
> > [1]  0.0 14.66518
> > > range(x[y==1])
> > [1]  0.0 13.49791
>

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
I'm not proposing this as a permanent solution, just investigating the
warning. I zeroed out the three outliers and received no warning. Can
someone tell me why I am getting no warning now?

I did this 3 times to get rid of the 3 outliers:
mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
l_yx[mx_dims] = 0

Now this does not produce an warning:
l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))

Can someone tell me why this occurred?

Also, again, here are the screen shots of my data that I tried to send
earlier (two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you for your help,

Ben

On Thu, Dec 1, 2011 at 3:25 PM, Ben quant  wrote:

> Oops! Please ignore my last post. I mistakenly gave you different data I
> was testing with. This is the correct data:
>
> Here you go:
>
> > attach(as.data.frame(l_yx))
> >  range(x[y==0])
> [1]  0.0 14.66518
> > range(x[y==1])
> [1]  0.0 13.49791
>
>
> How do I know what is acceptable?
>
> Also, here are the screen shots of my data that I tried to send earlier
> (two screen shots, two pages):
> http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
>
> Thank you,
>
> Ben
>
> On Thu, Dec 1, 2011 at 3:07 PM, Ben quant  wrote:
>
>> Here you go:
>>
>> > attach(as.data.frame(l_yx))
>> > range(x[y==1])
>> [1] -22500.746.
>> >  range(x[y==0])
>> [1] -10076.5303653.0228
>>
>> How do I know what is acceptable?
>>
>> Also, here are the screen shots of my data that I tried to send earlier
>> (two screen shots, two pages):
>> http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
>>
>> Thank you,
>>
>> Ben
>>
>>
>> On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard  wrote:
>>
>>>
>>> On Dec 1, 2011, at 21:32 , Ben quant wrote:
>>>
>>> > Thank you for the feedback, but my data looks fine to me. Please tell
>>> me if I'm not understanding.
>>>
>>> Hum, then maybe it really is a case of a transition region being short
>>> relative to the range of your data. Notice that the warning is just that: a
>>> warning. I do notice that the distribution of your x values is rather
>>> extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
>>> an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.
>>>
>>> Have a look at  range(x[y==0]) and range(x[y==1]).
>>>
>>>
>>> >
>>> > I followed your instructions and here is a sample of the first 500
>>> values : (info on 'd' is below that)
>>> >
>>> > >  d <- as.data.frame(l_yx)
>>> > > x = with(d, y[order(x)])
>>> > > x[1:500] # I have 1's and 0's dispersed throughout
>>> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
>>> > [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> > [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
>>> 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
>>> 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
>>> > [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> > [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
>>> 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>>> >
>>> > # I get the warning still
>>> > > l_df = as.data.frame(l_yx)
>>> > > l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))
>>> >
>>> > Warning message:
>>> > glm.fit: fitted probabilities numerically 0 or 1 occurred
>>> >
>>> > # some info on 'd' above:
>>> >
>>> > > d[1:500,1]
>>> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Oops! Please ignore my last post. I mistakenly gave you different data I
was testing with. This is the correct data:

Here you go:

> attach(as.data.frame(l_yx))
>  range(x[y==0])
[1]  0.0 14.66518
> range(x[y==1])
[1]  0.0 13.49791

How do I know what is acceptable?

Also, here are the screen shots of my data that I tried to send earlier
(two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben

On Thu, Dec 1, 2011 at 3:07 PM, Ben quant  wrote:

> Here you go:
>
> > attach(as.data.frame(l_yx))
> > range(x[y==1])
> [1] -22500.746.
> >  range(x[y==0])
> [1] -10076.5303653.0228
>
> How do I know what is acceptable?
>
> Also, here are the screen shots of my data that I tried to send earlier
> (two screen shots, two pages):
> http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
>
> Thank you,
>
> Ben
>
>
> On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard  wrote:
>
>>
>> On Dec 1, 2011, at 21:32 , Ben quant wrote:
>>
>> > Thank you for the feedback, but my data looks fine to me. Please tell
>> me if I'm not understanding.
>>
>> Hum, then maybe it really is a case of a transition region being short
>> relative to the range of your data. Notice that the warning is just that: a
>> warning. I do notice that the distribution of your x values is rather
>> extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
>> an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.
>>
>> Have a look at  range(x[y==0]) and range(x[y==1]).
>>
>>
>> >
>> > I followed your instructions and here is a sample of the first 500
>> values : (info on 'd' is below that)
>> >
>> > >  d <- as.data.frame(l_yx)
>> > > x = with(d, y[order(x)])
>> > > x[1:500] # I have 1's and 0's dispersed throughout
>> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
>> > [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
>> 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
>> > [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
>> 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>> >
>> > # I get the warning still
>> > > l_df = as.data.frame(l_yx)
>> > > l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))
>> >
>> > Warning message:
>> > glm.fit: fitted probabilities numerically 0 or 1 occurred
>> >
>> > # some info on 'd' above:
>> >
>> > > d[1:500,1]
>> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> > > d[1:500,2]
>> >   [1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.6083

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Here you go:

> attach(as.data.frame(l_yx))
> range(x[y==1])
[1] -22500.746.
>  range(x[y==0])
[1] -10076.5303653.0228

How do I know what is acceptable?

Also, here are the screen shots of my data that I tried to send earlier
(two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben

On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard  wrote:

>
> On Dec 1, 2011, at 21:32 , Ben quant wrote:
>
> > Thank you for the feedback, but my data looks fine to me. Please tell me
> if I'm not understanding.
>
> Hum, then maybe it really is a case of a transition region being short
> relative to the range of your data. Notice that the warning is just that: a
> warning. I do notice that the distribution of your x values is rather
> extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
> an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.
>
> Have a look at  range(x[y==0]) and range(x[y==1]).
>
>
> >
> > I followed your instructions and here is a sample of the first 500
> values : (info on 'd' is below that)
> >
> > >  d <- as.data.frame(l_yx)
> > > x = with(d, y[order(x)])
> > > x[1:500] # I have 1's and 0's dispersed throughout
> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
> > [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
> 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
> > [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
> 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
> >
> > # I get the warning still
> > > l_df = as.data.frame(l_yx)
> > > l_logit = glm(y~x, data=l_df, family=binomial(link="logit"))
> >
> > Warning message:
> > glm.fit: fitted probabilities numerically 0 or 1 occurred
> >
> > # some info on 'd' above:
> >
> > > d[1:500,1]
> >   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > > d[1:500,2]
> >   [1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01
> 0.00e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02
> 2.080511e-02 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
> >  [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03
> 0.00e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02
> 2.954641e-04 1.434859e-03 6.964976e-04 0.00e+00 1.202162e-02
> >  [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04
> 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01
> 3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
> >  [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02
> 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02
> 0.00e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
> >  [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.7

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
e-02 6.457757e-03 2.942391e-04 1.628352e-02
8.288831e-03 3.170856e-04 1.251331e+00 1.706954e-02 1.063723e-03
[181] 1.374416e-02 2.140507e-02 2.817009e-02 2.272793e-02 4.365562e-02
6.089414e-03 2.498083e-02 1.360471e-02 1.884079e-02 1.448660e-02
2.341314e-02 8.167064e-03 4.109117e-02 2.660633e-02 7.711723e-03
[196] 9.590278e-03 2.515490e-03 1.978033e-02 3.454990e-02 8.072748e-03
4.718885e-03 1.621131e-01 4.547743e-03 1.081195e-02 9.572051e-04
1.790391e-02 1.618026e-02 1.910230e-02 1.861914e-02 3.485475e-02
[211] 2.844890e-03 1.866889e-02 1.378208e-02 2.451514e-02 2.535044e-03
3.921364e-04 1.557266e-03 3.315892e-03 1.752821e-03 6.786187e-03
1.360921e-02 9.550702e-03 8.114506e-03 5.068741e-03 1.729822e-02
[226] 1.902033e-02 8.196564e-03 2.632880e-03 1.587969e-02 8.354079e-04
1.050023e-03 4.236195e-04 9.181120e-03 4.995919e-04 1.092234e-02
1.207544e-02 2.187243e-01 3.251349e-02 1.269134e-03 1.557751e-04
[241] 1.232498e-02 2.654449e-02 1.049324e-03 8.442729e-03 6.331691e-03
1.715609e-02 1.017800e-03 9.230006e-03 1.331373e-02 5.596195e-02
1.296551e-03 5.272687e-03 2.805640e-02 4.790665e-02 2.043011e-02
[256] 1.047226e-02 1.866499e-02 9.323001e-03 8.920536e-03 1.582911e-03
2.776238e-03 2.914762e-02 4.402356e-03 9.555274e-04 1.681966e-03
7.584319e-04 6.758914e-02 1.505431e-02 2.213308e-02 1.329330e-02
[271] 7.284363e-03 2.687818e-02 2.997535e-03 7.470007e-03 2.070569e-03
3.441944e-02 1.717768e-02 4.523364e-02 1.003558e-02 1.365111e-02
1.906845e-02 1.676223e-02 3.506809e-04 9.164257e-02 9.008416e-03
[286] 1.073903e-02 4.855937e-03 8.618043e-03 2.529247e-02 1.059375e-02
5.834253e-03 2.004309e-02 1.460387e-02 2.899190e-02 5.867984e-03
1.983956e-02 6.834339e-03 1.925821e-03 9.231870e-03 6.839616e-03
[301] 1.029972e-02 2.009769e-02 9.458785e-03 1.183901e-02 8.911549e-03
1.264745e-02 2.995451e-03 7.657983e-04 5.315853e-03 1.325039e-02
1.044103e-02 2.307236e-02 2.780789e-02 1.735145e-02 9.053126e-03
[316] 5.847638e-02 3.815715e-03 5.087690e-03 1.040513e-02 4.475672e-02
6.564791e-02 3.233571e-03 1.076193e-02 8.283819e-02 5.370256e-03
3.533256e-02 1.302812e-02 1.896783e-02 2.055282e-02 3.572239e-03
[331] 5.867681e-03 5.864974e-04 9.715807e-03 1.665469e-02 5.082044e-02
3.547168e-03 3.069631e-03 1.274717e-02 1.858226e-03 3.104809e-04
1.247831e-02 2.073575e-03 3.544110e-04 7.240736e-03 8.452117e-05
[346] 8.149151e-04 4.942461e-05 1.142303e-03 6.265512e-04 3.666717e-04
3.244669e-02 7.242018e-03 6.335951e-04 2.329072e-02 3.719716e-03
2.803425e-02 1.623981e-02 6.387102e-03 8.807679e-03 1.214914e-02
[361] 6.699341e-03 1.148082e-02 1.329736e-02 1.537364e-03 2.004390e-02
1.562065e-02 1.655465e-02 9.960172e-02 2.174588e-02 1.209472e-02
2.328413e-02 2.012760e-04 1.422327e-02 2.194455e-03 2.307362e-02
[376] 4.315764e-03 3.208576e-02 3.826598e-02 1.828001e-02 3.935978e-03
5.294211e-04 1.392423e-02 6.588394e-03 1.040147e-03 1.260787e-02
9.051757e-04 5.353215e-02 6.049058e-02 1.382630e-01 1.064124e-01
[391] 3.380742e-03 1.798038e-02 1.557048e-01 1.217146e-02 4.140520e-02
4.707564e-02 2.786042e-02 8.836988e-03 5.542879e-03 1.862664e-02
8.858770e-03 1.026681e-03 1.692105e-02 8.849238e-03 7.143816e-03
[406] 1.630118e-02 1.165920e-01 9.471496e-03 4.879998e-02 1.388216e-02
1.453267e-02 4.845224e-04 1.415190e-03 1.208627e-02 1.372348e-02
2.573131e-02 1.169595e-02 1.825447e-02 2.574299e-02 5.301360e-02
[421] 6.961110e-03 7.781891e-03 1.013308e-03 3.160916e-03 1.090344e-02
1.530841e-02 9.398088e-04 9.143726e-04 1.286683e-02 2.006193e-02
1.774378e-02 5.681591e-02 9.584676e-03 7.957152e-02 4.485609e-03
[436] 1.086684e-02 2.930273e-03 6.085481e-03 4.342320e-03 1.31e-02
2.120402e-02 4.477545e-02 1.991814e-02 8.893947e-03 7.790133e-03
1.610199e-02 2.441280e-02 2.781231e-03 1.410080e-02 1.639912e-02
[451] 1.797498e-02 1.185382e-02 2.775063e-02 3.797315e-02 1.428883e-02
1.272659e-02 2.390500e-03 7.503478e-03 8.965356e-03 2.139452e-02
2.028536e-02 6.916416e-02 1.615986e-02 4.837412e-02 1.561731e-02
[466] 7.130332e-03 9.208406e-05 1.099934e-02 2.003469e-02 1.395857e-02
9.883482e-03 4.110852e-02 1.202052e-02 2.833039e-02 1.233236e-02
2.145801e-02 7.900161e-03 4.663819e-02 4.410819e-03 5.115056e-04
[481] 9.100270e-04 4.013683e-03 1.227139e-02 3.304697e-03 2.919099e-03
6.112390e-03 1.99e-02 1.208282e-03 1.164037e-02 2.166888e-02
4.381615e-02 5.318929e-03 7.226343e-03 2.732819e-02 2.385092e-04
[496] 4.905250e-02 1.159876e-02 4.068228e-03 3.349013e-02 1.273468e-03


Thanks for your help,

Ben

On Thu, Dec 1, 2011 at 11:55 AM, peter dalgaard  wrote:

>
> On Dec 1, 2011, at 18:54 , Ben quant wrote:
>
> > Sorry if this is a duplicate: This is a re-post because the pdf's
> mentioned
> > below did not go through.
>
> Still not there. Sometimes it's because your mailer doesn't label them
> with the appropriate mime-type (e.g. as application/octet-stream, which is
> "arbitrary binary"). Anyways, see below
>
> [snip]
> >
> > With the above data I do:
> >>l_log

[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
below did not go through.

Hello,

I'm new'ish to R, and very new to glm. I've read a lot about my issue:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:

http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html
http://r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html
(note that I never found "MASS4 pp.197-8"; however, Ted's post was quite
helpful.)

This is a common question, sorry. Because it is a common issue I am posting
everything I know about the issue and how I think I am not falling into the
same trap as the others (but I must be, for some reason I am not yet
aware of).

From the two links above I gather that my warning "glm.fit: fitted
probabilities numerically 0 or 1 occurred" arises from a "perfect fit"
situation (i.e. the issue where all the high value x's (predictor
variables) are Y=1 (response=1) or the other way around). I don't feel my
data has this issue. Please point out how it does!
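
One way to check would be something like this (sketch, using l_yx as described
below):

d = as.data.frame(l_yx)
range(d$x[d$y == 0])   # if the x ranges for y=0 and y=1 barely overlap,
range(d$x[d$y == 1])   # a (near-)perfect fit is plausible
# do both outcomes occur across the span of x?
table(d$y, cut(d$x, breaks=unique(quantile(d$x, seq(0, 1, 0.25))), include.lowest=TRUE))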

The list post instructions state that I can attach pdf's, so I attached
plots of my data right before I do the call to glm.

The attachments are plots of my data stored in variable l_yx (as can be
seen in the axis names):
My response (vertical axis) by row index (horizontal axis):
 plot(l_yx[,1],type='h')
My predictor variable (vertical axis) by row index index (horizontal axis):
 plot(l_yx[,2],type='h')

 So here is more info on my data frame/data (in case you can't see my pdf
attachments):
> unique(l_yx[,1])
[1] 0 1
> mean(l_yx[,2])
[1] 0.01123699
> max(l_yx[,2])
[1] 14.66518
> min(l_yx[,2])
[1] 0
> attributes(l_yx)
$dim
[1] 690303  2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "y" "x"


With the above data I do:
> l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for
y=1 and y=0?  In other words, I don't think I have the linear separation
issue discussed in one of the links I provided.

PS - Then I do this and I get a odds ratio a crazy size:
> l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds
$coefficients[2]
> l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the
coeffcients
> l_exp_coef
   x
3161.781

So for one unit increase in the predictor variable I get 3160.781%
(3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variables by a constant and the odds ratio goes down, but the warning above
still persists and shouldn't the odds ratio be predictor variable size
independent?)

Thank you for your help!

Ben



[R] variable types - logistic regression

2011-11-25 Thread Ben quant
Hello,

Is there an example out there that shows how to treat each of the predictor
variable types when doing logistic regression in R? Something like this:

glm(y~x1+x2+x3+x4, data=mydata, family=binomial(link="logit"),
na.action=na.pass)

I'm drawing mostly from:
http://www.ats.ucla.edu/stat/r/dae/logit.htm

...but there are only two types of variable in the example given. I'm
wondering if the answer is that easy or if I have to consider more with
different types of variables. It seems like as.factor() is doing a lot of
the organization for me.

I will need to understand how to perform logistic regression in R on all
data types all in the same model (potentially).

As it stands, I think I can solve all of my data type issues with:

factor(x, ordered=TRUE)  ...for all discrete ordinal variables
factor(x, ordered=FALSE) ...for all discrete nominal variables
...and do nothing for everything else.

I'm pretty sure it's not that simple because of some other posts I've seen,
but I haven't seen a post that discusses ALL data types in logistic
regression.

Here is what I think will work at this point:

glm(y ~ all_other_vars + factor(disc_ord_var, ordered=TRUE) +
factor(disc_nom_var, ordered=FALSE), data=mydata,
family=binomial(link="logit"), na.action=na.pass)
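
And here is a tiny self-contained sketch of what I mean, with made-up data
(cont = continuous, nom = nominal, ord = ordinal):

set.seed(1)
mydata = data.frame(
  y    = rbinom(20, 1, 0.5),
  cont = rnorm(20),                                  # continuous: left as numeric
  nom  = factor(sample(c("a","b","c"), 20, TRUE)),   # nominal: unordered factor
  ord  = factor(sample(c("low","med","high"), 20, TRUE),
                levels=c("low","med","high"), ordered=TRUE)  # ordinal: ordered factor
)
fit = glm(y ~ cont + nom + ord, data=mydata, family=binomial(link="logit"))
head(model.matrix(fit))   # shows how R codes each variable type in the design matrix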

I'm also looking for any best-practices help. I'm new'ish to
R...and oddly enough I haven't had the pleasure of doing much regression in R
yet.

Regards,

Ben



[R] zeros to NA's - faster

2011-11-23 Thread Ben quant
Hello,

Is there a faster way to do this? Basically, I'd like to NA all values in
all_data if there are no 1's in the same column of the other matrix, iu.
Put another way, I want to replace the values in the all_data columns with NA
if the values in the same column of iu are all 0. This is pretty slow for me, but works:

 all_data = matrix(c(1:9),3,3)
 colnames(all_data) = c('a','b','c')
> all_data
 a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
 iu = matrix(c(1,0,0,0,1,0,0,0,0),3,3)
 colnames(iu) = c('a','b','c')
> iu
 a b c
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 0

fun = function(x,d){
  vals = d[,x]
  i = iu[,x]
  if(!any(i==1)){
vals = rep(NA,times=length(vals))
  }else{
vals
  }
  vals
}
all_data = sapply(colnames(iu),fun,all_data)
> all_data
 a b  c
[1,] 1 4 NA
[2,] 2 5 NA
[3,] 3 6 NA

...again, this works, but is slow for a large number of columns. Have
anything faster?
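
For the record, a vectorized version that seems to give the same result on the
example above (sketch):

all_data = matrix(1:9, 3, 3, dimnames=list(NULL, c('a','b','c')))
iu = matrix(c(1,0,0,0,1,0,0,0,0), 3, 3, dimnames=list(NULL, c('a','b','c')))
all_data[, colSums(iu == 1) == 0] = NA   # NA out every column of all_data with no 1 in iu
all_data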

Thanks,

ben



Re: [R] activate console

2011-11-16 Thread Ben quant
Perfect, thanks! Naturally, now I need to resize the console so it doesn't
cover my new plots. I'd like to resize it on the fly (from within the
function) and then restore its previous size.

So, curConsoleDims() and resize.console() are made-up functions, but they
demonstrate what I am trying to do:

fun = function(x){
  plot(x)
  bringToTop(-1)  # bring console to top (activate console / give console control) -- thanks Bert!
  dims = curConsoleDims()                   # made up: get the current console size
  on.exit(resize.console(width=dims$width, height=dims$height))  # restore the old size on exit
  resize.console(width=100, height=100)     # made up: shrink the console while plotting
}
fun(1:4)

Thanks!

Ben

On Wed, Nov 16, 2011 at 10:37 AM, Bert Gunter wrote:

> ??focus   ## admittedly, not the first keyword that comes to mind
> ?bringToTop
>
> -- Bert
>
> On Wed, Nov 16, 2011 at 9:07 AM, Ben quant  wrote:
> > Hello,
> >
> > After I plot something how do I reactivate the console (and not the plot
> > window) so I don't have to click on the console each time to go to the
> next
> > command?
> >
> > Example that does not work:
> >
> > fun = function(x){ plot(x); dev.set(dev.prev())}
> > fun(1:4)
> >
> > ...and another that does not work:
> > fun = function(x){ plot(x); dev.set(NULL)}
> > fun(1:4)
> >
> > Again, by 'not work' I mean I can't seem to give control back to the
> > console after I plot. I didn't find anything online.
> >
> > thanks,
> >
> > Ben
> >
> >
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
>
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>



[R] activate console

2011-11-16 Thread Ben quant
Hello,

After I plot something how do I reactivate the console (and not the plot
window) so I don't have to click on the console each time to go to the next
command?

Example that does not work:

fun = function(x){ plot(x); dev.set(dev.prev())}
fun(1:4)

...and another that does not work:
fun = function(x){ plot(x); dev.set(NULL)}
fun(1:4)

Again, by 'not work' I mean I can't seem to give control back to the
console after I plot. I didn't find anything online.

thanks,

Ben



Re: [R] multi-line query

2011-11-08 Thread Ben quant
Because I don't know anything about sqldf. :)

Here is what happens, but I'm sure it is happening because I didn't read
the manual yet:

> s <- sqldf('create table r.dat("id" int primary key,"val" int)')
Error in ls(envir = envir, all.names = private) :
  invalid 'envir' argument
Error in !dbPreExists : invalid argument type

ben

On Tue, Nov 8, 2011 at 10:41 AM, jim holtman  wrote:

> Why not just send it in as is.  I use SQLite (via sqldf) and here is
> the way I write my SQL statements:
>
>inRange <- sqldf('
>select t.*
>, r.start
>, r.end
>from total t, commRange r
>where t.comm = r.comm and
>t.loc between r.start and r.end and
>t.loc != t.new
>')
>
> On Tue, Nov 8, 2011 at 11:43 AM, Ben quant  wrote:
> > Hello,
> >
> > I'm using package RpgSQL. Is there a better way to create a multi-line
> > query/character string? I'm looking for less to type and readability.
> >
> > This is not very readable for large queries:
> > s <-  'create table r.BOD("id" int primary key,"name" varchar(12))'
> >
> > I write a lot of code, so I'm looking to type less than this, but it is
> > more readable from an SQL standpoint:
> > s <- gsub("\n", "", 'create table r.BOD(
> > "id" int primary key
> > ,"name" varchar(12))
> > ')
> >
> > How it is used:
> > dbSendUpdate(con, s)
> >
> > Regards,
> >
> > Ben
> >
> >
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>



[R] dbWriteTable with field data type

2011-11-08 Thread Ben quant
Hello,

When I do:

dbWriteTable(con, "r.BOD", cbind(row_names = rownames(BOD), BOD))

...can I specify the data types such as varchar(12), float, double
precision, etc. for each of the fields/columns?

If not, what is the best way to create a table with specified field data
types (with the RpgSQL package/R)?
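
The workaround I can think of goes roughly like this (a sketch only, assuming an
open connection con and the built-in BOD data frame): create the table with
explicit column types first, then insert the rows.

dbSendUpdate(con, 'create table r.BOD(
  "row_names" varchar(12),
  "Time"      double precision,
  "demand"    double precision)')
for (i in seq_len(nrow(BOD))) {
  dbSendUpdate(con, sprintf("insert into r.BOD values('%s', %f, %f)",
                            rownames(BOD)[i], BOD$Time[i], BOD$demand[i]))
}
dbGetQuery(con, "select * from r.BOD")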

Regards,

Ben



[R] multi-line query

2011-11-08 Thread Ben quant
Hello,

I'm using package RpgSQL. Is there a better way to create a multi-line
query/character string? I'm looking for less typing and better readability.

This is not very readable for large queries:
s <-  'create table r.BOD("id" int primary key,"name" varchar(12))'

I write a lot of code, so I'm looking to type less than this, but it is
more readable from an SQL standpoint:
s <- gsub("\n", "", 'create table r.BOD(
"id" int primary key
,"name" varchar(12))
')

How it is used:
dbSendUpdate(con, s)
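
Or, wrapped in a small helper so the gsub isn't repeated everywhere (sketch;
this just collapses newlines and indentation into single spaces):

sql = function(s) gsub("[[:space:]]+", " ", s)
s = sql('
  create table r.BOD(
    "id"   int primary key,
    "name" varchar(12)
  )')
dbSendUpdate(con, s)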

Regards,

Ben



Re: [R] RpgSQL row names

2011-11-08 Thread Ben quant
This is great, thanks!

I have another unrelated question. I'll create a new email for that one.

ben

On Mon, Nov 7, 2011 at 4:16 PM, Gabor Grothendieck
wrote:

> On Mon, Nov 7, 2011 at 5:34 PM, Ben quant  wrote:
> > Hello,
> >
> > Using the RpgSQL package, there must be a way to get the row names into
> the
> > table automatically. In the example below, I'm trying to get rid of the
> > cbind line, yet have the row names of the data frame populate a column.
> >
> >> bentest = matrix(1:4,2,2)
> >> dimnames(bentest) = list(c('ra','rb'),c('ca','cb'))
> >> bentest
> >   ca cb
> > ra  1  3
> > rb  2  4
> >> bentest = cbind(item_name=rownames(bentest),bentest)
> >> dbWriteTable(con, "r.bentest", bentest)
> > [1] TRUE
> >> dbGetQuery(con, "SELECT * FROM r.bentest")
> >  item_name ca cb
> > 1ra  1  3
> > 2rb  2  4
> >
> >
>
> The RJDBC based drivers currently don't support that. You can create a
> higher level function that does it.
>
> dbGetQuery2 <- function(...) {
>   out <- dbGetQuery(...)
>   # if a row_names column came back, move it into the row names
>   i <- match("row_names", names(out), nomatch = 0)
>   if (i > 0) {
>     rownames(out) <- out[[i]]
>     out <- out[-i]   # drop the row_names column itself
>   }
>   out
> }
>
> rownames(BOD) <- letters[1:nrow(BOD)]
> dbWriteTable(con, "BOD", cbind(row_names = rownames(BOD), BOD))
> dbGetQuery2(con, "select * from BOD")
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RpgSQL row names

2011-11-07 Thread Ben quant
Hello,

Using the RpgSQL package, there must be a way to get the row names into the
table automatically. In the example below, I'm trying to get rid of the
cbind line, yet have the row names of the data frame populate a column.

> bentest = matrix(1:4,2,2)
> dimnames(bentest) = list(c('ra','rb'),c('ca','cb'))
> bentest
   ca cb
ra  1  3
rb  2  4
> bentest = cbind(item_name=rownames(bentest),bentest)
> dbWriteTable(con, "r.bentest", bentest)
[1] TRUE
> dbGetQuery(con, "SELECT * FROM r.bentest")
  item_name ca cb
1ra  1  3
2rb  2  4
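
(One way to avoid repeating the cbind() on every write is a tiny wrapper, the
mirror image of the dbGetQuery2 helper in the reply above; dbWriteTable2 is a
hypothetical name, not part of RpgSQL:)

dbWriteTable2 <- function(con, name, df, ...) {
  # prepend the row names as a row_names column before writing
  dbWriteTable(con, name, cbind(row_names = rownames(df), df), ...)
}

dbWriteTable2(con, "r.bentest", bentest)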


Thanks,
Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RpgSQL vs RPostgreSQL

2011-11-03 Thread Ben quant
Hello,

Could someone who has experience with or knowledge regarding both
RPostgreSQL and RpgSQL packages provide some feedback? Thanks! I am most
interested in hearing from people who have knowledge regarding both
packages, not just one.

The only real difference I can see is that RpgSQL has a Java dependency,
which I am not opposed to if it provides some added benefit...otherwise I
will probably use the RPostgreSQL package. Both packages still appear to be
maintained.

I have skimmed over both of these links:
http://cran.r-project.org/web/packages/RpgSQL/RpgSQL.pdf
http://cran.r-project.org/web/packages/RPostgreSQL/RPostgreSQL.pdf

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] preceding X. and X

2011-10-27 Thread Ben quant
I think it is what I want. The values look OK. I do get a warning. Here is
what you asked for:

> dat=read.csv(file_path, header=F)
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote,  :
  incomplete final line found by readTableHeader on 'y:\ALL
STRATEGIES\INVEST-TRADING\Zacks\RCsvData\VarPortRtns.csv'
> str(dat)
'data.frame':   1 obs. of  251 variables:
 $ V1  : num 0
 $ V2  : num -0.24
 $ V3  : num 0.355
 $ V4  : num -0.211
 $ V5  : num 1.18
 $ V6  : num -0.228
 $ V7  : num 0.748
 $ V8  : num -1.05
 $ V9  : num 0.566
 $ V10 : num -0.184
 $ V11 : num -0.693

...etc

> dat
  V1 V2V3   V4  V5 V6V7
V8V9V10V11V12V13   V14
V15 V16   V17   V18 V19V20
1  0 -0.2404324 0.3554681 -0.21147 1.18128 -0.2279752 0.7483484 -1.049239
0.5660589 -0.1840771 -0.6933892 -0.4749619 -0.5575557 0.3741988 0.5606208
-0.06091273 0.6453374 0.3537118 -0.09991461 -0.3207118
V21   V22   V23   V24   V25   V26
V27V28   V29V30V31   V32V33
V34V35   V36V37V38   V39
1 0.3321949 0.5137949 0.2281249 0.1412501 0.8793599 0.7216529 -1.147237
-0.0871542 0.3021812 0.05944923 -0.2407231 0.3589717 -0.4295433 -0.07399785
-0.1088064 0.6161071 -0.2026699 -0.2004506 0.1542169
  V40  V41V42V43   V44V45
V46V47V48 V49   V50  V51   V52
V53V54   V55V56V57V58
1 -0.06293466 1.160775 -0.1009804 -0.4574294 0.1289299 -0.1434428 -1.090505
-0.2702305 -0.4381005 -0.0691 0.1836706 0.728843 0.2219863 -0.7939716
-0.2580837 0.4604682 -0.6085527 -0.1102456 -0.2470147
 V5

...etc...

Ben

On Thu, Oct 27, 2011 at 1:37 PM, Nordlund, Dan (DSHS/RDA) <
nord...@dshs.wa.gov> wrote:

> > -Original Message-
> > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> > project.org] On Behalf Of Ben quant
> > Sent: Thursday, October 27, 2011 12:26 PM
> > To: r-help@r-project.org
> > Subject: Re: [R] preceding X. and X
> >
> > Figured it out. Solution:
> > dat=read.csv(file_path, header=F)
> > > dat
> >   V1 V2V3   V4  V5 V6V7
> > V8V9V10V11V12V13   V14
> > V15 V16   V17   V18 V19V20
> > 1  0 -0.2404324 0.3554681 -0.21147 1.18128 -0.2279752 0.7483484 -
> > 1.049239
> > 0.5660589 -0.1840771 -0.6933892 -0.4749619 -0.5575557 0.3741988
> > 0.5606208
> > -0.06091273 0.6453374 0.3537118 -0.09991461 -0.3207118
> > V21   V22   V23   V24   V25   V26
> > V27V28   V29V30V31   V32V33
> > V34V35   V36V37V38   V39
> > 1 0.3321949 0.5137949 0.2281249 0.1412501 0.8793599 0.7216529 -1.147237
> > -0.0871542 0.3021812 0.05944923 -0.2407231 0.3589717 -0.4295433 -
> > 0.07399785
> > -0.1088064 0.6161071 -0.2026699 -0.2004506 0.1542169
> >   V40  V41V42V43   V44V45
> > V46V47V48 V49   V50  V51   V52
> > V53V54   V55V56V57V58
> > 1 -0.06293466 1.160775 -0.1009804 -0.4574294 0.1289299 -0.1434428 -
> > 1.090505
> > -0.2702305 -0.4381005 -0.0691 0.1836706 0.728843 0.2219863 -
> > 0.7939716
> > -0.2580837 0.4604682 -0.6085527 -0.1102456 -0.2470147
> >  V59V60  V61V62   V63V64
> > V65  V66  V67V68   V69   V70   V71
> > V72V73   V74   V75V76V77
> > 1 -0.1374674 0.05771337 0.615591 -0.2103958 0.3729799 -0.7636618
> > 1.222489
> > 1.175414 1.349652 -0.0653956 0.4461732 0.7385489 0.4267874 -0.4099944
> > -0.4456437 0.1310654 0.5912901 0.03645256 -0.1760742
> > V78   V79   V80
> >
> > Thanks,
> > Ben
> >
> > On Thu, Oct 27, 2011 at 1:12 PM, Justin Haynes 
> > wrote:
> >
> > > I'd look at the actual csv file.  I assume it has the X there also.
> > > Sounds like a good candidate for some data munging tools first before
> > > you bring it into R.  Also ?str of the data would be helpful.  My first
> > > guess is that those are all being read as column names.  Were they data
> > > in the data.frame dat they would be quoted:
> > >
> > > > dat<-c('X0.0','X.0.24','X0.35','X.0.211')
> > > > dat
> > > [1] "X0.0""X.0.24"  "X0.35"   "X.0.211"

Re: [R] preceding X. and X

2011-10-27 Thread Ben quant
Figured it out. Solution:
dat=read.csv(file_path, header=F)
> dat
  V1 V2V3   V4  V5 V6V7
V8V9V10V11V12V13   V14
V15 V16   V17   V18 V19V20
1  0 -0.2404324 0.3554681 -0.21147 1.18128 -0.2279752 0.7483484 -1.049239
0.5660589 -0.1840771 -0.6933892 -0.4749619 -0.5575557 0.3741988 0.5606208
-0.06091273 0.6453374 0.3537118 -0.09991461 -0.3207118
V21   V22   V23   V24   V25   V26
V27V28   V29V30V31   V32V33
V34V35   V36V37V38   V39
1 0.3321949 0.5137949 0.2281249 0.1412501 0.8793599 0.7216529 -1.147237
-0.0871542 0.3021812 0.05944923 -0.2407231 0.3589717 -0.4295433 -0.07399785
-0.1088064 0.6161071 -0.2026699 -0.2004506 0.1542169
  V40  V41V42V43   V44V45
V46V47V48 V49   V50  V51   V52
V53V54   V55V56V57V58
1 -0.06293466 1.160775 -0.1009804 -0.4574294 0.1289299 -0.1434428 -1.090505
-0.2702305 -0.4381005 -0.0691 0.1836706 0.728843 0.2219863 -0.7939716
-0.2580837 0.4604682 -0.6085527 -0.1102456 -0.2470147
 V59V60  V61V62   V63V64
V65  V66  V67V68   V69   V70   V71
V72V73   V74   V75V76V77
1 -0.1374674 0.05771337 0.615591 -0.2103958 0.3729799 -0.7636618 1.222489
1.175414 1.349652 -0.0653956 0.4461732 0.7385489 0.4267874 -0.4099944
-0.4456437 0.1310654 0.5912901 0.03645256 -0.1760742
V78   V79   V80

Thanks,
Ben

On Thu, Oct 27, 2011 at 1:12 PM, Justin Haynes  wrote:

> I'd look at the actual csv file.  I assume it has the X there also.
> Sounds like a good candidate for some data munging tools first before
> you bring it into R.  Also ?str of the data would be helpful.  My first
> guess is that those are all being read as column names.  Were they data
> in the data.frame dat they would be quoted:
>
> > dat<-c('X0.0','X.0.24','X0.35','X.0.211')
> > dat
> [1] "X0.0""X.0.24"  "X0.35"   "X.0.211"
>
> versus:
>
> > names(dat)<-c('col_one','X.0.44',0.65,'last_col')
> > dat
>  col_oneX.0.44  0.65  last_col
>   "X0.0"  "X.0.24"   "X0.35" "X.0.211"
>
>
>
> However, if you want to use R to clean it up, I'd use the stringr package.
>
> > library(stringr)
>
> > dat<-str_replace(dat,'X.0.','-0.')
> > dat
> [1] "X0.0"   "-0.24"  "X0.35"  "-0.211"
> > dat<-str_replace(dat,'X','')
> > dat
> [1] "0.0""-0.24"  "0.35"   "-0.211"
> > dat<-as.numeric(dat)
> > dat
> [1]  0.000 -0.240  0.350 -0.211
> >
>
> hope that helps,
>
> Justin
>
>
> On Thu, Oct 27, 2011 at 11:47 AM, Ben quant  wrote:
> > Hello,
> >
> > Why do I get a preceding "X." (that is, an X followed by a period) for
> > negative numbers and an "X" for positive numbers when I read a csv file?
> Am
> > I stuck with this? If so, how do I convert it to normal numbers?
> >
> > dat=read.csv(file_path)
> >
> >> dat
> >  [1] X0.0   X.0.240432350374   X0.355468069625
> > X.0.211469972378   X1.1812797415  X.0.227975150826   X0.74834842067
> > X.1.04923922494X0.566058942902X.0.184077147931
> >  [11] X.0.693389240029   X.0.474961946724   X.0.557555716654
> > X0.374198813899X0.560620781209X.0.0609127295732  X0.645337364133
> > X0.353711785227X.0.0999146114953  X.0.320711825714
> >  [21] X0.332194935294X0.513794862516X0.228124868198
> > X0.141250108666X0.879359879038X0.721652892103X.1.14723732497
> > X.0.0871541975062  X0.302181204959X0.0594492294833
> >  [31] X.0.240723094394   X0.358971714966X.0.42954330242
> > X.0.0739978455876  X.0.108806367787   X0.616107131373X.0.202669947993
> > X.0.200450609711   X0.15421692014 X.0.0629346641528
> >  [41] X1.16077454571 X.0.100980386545   X.0.457429357325
> > X0.128929934631X.0.143442822494   X.1.09050490567X.0.270230489547
> > X.0.438100470791   X.0.069111547  X0.18367056566
> >  [51] X0.728842996177X0.221986311856X.0.793971624503
> > X.0.258083713185   X0.460468157809X.0.608552686527   X.0.11024558138
> > X.0.247014689522   X.0.137467423146   X0.0577133684917
> >  [61] X0.615590960098X.0.210395786553   X0.372979876654
> > X.0.763661795812   X1.22248872639  

[R] preceding X. and X

2011-10-27 Thread Ben quant
Hello,

Why do I get a preceding "X." (that is, an X followed by a period) for
negative numbers and an "X" for positive numbers when I read a csv file? Am
I stuck with this? If so, how do I convert it to normal numbers?

dat=read.csv(file_path)

> dat
  [1] X0.0   X.0.240432350374   X0.355468069625
X.0.211469972378   X1.1812797415  X.0.227975150826   X0.74834842067
X.1.04923922494X0.566058942902X.0.184077147931
 [11] X.0.693389240029   X.0.474961946724   X.0.557555716654
X0.374198813899X0.560620781209X.0.0609127295732  X0.645337364133
X0.353711785227X.0.0999146114953  X.0.320711825714
 [21] X0.332194935294X0.513794862516X0.228124868198
X0.141250108666X0.879359879038X0.721652892103X.1.14723732497
X.0.0871541975062  X0.302181204959X0.0594492294833
 [31] X.0.240723094394   X0.358971714966X.0.42954330242
X.0.0739978455876  X.0.108806367787   X0.616107131373X.0.202669947993
X.0.200450609711   X0.15421692014 X.0.0629346641528
 [41] X1.16077454571 X.0.100980386545   X.0.457429357325
X0.128929934631X.0.143442822494   X.1.09050490567X.0.270230489547
X.0.438100470791   X.0.069111547  X0.18367056566
 [51] X0.728842996177X0.221986311856X.0.793971624503
X.0.258083713185   X0.460468157809X.0.608552686527   X.0.11024558138
X.0.247014689522   X.0.137467423146   X0.0577133684917
 [61] X0.615590960098X.0.210395786553   X0.372979876654
X.0.763661795812   X1.22248872639 X1.17541364078 X1.34965201031
X.0.0653956005331  X0.446173249776X0.738548926264
 [71] X0.426787360705X.0.409994430265   X.0.445643675958   etc...
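
(What is most likely happening: with the default header = TRUE, read.csv takes
the first, and here only, row of numbers as column names and passes them
through make.names(), which prepends "X" and turns "-" into "."; reading with
header = FALSE, the fix shown in the follow-up, keeps them as data. A small
self-contained illustration, with a temp file standing in for the real csv:)

tmp <- tempfile(fileext = ".csv")
writeLines("0.0,-0.2404,0.3554,-0.2114", tmp)  # one row of numbers, no header line

read.csv(tmp)                  # header = TRUE (default): the numbers become
                               # column names -> X0.0, X.0.2404, ... and 0 rows
read.csv(tmp, header = FALSE)  # header = FALSE: one row of numeric data, V1..V4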

Thanks

ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R.oo package, inherit two classes

2011-10-27 Thread Ben quant
Hello,

How do I inherit from two classes using the R.oo package? Below is kind of a
silly example, but I am trying to create class PerDog from classes Dog and
Person. The error is at the bottom. I've tried a few other ways of using
extend(), but nothing seems to get me what I want.

Example:

setConstructorS3("Person", function(age=NA) {
 this = extend(Object(), "Person",
.age=age
  )
  this
})
setMethodS3("getAge", "Person", function(this, ...) {
  this$.age;
})
setMethodS3("setAge", "Person", function(this,num, ...) {
  this$.age = num;
})
# ..
setConstructorS3("Dog", function(dog_age=NA) {
 this = extend(Object(), "Dog",
.dog_age=dog_age
  )
  this
})
setMethodS3("getDogAge", "Dog", function(this, ...) {
  this$.dog_age;
})
setMethodS3("setDogAge", "Dog", function(this,num, ...) {
  this$.dog_age = num;
})
#..
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
extend(Person(age=age),Dog(dog_age=dog_age), "PerDog",
.wt=wt
  )
})
setMethodS3("getWeight", "PerDog", function(this, ...) {
  this$.wt;
})
setMethodS3("setWeight", "PerDog", function(this,w, ...) {
  this$.wt = w;
})

> pd = PerDog(67,150,1)
Error in list(`PerDog(67, 150, 1)` = , `extend(Person(age =
age), Dog(dog_age = dog_age), "PerDog", .wt = wt)` = ,  :

[2011-10-27 09:34:06] Exception: Missing name of field #1 in class
definition: Dog: 0x73880408
  at throw(Exception(...))
  at throw.default("Missing name of field #", k, " in class definition: ",
...className)
  at throw("Missing name of field #", k, " in class definition: ",
...className)
  at extend.Object(Person(age = age), Dog(dog_age = dog_age), "PerDog", .wt
= wt)
  at extend(Person(age = age), Dog(dog_age = dog_age), "PerDog", .wt = wt)
  at PerDog(67, 150, 1)


Three (of many) other things I have tried:

1)
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
  this = extend(extend(Person(age=age), "PerDog"),Dog(dog_age=dog_age),
"PerDog",
  .wt=wt
  )
  this
})

2)
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
  this = extend(Dog(dog_age=dog_age), "PerDog",
  .wt=wt
  )
  this
})
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
  this = extend(Person(age=age), "PerDog",
  .wt=wt
  )
  this
})

3)
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
  this = extend(Dog(dog_age=dog_age), "PerDog",
setConstructorS3("PerDog", function(age=NA,wt=NA,dog_age=NULL) {
  extend(Person(age=age), "PerDog",
  .wt=wt
  )
})
  )

  this
})
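
(Some context, hedged: R.oo's extend() expects a single parent object as its
first argument, so the unnamed Dog(...) passed after Person(...) appears to be
treated as a field with no name, which is what the "Missing name of field #1"
error is complaining about. Inheriting from two Object parents is not supported
this way; one workaround is composition, keeping a Dog as a field of PerDog and
delegating to it, as in the sketch below, which reuses the Person and Dog
classes defined above:)

setConstructorS3("PerDog", function(age=NA, wt=NA, dog_age=NA) {
  extend(Person(age=age), "PerDog",
    .wt  = wt,
    .dog = Dog(dog_age=dog_age)   # hold a Dog and delegate to it
  )
})
setMethodS3("getWeight", "PerDog", function(this, ...) {
  this$.wt
})
setMethodS3("getDogAge", "PerDog", function(this, ...) {
  getDogAge(this$.dog)            # forward to the contained Dog
})

# pd <- PerDog(67, 150, 1); getAge(pd); getWeight(pd); getDogAge(pd)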

Thanks,

ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] textplot in layout

2011-10-25 Thread Ben quant
Perfect, thanks!

ben

On Tue, Oct 25, 2011 at 8:12 AM, Eik Vettorazzi  wrote:

> Hi Ben,
> maybe mtext is of more help here?
>
> par(mar=c(7,3,3,3))
> plot(year,rate,main='main',sub='sub')
> mtext('test',cex=1,side=1,line=5)
> box()
>
> cheers
>
> Am 25.10.2011 15:26, schrieb Ben quant:
> > Hello,
> >
> > Someone (Erik) recently posted about putting text on a plot. That thread
> > didn't help. I'd like to put text directly below the 'sub' text (with no
> > gap). The code below is the best I can do. Note the large undesirable gap
> > between 'sub' and 'test'. I'd like the word 'test' to be just below the
> top
> > box() boarder (directly below 'sub').
> >
> > year <- c(2000 ,   2001  ,  2002  ,  2003 ,   2004)
> > rate <- c(9.34 ,   8.50  ,  7.62  ,  6.93  ,  6.60)
> > op <- par(no.readonly = TRUE)
> > on.exit(par(op))
> > layout(matrix(c(1,2), 2, 1, byrow = TRUE),heights=c(8,1))
> > par(mar=c(5,3,3,3))
> > plot(year,rate,main='main',sub='sub')
> > library(gplots)
> > par(mar=c(0,0,0,0),new=F)
> > textplot('test',valign='top',cex=1)
> > box()
> >
> > Note: I'd rather solve it with textplot. If not, my next stop is
> > grid.text(). Also, the text I am plotting with textplot is much longer,
> > so a multi-line text plot would solve my next issue (which I have not
> > looked into yet). Lastly, layout is not necessary; I just used it
> > because I thought it would do what I wanted.
> >
> > Thanks,
> >
> > Ben
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
> --
> Eik Vettorazzi
>
> Department of Medical Biometry and Epidemiology
> University Medical Center Hamburg-Eppendorf
>
> Martinistr. 52
> 20246 Hamburg
>
> T ++49/40/7410-58243
> F ++49/40/7410-57790
>
> --
> Pflichtangaben gemäß Gesetz über elektronische Handelsregister und
> Genossenschaftsregister sowie das Unternehmensregister (EHUG):
>
> Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen
> Rechts; Gerichtsstand: Hamburg
>
> Vorstandsmitglieder: Prof. Dr. Guido Sauter (Vertreter des Vorsitzenden),
> Dr. Alexander Kirstein, Joachim Prölß, Prof. Dr. Dr. Uwe Koch-Gromus
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] textplot in layout

2011-10-25 Thread Ben quant
Hello,

Someone (Erik) recently posted about putting text on a plot. That thread
didn't help. I'd like to put text directly below the 'sub' text (with no
gap). The code below is the best I can do. Note the large undesirable gap
between 'sub' and 'test'. I'd like the word 'test' to be just below the top
box() border (directly below 'sub').

year <- c(2000 ,   2001  ,  2002  ,  2003 ,   2004)
rate <- c(9.34 ,   8.50  ,  7.62  ,  6.93  ,  6.60)
op <- par(no.readonly = TRUE)
on.exit(par(op))
layout(matrix(c(1,2), 2, 1, byrow = TRUE),heights=c(8,1))
par(mar=c(5,3,3,3))
plot(year,rate,main='main',sub='sub')
library(gplots)
par(mar=c(0,0,0,0),new=F)
textplot('test',valign='top',cex=1)
box()

Note: I'd rather solve it with textplot. If not, my next stop is
grid.text(). Also, the text I am plotting with textplot is much longer, so a
multi-line text plot would solve my next issue (which I have not looked into
yet). Lastly, layout is not necessary; I just used it because I thought it
would do what I wanted.
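
(For the longer text mentioned above, one hedged option is to drop textplot and
layout, wrap the string, and stack the pieces in the bottom margin with mtext(),
building on the suggestion in the reply; the spacing values below are
illustrative and reuse the year/rate vectors from this post:)

txt   <- paste(rep("a much longer caption that needs several lines", 3),
               collapse = " ")
lines <- strwrap(txt, width = 60)          # break the text into plot-width pieces
par(mar = c(5 + length(lines), 3, 3, 3))   # leave room below 'sub' (drawn on line 4)
plot(year, rate, main = 'main', sub = 'sub')
for (i in seq_along(lines))
  mtext(lines[i], side = 1, line = 4 + i, cex = 0.8)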

Thanks,

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R.oo package: do setMethodS3 work upon construction

2011-10-24 Thread Ben quant
Hello (Heinrich),

I did not know I could do this. It doesn't seem to be documented anywhere, so
I thought it would be helpful to the fraction of the community using the R.oo
package. Note the call to a setMethodS3 method, xOne, inside setConstructorS3.
This is extremely useful if xOne (in this case) is a very complex method that
you always want called every time you create a new object. If I have
something wrong, please let me know! (I'm about to implement this in a
large-ish program.) Great package for OOP!

Example 1:

setConstructorS3("ClassA", function() {
  this = extend(Object(), "ClassA",
    .x=NULL
  )

  this$xOne()  # this is useful!
  this
})

setMethodS3("xOne", "ClassA", function(this,...) {
  this$.x = 1

})
setMethodS3("getX", "ClassA", function(this,...) {
  this$.x
})


So x is always 1:
> a = ClassA()
> a$x
[1] 1

If you are new to R.oo: if you only want x to be 1 (i.e., xOne above is
simple), you should do something like this:

Example 2:

setConstructorS3("ClassA", function() {
  this = extend(Object(), "ClassA",
    .x=1
  )
  this
})

setMethodS3("getX", "ClassA", function(this,...) {
  this$.x
})

> a = ClassA()
> a$x
[1] 1

The following further illustrates what you can do with Example 1 above:

Example 3:

setConstructorS3("ClassA", function() {
  this = extend(Object(), "ClassA",
    .x=NULL,
    .y=1
  )
  this$xOne()
  this$xPlusY()
  this
})
setMethodS3("xOne", "ClassA", function(this,...) {
  this$.x = 1

})
setMethodS3("xPlusY", "ClassA", function(this,...) {
  this$.x = this$.x + this$.y

})
setMethodS3("getX", "ClassA", function(this,...) {
  this$.x
})

> a = ClassA()
> a$x
[1] 2

Hope that helps!

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

