[R] gdata selectively not working

2013-04-02 Thread Robin Jeffries
I can use gdata to successfully read in the example Excel file, but not any
other Excel files. Why might this be the case? The problem seems to have
something to do with opening the database, but there is no indication of
what the problem is, so I'm at a loss as to how to fix it.


> library(gdata)
gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.

snip

> test <- read.xls("C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx",
verbose=TRUE)
Using perl at C:\Perl64\bin\perl.exe
Using perl at C:\Perl64\bin\perl.exe

Converting xls file
“C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx”
to csv  file
“C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv”
...

Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl
C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx
C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv 1 '...

Loading 'C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx'...
Done.

Orignal Filename: C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx
Number of Sheets: 4

Writing sheet number 1 ('Sheet First') to file
'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv'
Minrow=0 Maxrow=7 Mincol=0 Maxcol=2

0

Done.

Reading csv file
 “C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv” ...
Done.

This tells me that Perl can be found and used, and that my local temp
directory can be written to and read from just fine. Now to try reading one
of my own files.


> test <- read.xls("C:/Dropbox/Animals/LARPBO/Database.xlsx", verbose=TRUE)
Using perl at C:\Perl64\bin\perl.exe
Using perl at C:\Perl64\bin\perl.exe

Converting xls file
“C:/Dropbox/Animals/LARPBO/Database.xlsx”
to csv  file
“C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv”
...

Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl
C:/Dropbox/Animals/LARPBO/Database.xlsx
C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1 '...

Unable to open file 'C:/Dropbox/Animals/LARPBO/Database.xlsx'.
2

Done.

Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  :
  Intermediate file
'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv' missing!
In addition: Warning message:
running command 'C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/xls2csv.pl
C:/Dropbox/Animals/LARPBO/Database.xlsx
C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1'
had status 2
Error in file.exists(tfn) : invalid 'file' argument

So it appears to be a problem with the original Excel file, but nothing
tells me what the problem actually is.
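Before blaming the workbook itself, it may help to confirm from R that the path exists, is readable, and can actually be opened by the current user. A minimal sketch using only base R; `check_readable` is a hypothetical helper, and passing it does not rule out a lock that another program (such as a sync client) holds at the moment Perl opens the file.

```r
# Hypothetical helper: does the file exist, is it readable, and can it be opened?
check_readable <- function(path) {
  found <- file.exists(path)
  # file.access() returns 0 on success; mode = 4 tests read permission
  readable <- found && file.access(path, mode = 4) == 0
  # Try actually opening the file, which is roughly what the Perl script does
  opens <- readable && tryCatch({
    con <- file(path, open = "rb")
    close(con)
    TRUE
  }, error = function(e) FALSE)
  c(exists = found, readable = readable, opens = opens)
}

check_readable("C:/Dropbox/Animals/LARPBO/Database.xlsx")
```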

Thanks
-Robin Jeffries


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gdata selectively not working

2013-04-02 Thread Robin Jeffries
Thank you, Paul, that was the problem.

I have installed R into Dropbox, so I am aware of access issues; I have to
pause syncing whenever I install a new package. I assumed that would work
here as well. Unfortunately, even when Dropbox syncing is paused (or turned
off), I still can't read the file with gdata. If I move it to another
location, say C:/Temp, then it's fine.

Annoying, but I will have to work around it for now.
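Since moving the file to C:/Temp worked, one workaround is to have R copy the workbook into its own session temp directory and read the copy. A sketch under that assumption; `read_via_temp` is a hypothetical helper, and its `reader` argument exists only so the copy step can be tried without gdata or Perl installed.

```r
# Hypothetical helper: copy a workbook to R's session temp directory and
# read the copy, so the Perl converter never touches the Dropbox path.
read_via_temp <- function(path, ..., reader = gdata::read.xls) {
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(path)))
  if (!file.copy(path, tmp, overwrite = TRUE)) {
    stop("could not copy '", path, "' to the temp directory")
  }
  on.exit(unlink(tmp), add = TRUE)  # delete the copy when done
  reader(tmp, ...)
}

# Usage (needs gdata and Perl, so not run here):
# test <- read_via_temp("C:/Dropbox/Animals/LARPBO/Database.xlsx", verbose = TRUE)
```

If the copy itself fails while Dropbox holds the file, that would at least confirm the file lock as the cause.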


-Robin



On Tue, Apr 2, 2013 at 4:41 AM, Paul Johnson pauljoh...@gmail.com wrote:


 On Apr 2, 2013 1:28 AM, Robin Jeffries robin.a.jeffr...@gmail.com wrote:
 
  I can use gdata to successfully read in the example Excel file, but not any
  other excel files. Why might this be the case? It seems that the problem
  has something to do with opening the database but no indication as to what
  the problem is. So i'm at a loss of how to fix it.
 
 

 Would you please try this on a location that is NOT a network share
 (Dropbox)? I suspect file access issues cause this; there are lots of
 details under the hood there.

 Just check the most obvious problem first.

 Next, you will need to give a link to the xls file in question. The gdata
 functions always work for me...
 Pj

   library(gdata)
  gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
  gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
 
  snip
 
   test -
 read.xls(C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx,
  verbose=T)
  Using perl at C:\Perl64\bin\perl.exe
  Using perl at C:\Perl64\bin\perl.exe
 
  Converting xls file
  “C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx”
  to csv  file
  “C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv”
  ...
 
  Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/
  xls2csv.pl C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx
  C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv 1
 '...
 
  Loading 'C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx'...
  Done.
 
  Orignal Filename: C:/Dropbox/R/library/gdata/xls/ExampleExcelFile.xlsx
  Number of Sheets: 4
 
  Writing sheet number 1 ('Sheet First') to file
  'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv'
  Minrow=0 Maxrow=7 Mincol=0 Maxcol=2
 
  0
 
  Done.
 
  Reading csv file
   “C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bccd743d36.csv” ...
  Done.
 
  This tells me that perl can be found, used, and my local temp directory
 can
  be written/read to just fine. Now to try to read one of my own files.
 
 
   test - read.xls(C:/Dropbox/Animals/LARPBO/Database.xlsx, verbose=T)
  Using perl at C:\Perl64\bin\perl.exe
  Using perl at C:\Perl64\bin\perl.exe
 
  Converting xls file
  “C:/Dropbox/Animals/LARPBO/Database.xlsx”
  to csv  file
  “C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv”
  ...
 
  Executing ' C:\Perl64\bin\perl.exe C:/Dropbox/R/library/gdata/perl/
  xls2csv.pl C:/Dropbox/Animals/LARPBO/Database.xlsx
  C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1
  '...
 
  Unable to open file 'C:/Dropbox/Animals/LARPBO/Database.xlsx'.
  2
 
  Done.
 
  Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method,  :
Intermediate file
  'C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv'
 missing!
  In addition: Warning message:
  running command 'C:\Perl64\bin\perl.exe
 C:/Dropbox/R/library/gdata/perl/
  xls2csv.pl C:/Dropbox/Animals/LARPBO/Database.xlsx
  C:\Users\Robin\AppData\Local\Temp\RtmpWkmGgn\file1bcc2cfe7499.csv 1'
  had status 2
  Error in file.exists(tfn) : invalid 'file' argument
 
  So it appears that it's a problem with the original Excel file. But
 there's
  nothing that tells me what the problem actually is.
 
  Thanks
  -Robin Jeffries
 
  [[alternative HTML version deleted]]
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 





[R] Subsetting with missing data

2012-08-15 Thread Robin Jeffries
Simply put, I want to subset the data frame 'a' where 'y=0'.

> a <- as.data.frame(cbind(x=1:10, y=c(1,0,NA,1,0,NA,NA,1,1,0)))
> a
    x  y
1   1  1
2   2  0
3   3 NA
4   4  1
5   5  0
6   6 NA
7   7 NA
8   8  1
9   9  1
10 10  0

> names(a)
[1] "x" "y"

> table(a$y)
0 1
3 4

> table(a$y, useNA="always")
   0    1 <NA>
   3    4    3

> b <- a[a$y==0,]
> b
      x  y
2     2  0
NA   NA NA
5     5  0
NA.1 NA NA
NA.2 NA NA
10   10  0

> is(a$y)
[1] "numeric" "vector"


Instead of only pulling the rows where a$y == 0, I'm getting the rows where
it is 0 OR NA. Again, I feel like either something was changed when I wasn't
looking, or I'm really forgetting something important.

Thanks,

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics,
UCLA
530-633-STAT(7828)
rjeffr...@ucla.edu



[R] Resources for utilizing multiple processors

2011-06-08 Thread Robin Jeffries
Hello,

I know of various methods out there for using multiple processors, but I am
not sure what the best solution would be. First, some things to note:
I'm running dependent simulations, so direct parallel coding (multicore,
doSNOW, etc.) is out.
I'm on Windows and don't know C. I don't plan on learning C or any of the
*nix languages.

My main concern deals with multiple analyses on large data sets. By large I
mean that when I'm done running 2 simulations, R is using ~3 GB of RAM, and
the remaining ~3 GB is chewed up when I try to compute the Gelman-Rubin
statistic to compare the two resulting samples, grinding the process to a
halt. I'd like to have separate cores run each analysis simultaneously. That
would save time, and I'll have to approach the BGR calculation problem
another way. Can R temporarily write calculations to hard-disk space instead
of RAM?
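Even when each simulation is internally sequential, two independent runs (for example, two chains to be compared with the Gelman-Rubin statistic) can still go to separate cores. A minimal sketch with the parallel package that ships with R, assuming the runs share no state; `run_chain` is a hypothetical placeholder for the real simulation. Each worker keeps its own results in memory, so this spreads the CPU work but does not by itself reduce total RAM use.

```r
library(parallel)  # ships with R (>= 2.14.0)

# Hypothetical stand-in for one long, internally sequential simulation
# (an AR(1) chain): each iteration depends on the previous one.
run_chain <- function(seed, n_iter = 5000) {
  set.seed(seed)
  x <- numeric(n_iter)
  for (t in 2:n_iter) x[t] <- 0.5 * x[t - 1] + rnorm(1)
  x
}

# Two independent chains, one per worker. On Windows, makeCluster() starts
# separate R sessions (a PSOCK cluster), each with its own memory.
cl <- makeCluster(2)
chains <- parLapply(cl, c(101, 202), run_chain, n_iter = 5000)
stopCluster(cl)

str(chains)
```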

The second concern boils down to whether or not there is a way to split up
dependent simulations. For example at iteration (t) I feed a(t-2) into FUN1
to generate a(t), then feed a(t), b(t-1) and c(t-1) into FUN2 to simulate
b(t) and c(t). I'd love to have one core run FUN1 and another run FUN2, and,
better yet, a third to run all the pre- and post-processing tidbits!


So if anyone has any suggestions as to a direction I can look into, it would
be appreciated.


Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-633-STAT(7828)



[R] Help with one of those apply functions

2011-02-02 Thread Robin Jeffries
Hello there,

I'm still struggling with the *apply commands. I have 5 people with IDs
from 10 to 14. I have varying numbers (nrep) of repeated outcome
measurements (value) on them.

nrep <- 1:5
id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
value <- rnorm(length(id))

I want to create a new vector that contains the sum of the values per
person.

subject.value[1] <- value[1]           # 1 measurement
subject.value[2] <- sum(value[2:3])    # the next 2 measurements
...
subject.value[5] <- sum(value[11:15])  # the next 5 measurements


I'd imagine it'll be some sort of *apply(value, nrep, sum) but I can't seem
to land on the right format.

Can someone give me a heads up as to what the correct syntax and function
is?

Danke,

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



Re: [R] Help with one of those apply functions

2011-02-02 Thread Robin Jeffries
Thanks Steve,

I needed the alternative. tapply worked for my toy example, but it didn't
for my real example. It might be because the data were in a data frame, but
I'm not sure. Using plyr did work, however.
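For data that live in a data frame, base R can also produce the per-person sums: tapply() works as long as the columns are passed explicitly (e.g. `DF$value`), and aggregate() returns a data frame much like ddply(). A sketch reusing the toy data from this thread; whether it resolves the problem with the real data is untested.

```r
set.seed(123)
nrep  <- 1:5
id    <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
value <- rnorm(length(id))
DF    <- data.frame(id = id, value = value)

# tapply on data-frame columns: one sum per id, named by id
by_tapply <- tapply(DF$value, DF$id, sum)

# aggregate with a formula: a data frame with columns id and value
by_aggregate <- aggregate(value ~ id, data = DF, FUN = sum)

by_tapply
by_aggregate
```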


Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



On Wed, Feb 2, 2011 at 2:34 PM, Steve Lianoglou mailinglist.honey...@gmail.com wrote:

 Hi,


 In addition to tapply (as Phil pointed out), you can look at the
 functions in plyr.

 I somehow find them more intuitive, at times, than their sister base
 functions, especially since more often than not you'll have your data
 in a data.frame.

 For instance:

 R> set.seed(123)
 R> nrep <- 1:5
 R> id <- rep(c("p1", "p2", "p3", "p4", "p5"), nrep)
 R> value <- rnorm(length(id))
 R> DF <- data.frame(id=id, value=value)

 R> tapply(value, id, sum)
         p1         p2         p3         p4         p5
 -0.5604756  1.3285308  1.9148611 -1.9366599  1.5395087

 R> library(plyr)
 R> ddply(DF, .(id), summarize, total=sum(value))
  id  total
 1 p1 -0.5604756
 2 p2  1.3285308
 3 p3  1.9148611
 4 p4 -1.9366599
 5 p5  1.5395087

 In this case, though, I'll grant you that tapply is simpler if you
 already know how to use it.

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: 
 http://cbio.mskcc.org/~lianos/contact




Re: [R] Printing 'pretty' vectors in Sweave

2011-01-19 Thread Robin Jeffries
Ah! I was always trying collapse together with sep and other options, never
by itself. Perfect!

And yes, that was my bad example.
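Putting the collapse suggestion together with the parentheses gives a single string that can be placed inline. A small sketch; `v` and `pretty_vec` are hypothetical names (avoiding `c` as a variable name). Unlike cat(), which prints and returns NULL (which is why `myvec` was NULL), paste() returns its result.

```r
v <- 1:4  # hypothetical name, chosen so base::c() is not masked

# paste(..., collapse = ", ") returns one string instead of printing
pretty_vec <- paste0("(", paste(v, collapse = ", "), ")")
pretty_vec
```

In the .Rnw file this could then be used inline as `My vector is \Sexpr{pretty_vec}.`, which should render as "My vector is (1, 2, 3, 4)." in the TeX output.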


Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



On Tue, Jan 18, 2011 at 10:27 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:

 Hi Robin,

 Have you looked at the 'collapse' argument to paste?
 something like:

 myvec <- paste(1:4, collapse = ", ")

 Might do what you want.  Also maybe ?bquote or the like to get rid of
 quotes possibly (I'm not in a position to try presently).

 Side note, it is probably best not to use 'c' as a variable name
 since it is such a fundamental function.

 Cheers,

 Josh






[R] Printing 'pretty' vectors in Sweave

2011-01-18 Thread Robin Jeffries
I am trying to print a nice looking vector in Sweave.

c <- 1:4

I want to see (1, 2, 3, 4) in TeX.

If I use
paste(c, ",", sep="")
I get
 1, 2, 3, 4,

If I use cat(c, sep=",")
I can't seem to assign it to an object:
1,2,3,4> myvec <- cat(c, sep=",")
1,2,3,4> myvec
NULL

and if I bypass the object assignment and put
My vector is (\Sweave{cat(c, sep=",")}).
it prints out
My vector is ().

Suggestions?


Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



[R] How is MissInfo calculated? (mitools)

2010-11-07 Thread Robin Jeffries
What does missInfo compute and how is it computed?
There is only 1 observation missing the ethnic3 variable. There is no other
missing data.
N=1409

> summary(MIcombine(mod1))

Multiple imputation results:
  with(rt.imp, glm(G1 ~ stdage + female + as.factor(ethnic3) + u,
family = binomial()))

  MIcombine.default(mod1)
                        results         se      (lower     upper) missInfo
(Intercept)         -0.40895453 0.14743928 -0.70805544 -0.1098536     53 %
stdage               0.13991360 0.06046537  0.02140364  0.2584236      0 %
female              -0.05587635 0.11083362 -0.27310639  0.1613537      0 %
as.factor(ethnic3)1  0.17297835 0.19556664 -0.21032531  0.5562820      0 %
as.factor(ethnic3)2  0.63507020 0.18017975  0.28192410  0.9882163      0 %
u                   -0.01322976 0.18896230 -0.40291914  0.3764596     64 %
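For context, the usual Rubin's-rules estimate of the fraction of missing information for one coefficient combines the within-imputation variance W (the mean of the squared standard errors) with the between-imputation variance B (the variance of the point estimates across the m imputations). The sketch below computes that standard formula. That MIcombine's missInfo column uses exactly this form is an assumption worth checking against the mitools source. It does show how a coefficient can get 0 %: if the m estimates are identical, B = 0 and the fraction is 0.

```r
# Rubin's rules for a single coefficient (a sketch, not mitools' own code).
# est: the m point estimates, one per imputed data set
# se2: the m squared standard errors (within-imputation variances)
rubin_missinfo <- function(est, se2) {
  m  <- length(est)
  W  <- mean(se2)                  # within-imputation variance
  B  <- var(est)                   # between-imputation variance
  r  <- (1 + 1 / m) * B / W        # relative increase in variance
  df <- (m - 1) * (1 + 1 / r)^2    # Rubin's degrees of freedom
  (r + 2 / (df + 3)) / (r + 1)     # fraction of missing information
}

# Hypothetical estimates from m = 5 imputations
rubin_missinfo(est = c(0.10, 0.12, 0.09, 0.15, 0.11),
               se2 = c(0.004, 0.004, 0.005, 0.004, 0.004))
```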

Thanks,


Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



[R] GC verbose=false still showing report

2010-10-09 Thread Robin Jeffries
I must be reading the help file for gc() wrong. I thought it said that
gc(verbose=FALSE) will run the garbage collection without printing the
Ncells/Vcells summary. However, this is what I get:

> gc(verbose = FALSE)
         used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells 267097 14.3     531268  28.4   531268  28.4
Vcells 429302  3.3   20829406 159.0 55923977 426.7

I'm embedding this in an Sweave/TeX file, so I *really* can't have
this printing out. Suggestions other than manually editing the TeX
file?

Robin Jeffries
MS, DrPH Candidate
Department of Biostatistics
UCLA
530-624-0428



Re: [R] GC verbose=false still showing report

2010-10-09 Thread Robin Jeffries
invisible(gc())

worked perfectly. Thanks Jeff.

@ Josh: I know how to toggle showing/hiding command echoes, but I
haven't figured out how to toggle on/off any printed output.
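The table is gc()'s return value being auto-printed at the console; the verbose argument only controls extra reporting during the collection itself. Wrapping the call in invisible() still runs the collection but suppresses the auto-print, which can be checked with capture.output():

```r
# Auto-printed: the Ncells/Vcells matrix shows up as captured text
shown  <- capture.output(gc())

# invisible() runs the collection but marks the value as not-for-printing
hidden <- capture.output(invisible(gc()))

length(shown) > 0    # the summary table was printed
length(hidden) == 0  # nothing was printed
```

In a Sweave chunk, the chunk option `results=hide` (as in `<<results=hide>>=`) should likewise run the code while discarding its printed output, which appears to be the printed-output toggle in question.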






[R] Lattice xyplots plots with multiple lines per cell

2010-08-13 Thread Robin Jeffries
Hello,

I need to plot the means of some outcome for two groups (control vs.
intervention) over (discrete) time on the same plot, for various subsets
such as gender and grade level. What I have been doing is creating all
possible subsets first, using the aggregate function to compute the means
over time, and then plotting the means over time for one subset (as a simple
line plot with both control and intervention on one plot). I then use par()
and repeat this plot for each gender x grade level subset so they all appear
on one page.


This appears to me to be very similar to an xyplot, something like
mean(outcome) ~ gender + gradelevel. However, I can't figure out how I
could get both control and intervention lines in the same plot.

Any suggestions? What I'm doing now works, but it just seems to be the long
way around.
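In lattice, the `groups` argument draws one line per group inside each panel, and conditioning with `|` produces the gender x grade panels on one page. A sketch on hypothetical toy data (the column names time, arm, gender, grade and outcome are assumptions); aggregate() still computes the means, but only once.

```r
library(lattice)
set.seed(1)

# Hypothetical long-format data: 5 subjects per time x arm x gender x grade cell
d <- expand.grid(time = 1:4, arm = c("control", "intervention"),
                 gender = c("F", "M"), grade = c("9", "10"), rep = 1:5)
d$outcome <- rnorm(nrow(d), mean = d$time + (d$arm == "intervention"))

# Mean outcome per time x arm x gender x grade
m <- aggregate(outcome ~ time + arm + gender + grade, data = d, FUN = mean)

# One panel per gender x grade; one line per arm within each panel
p <- xyplot(outcome ~ time | gender * grade, data = m,
            groups = arm, type = "b",
            auto.key = list(lines = TRUE, points = FALSE))
print(p)
```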

-Robin



[R] simple apply syntax

2010-07-11 Thread Robin Jeffries
I know this is a simple question, but I have yet to master the apply
statements. Any help would be appreciated. 

I have a column of probabilities and a column of sample sizes. I would like
to create a column of binomial random variables using the corresponding
probabilities and sample sizes.

E.g.

mat = as.matrix(cbind(p=runif(10,0,1), n=rep(1:5)))

              p n
 [1,] 0.5093493 1
 [2,] 0.4947375 2
 [3,] 0.6753015 3
 [4,] 0.8595729 4
 [5,] 0.1004739 5
 [6,] 0.6292883 1
 [7,] 0.3752004 2
 [8,] 0.6889157 3
 [9,] 0.2435880 4
[10,] 0.9619128 5

I want to create mat$x as binomial(n, p)
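No apply() is needed here: rbinom() is vectorised over `size` and `prob`, so one call draws one value per row. A matrix has no `$` operator, so the sketch binds the draws on as a new column, and also shows the data-frame version where `$` does work:

```r
set.seed(1)
mat <- as.matrix(cbind(p = runif(10, 0, 1), n = rep(1:5, 2)))

# One binomial draw per row, using that row's n and p
x <- rbinom(nrow(mat), size = mat[, "n"], prob = mat[, "p"])
mat <- cbind(mat, x = x)

# Equivalent with a data frame, where df$x works
df <- as.data.frame(mat[, c("p", "n")])
df$x <- rbinom(nrow(df), size = df$n, prob = df$p)

mat
```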

 

Thanks, 

Robin




[R] Counting indexes

2010-05-25 Thread Robin Jeffries
Hallo!

I have a vector of IDs like so,
id <- c(1,2,2,3,3,3,4,5,5)

I would like to create a [start,stop] pair of vectors that index the first
and last observation per ID.

For the ID list above, it would look like
1 1
2 3
4 6
7 7
8 9

I haven't worked with indexes/data manipulation much in R, so any pointers
would be helpful.

Many thanks!

~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428



Re: [R] Counting indexes

2010-05-25 Thread Robin Jeffries
Awesome! Thanks:)



On Tue, May 25, 2010 at 9:40 PM, Erik Iverson er...@ccbr.umn.edu wrote:

 which(!duplicated(id))
 [1] 1 2 4 7 8

 cumsum(rle(id)$lengths)
 [1] 1 3 6 7 9
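Both vectors can also come from a single rle() call and be bound into the requested two-column layout. A sketch; like the answer above, it assumes all rows for an id are adjacent. The names `first` and `last` are chosen to avoid masking base::stop().

```r
id <- c(1, 2, 2, 3, 3, 3, 4, 5, 5)

# rle() collapses runs of identical ids into (value, length) pairs
runs  <- rle(id)
last  <- cumsum(runs$lengths)       # last row of each run
first <- last - runs$lengths + 1    # first row of each run

cbind(first, last)
```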




[R] sparse matrices in lme4

2010-05-24 Thread Robin Jeffries
I read somewhere (help list, documentation) that lme4 uses sparse matrix
methods for the random effects.

I'd like to confirm with others: is it true that I can't use a sparse matrix
as a fixed effect? I'm getting an "Invalid type (S4)" error.

Thanks.

~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428



[R] Regression with sparse matricies

2010-05-22 Thread Robin Jeffries
I would like to run a logistic regression on some factor variables (main
effects and eventually an interaction) that are very sparse. I have a
moderately large dataset, ~100k observations, with 1500 factor levels for one
variable (X1) and 600 for another (X2), creating ~19000 levels for the
interaction (X1:X2).

I would like to take advantage of the sparseness of these factors rather
than use glm. In fact, glm is not an option given the size of the design
matrix.

I have looked through the Matrix package as well as other packages without
finding much help.

Is there some option or modification of glm that will recognize a sparse
matrix and avoid large matrix inversions?
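One way to see what taking advantage of the sparseness could mean: build the design with Matrix::sparse.model.matrix() and run the same iteratively reweighted least squares (IRLS) that glm() uses, but with sparse products and a sparse solve. This is a teaching sketch on hypothetical toy data (`irls_logit` is a hypothetical helper), not a tuned solver. With ~19000 interaction columns, X'WX may fill in considerably, and a package built for sparse inputs (glmnet, for example, accepts dgCMatrix designs) may be more practical.

```r
library(Matrix)
set.seed(42)

# Hypothetical toy data standing in for the real factors X1 and X2
n <- 2000
d <- data.frame(X1 = factor(sample(letters[1:20], n, TRUE)),
                X2 = factor(sample(LETTERS[1:10], n, TRUE)))
d$y <- rbinom(n, 1, plogis(-0.5 + 0.1 * as.integer(d$X1) / 20))

# Sparse design matrix: intercept plus one column per non-reference level
X <- sparse.model.matrix(~ X1 + X2, data = d)

# Logistic regression by IRLS, keeping X sparse throughout
irls_logit <- function(X, y, tol = 1e-10, max_iter = 50) {
  beta <- numeric(ncol(X))
  for (iter in seq_len(max_iter)) {
    eta <- as.vector(X %*% beta)
    mu  <- plogis(eta)
    w   <- mu * (1 - mu)                 # IRLS weights
    z   <- eta + (y - mu) / w            # working response
    WX  <- Diagonal(x = w) %*% X         # still sparse
    # Solve the sparse normal equations (X'WX) beta = X'Wz
    beta_new <- as.vector(solve(crossprod(X, WX), as.vector(crossprod(WX, z))))
    if (max(abs(beta_new - beta)) < tol) return(beta_new)
    beta <- beta_new
  }
  beta
}

beta <- irls_logit(X, d$y)
head(beta)
```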

-Robin



[R] Indexing with sparse matrices (SparseM)

2010-05-20 Thread Robin Jeffries
Hello,

I'm working with a very large, very sparse X matrix. Let csr.X <-
as.matrix.csr(X), as described in the SparseM package.


The documentation says that "Indexing  work just like they do on dense
matrices." To me this says that I should be able to perform operations on
the rows of csr.X in the same way I would on X itself, e.g.
f <- function(X, beta){
  n <- nrow(X)
  u <- numeric(n)
  for (i in 1:n){
    u[i] <- log(1+exp(t(X[i,])%*%beta))
  }
  sm <- sum(u)
  return(sm)
}

However, csr.X[i,] doesn't exist.
Now I get how as.matrix.csr coerces X into an object with three arrays: two
indexes and a list of the non-zero data. What I can't quite wrap my brain
around is how I would go about using those indices to perform iterative
operations on the rows of X, for example in my toy function above.

I'm hoping that someone with more experience working with sparse matrices
can provide a few suggestions or pointers. I'm not hooked on this package
either; it was just the first one I came across via Rseek.
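If switching packages is an option, the Matrix package's sparse classes do support `X[i, ]` row extraction, and the loop can be replaced by one sparse matrix-vector product. A sketch on hypothetical random data (rsparsematrix() needs a reasonably recent Matrix version); `log1p(exp(eta))` is the same quantity as `log(1 + exp(eta))`.

```r
library(Matrix)
set.seed(1)

# Hypothetical 200 x 50 sparse design (about 5% non-zero) and coefficients
X    <- rsparsematrix(200, 50, density = 0.05)
beta <- rnorm(50)

# Vectorised: one sparse product gives every row's linear predictor at once
f_vec <- function(X, beta) {
  eta <- as.vector(X %*% beta)
  sum(log1p(exp(eta)))
}

# Row by row, mirroring the original loop; X[i, ] extracts row i
f_loop <- function(X, beta) {
  u <- numeric(nrow(X))
  for (i in seq_len(nrow(X))) u[i] <- log1p(exp(sum(X[i, ] * beta)))
  sum(u)
}

f_vec(X, beta)
```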


Many thanks,
-Robin



[R] Source.R file from cmd line

2010-05-08 Thread Robin Jeffries
I want to set up a Windows scheduled task that will run a .R script at
pre-specified times.

Can someone please help with the command-line syntax that I would assign to
the task?

I know that I can open a command prompt, type R, and then source the file,
but I don't know how to pass multiple lines of commands to the command line
in a scheduled task.
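Rather than starting R and calling source() by hand, the task can run the script non-interactively with Rscript (R CMD BATCH is an alternative). A sketch, assuming the task's program is Rscript.exe given by its full path under the R installation's bin folder; the script path and contents are hypothetical.

```r
# Hypothetical script saved as C:/scripts/nightly.R. The scheduled task's
# program would be Rscript.exe (full path under the R installation's bin
# folder) with the script path as its argument, e.g.:
#   Rscript.exe C:/scripts/nightly.R
# Anything after the script path is available via commandArgs().
args <- commandArgs(trailingOnly = TRUE)

stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
cat("nightly.R ran at", stamp, "with", length(args), "argument(s)\n")

# ... the commands you would otherwise type after starting R go here ...
```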

Thanks,

~~~
-Robin Jeffries
Dr.P.H. Candidate in Biostatistics
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428



Re: [R] Obvious reason for not looping twice?

2010-04-26 Thread Robin Jeffries
I do get the following error message:

Error in lookup.svc[i, j] <- svc[svc$st == unique(svc$st)[i] & svc$vc ==  :
  replacement has length zero


I also thought it might be because of how R treats NA, but then I would
expect the loop to stop at the place of error (i=1, j=2) and not continue to
fill out all of column i.

I've tried using %in%, but that seems to do a non-positional check for
whether or not entries are somewhere in those vectors. I need to find the
location where the match occurs.

My goal is to turn this:

st  vc  y
A   Z   .2
B   Z   .4
B   Y   .3
C   Y   .1
C   X   .8

into a 3x3 table with entries 'y':

          vc
         Z    Y    X
     A  .2    0    0
st   B  .4   .3    0
     C   0   .1   .8


Right now it's giving me

          vc
         Z      Y      X
     A  .2  0.000  0.000
st   B   0      0      0
     C   0      0      0

So it seems to finish out the row that it's currently on, but then won't
continue to loop.
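The error message suggests the loop is not finishing the row at all: the first combination with no matching row (A and Y, at i = 1, j = 2) yields a zero-length value, the assignment fails, and the whole loop stops; the zeros after .2 are most likely just the initial values. A sketch of the loop with that case skipped, on the toy data above and assuming the value sits in a column named y (the original indexes column 4):

```r
svc <- data.frame(st = c("A", "B", "B", "C", "C"),
                  vc = c("Z", "Z", "Y", "Y", "X"),
                  y  = c(.2, .4, .3, .1, .8),
                  stringsAsFactors = FALSE)

sts <- unique(svc$st)
vcs <- unique(svc$vc)
lookup.svc <- matrix(0, nrow = length(sts), ncol = length(vcs),
                     dimnames = list(sts, vcs))

for (i in seq_along(sts)) {
  for (j in seq_along(vcs)) {
    # Single & : element-wise AND of the two conditions
    hit <- svc$y[svc$st == sts[i] & svc$vc == vcs[j]]
    # Absent combinations give a zero-length result; assigning that raises
    # "replacement has length zero", so leave those cells at 0
    if (length(hit) > 0) lookup.svc[i, j] <- hit[1]
  }
}
lookup.svc
```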


-Robin



On Sun, Apr 25, 2010 at 4:44 PM, Peter Alspach 
peter.alsp...@plantandfood.co.nz wrote:

 Tena koe Robin

 Do you get an error or warning?

 It may have something to do with how == treats NA:

 x <- 1:4
 x[x == 1]
 [1] 1
 x <- c(1:4, NA)
 x[x == 1]
 [1]  1 NA
 x[x %in% 1]
 [1] 1

 If so, using %in% is one way to avoid the problem.  However, I would
 have thought you'd get an error message if this were the case.

 HTH .

 Peter Alspach

  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-
  project.org] On Behalf Of Robin Jeffries
  Sent: Monday, 26 April 2010 10:26 a.m.
  To: r-help@r-project.org
  Subject: [R] Obvious reason for not looping twice?
 
  Is there an obvious reason why this won't loop to i=2 and beyond?
  There are many combinations of *st* & *vc* that don't exist in svc. For
  example, when s=1 there's only an entry at v=1. That's fine, the entry
  can stay 0.
 
  lookup.svc <-
  array(0, dim=c(length(unique(svc$st)), length(unique(svc$vc))),
  dimnames=list(unique(svc$st), unique(svc$vc)))
 
  for (i in 1:length(unique(svc$st))) {
    for (j in 1:length(unique(svc$vc))){
      lookup.svc[i,j] <- svc[svc$st == unique(svc$st)[i] & svc$vc ==
  unique(svc$vc)[j], 4]
  }}
 
 
  Thanks,
  Robin
 
  ~~~
  -Robin Jeffries
  Dr.P.H. Candidate
  UCLA School of Public Health
  rjeffr...@ucla.edu
  530-624-0428
 


Re: [R] Obvious reason for not looping twice?

2010-04-26 Thread Robin Jeffries
Seriously! That easy!

I kept thinking that xtabs would just give me frequencies of how many times
each combination occurred, and not the values themselves.

Thanks!

-Robin


On Mon, Apr 26, 2010 at 7:40 AM, Henrique Dallazuanna www...@gmail.com wrote:

 Try this;

  xtabs(y ~ st + vc, data = x)
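A sketch of this suggestion on the toy data from the thread. With a variable on the left-hand side, xtabs() sums that variable within each st x vc cell, so with one row per combination the cells are the y values themselves and absent combinations become 0. vc is made a factor here only to keep the column order Z, Y, X.

```r
x <- data.frame(st = c("A", "B", "B", "C", "C"),
                vc = factor(c("Z", "Z", "Y", "Y", "X"), levels = c("Z", "Y", "X")),
                y  = c(.2, .4, .3, .1, .8))

# y on the left-hand side: cell entries are sums of y, not counts
tab <- xtabs(y ~ st + vc, data = x)
tab
```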


 On Mon, Apr 26, 2010 at 11:35 AM, Robin Jeffries rjeffr...@ucla.eduwrote:

 I do get the following error message:

 *Error in lookup.svc[i, j] - svc[svc$st == unique(svc$st)[i]  svc$vc ==
  : *
 *  replacement has length zero*


 I also thought it might be because of how R treats NA, but then I would
 expect the loop to stop at the place of error (i=1, j=2) and not continue
 to
 fill out all of column i.

 I've tried using %in%, but that seems to do a non-positional check for
 whether or not entries are *somewhere *in those vectors. I need to find
 the
 location of where the match occurs.

 My goal is to turn this:

 st  vc  y
 A   Z   .2
 B   Z   .4
 B   Y   .3
 C   Y   .1
 C   X   .8

 into a 2x2 table with entries 'y'
   vc
Z   Y   X
 A  .2  0   0
 st   B  .4  .3  0
 C  0   .1  .8


 Right now it's giving me

   vc
Z   Y   X
 A  .2  0.000   0.000
 st   B  0  0  0
 C  0   0  0

 So it seems to finish out the row that it's currently on, but then won't
 continue to loop.


 -Robin



 On Sun, Apr 25, 2010 at 4:44 PM, Peter Alspach 
 peter.alsp...@plantandfood.co.nz wrote:

  Tena koe Robin
 
  Do you get an error or warning?
 
  It may have something to do with how == treats NA:
 
  x - 1:4
  x[x == 1]
  [1] 1
  x - c(1:4, NA)
  x[x == 1]
  [1]  1 NA
  x[x %in% 1]
  [1] 1
 
  If so, using %in% is one way to avoid the problem.  However, I would
  have thought you'd get an error message if this were the case.
 
  HTH .
 
  Peter Alspach
 
   -----Original Message-----
   From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Robin Jeffries
   Sent: Monday, 26 April 2010 10:26 a.m.
   To: r-help@r-project.org
   Subject: [R] Obvious reason for not looping twice?
  
   Is there an obvious reason why this won't loop to i=2 and beyond?
   There are many combinations of *st* & *vc* that don't exist in svc. For
   example, when s=1 there's only an entry at v=1.  That's fine, the entry
   can stay 0.
  
   lookup.svc <-
   array(0,dim=c(length(unique(svc$st)),length(unique(svc$vc))),
   dimnames=list(unique(svc$st), unique(svc$vc)))

   for (i in 1:length(unique(svc$st))) {
     for (j in 1:length(unique(svc$vc))){
       lookup.svc[i,j] <- svc[svc$st == unique(svc$st)[i] & svc$vc ==
                                unique(svc$vc)[j], 4]
   }}
  
  
   Thanks,
   Robin
  
   ~~~
   -Robin Jeffries
   Dr.P.H. Candidate
   UCLA School of Public Health
   rjeffr...@ucla.edu
   530-624-0428
  
 





 --
 Henrique Dallazuanna
 Curitiba-Paraná-Brasil
 25° 25' 40" S 49° 16' 22" O




[R] Obvious reason for not looping twice?

2010-04-25 Thread Robin Jeffries
Is there an obvious reason why this won't loop to i=2 and beyond?
There are many combinations of *st* & *vc* that don't exist in svc. For
example, when s=1 there's only an entry at v=1.  That's fine, the entry can
stay 0.

lookup.svc <- array(0,dim=c(length(unique(svc$st)),length(unique(svc$vc))),
dimnames=list(unique(svc$st), unique(svc$vc)))

for (i in 1:length(unique(svc$st))) {
  for (j in 1:length(unique(svc$vc))){
    lookup.svc[i,j] <- svc[svc$st == unique(svc$st)[i] & svc$vc ==
                             unique(svc$vc)[j], 4]
}}
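A minimal sketch of one way to keep this loop running, assuming `svc` is a data frame whose value column is named `y` (the post indexes column 4; the column name `y` and the sample rows below are illustrative). When no row matches an st/vc pair, the subset is a zero-length vector, and assigning it to `lookup.svc[i, j]` raises "replacement has length zero", which stops the loop at the first empty cell. Checking the length first leaves such cells at 0.

```r
# Illustrative data in the shape shown later in the thread
svc <- data.frame(st = c("A", "B", "B", "C", "C"),
                  vc = c("Z", "Z", "Y", "Y", "X"),
                  y  = c(0.2, 0.4, 0.3, 0.1, 0.8),
                  stringsAsFactors = FALSE)

st_levels <- unique(svc$st)
vc_levels <- unique(svc$vc)

# Start with every cell at 0, labelled by the unique st and vc values
lookup.svc <- matrix(0, nrow = length(st_levels), ncol = length(vc_levels),
                     dimnames = list(st_levels, vc_levels))

for (i in seq_along(st_levels)) {
  for (j in seq_along(vc_levels)) {
    # & combines the two conditions element by element
    val <- svc$y[svc$st == st_levels[i] & svc$vc == vc_levels[j]]
    # Assign only when a matching row exists; if a pair occurred more than
    # once, only the first value is kept (an assumption of this sketch)
    if (length(val) > 0) lookup.svc[i, j] <- val[1]
  }
}

print(lookup.svc)
```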


Thanks,
Robin

~~~
-Robin Jeffries
Dr.P.H. Candidate
UCLA School of Public Health
rjeffr...@ucla.edu
530-624-0428



[R] Problems completely reading in a large sized data set

2010-01-20 Thread Robin Jeffries
I have been through the help file archives a number of times, and still
cannot figure out what is wrong.
I have a tab-delimited text file. 76Mb, so while it's large.. it's not
-that- large. I'm running Win7 x64 w/4G RAM and R 2.10.1

When I open this data in Excel, I have 27 columns and 450932 rows, excluding
the first row containing variable names.

I am trying to get this into R as a dataset for analysis.

zz <- "Data/media1y.txt"
f <- file(zz, 'r')              # open the file
rl <- readLines(f, 1)           # read the first line
colnames <- strsplit(rl, '\t')
p <- length(colnames[[1]])      # count the number of columns
nobs <- 450932
close(f)

Using:
d1 <- matrix(scan(zz, skip=1, sep="\t", fill=TRUE, what=rep("character",p),
nlines=nobs), ncol=p, nrow=nobs, byrow=TRUE,
dimnames=list(NULL, colnames[[1]]))

produces the following warning:
Read 5761719 items
Warning message:
In matrix(scan(zz, skip = 1, sep = "\t", fill = TRUE, what =
rep("character",  :
  data length [5761719] is not a sub-multiple or multiple of the number of
rows [10]

Now, 5761719/27 = 213397.
If I change to nobs <- 213397, it reads in the file with no errors and produces
a matrix that I can work with from here. But the data are obviously not
complete.

At first I thought it might be reading only the first x rows, so I
sorted by the first variable alphabetically in Excel before saving it as a
txt file and reading it into R.
head(d1) shows the correct first 6 rows, but when I ask for tail(d1) the
entry for the first variable in the last row is [213397,] WSAH
The 213397th row in Excel starts with MM1, and the actual last row starts
with YE. The WSA in question can be found on Excel row # 397548.

That confuses the heck out of me. There are no blank lines.

Since there are 1000 categories for that first variable, I'm not going to
manually match all of the frequencies, but the first 10 were exact, MM1
was correct, and the last few before WSA were also correct. WSA itself
had 3001 observations in R, whereas Excel has 3093. That also makes it seem
that R stops reading the table at some point.
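One frequent cause of this symptom, offered as a guess rather than a diagnosis of this particular file: with a non-default `sep`, scan() treats both `"` and `'` as quote characters by default, so a lone apostrophe inside a field (a name such as O'Brien, say) opens a quoted string that swallows the following lines, and fewer items come back. read.delim() defaults to `quote = "\""`, so apostrophes are not quotes there. The small file below is made up for illustration.

```r
# Hypothetical tab-delimited file; the first data row contains an apostrophe
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\tO'Brien\t10",
             "2\tSmith\t20",
             "3\tJones\t30"), tmp)

# Default quoting: the apostrophe starts a quoted field that runs to the
# end of the file (scan() warns about EOF within a quoted string)
x_default <- suppressWarnings(
  scan(tmp, what = "character", sep = "\t", skip = 1, quiet = TRUE))

# quote = "" turns quoting off, so each of the 3 data lines yields 3 fields
x_noquote <- scan(tmp, what = "character", sep = "\t", skip = 1,
                  quote = "", quiet = TRUE)

# count.fields() is a quick check that every line has the expected field count
n_fields <- count.fields(tmp, sep = "\t", quote = "")

# read.delim() only treats double quotes as quotes by default
df <- read.delim(tmp)

length(x_default)
length(x_noquote)
```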



It shouldn't be a memory issue, right?
> object.size(d1)
56328480 bytes
> memory.size(max=TRUE)
[1] 444.06
> memory.size(max=NA)
[1] 3583.88
> memory.size(max=FALSE)
[1] 251.09



As a side question, I'm reading it all in as characters for now because when
I tried to define a vector of column types, wht <-
list(rep("character",7),0,"logical",0,"character"), to use in scan(), it
still read everything in as character. I'm also not sure about the quotation
marks; I had to put them in to get list() to even accept that. Or c(). Any
ideas on this?
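On the side question, a sketch of how column types can be passed, using a made-up four-column file. In scan() it is the type of each list component that matters, not its value: `"logical"` is a character string, so it yields a character column, and `rep("character", 7)` inside list() is a single component (one field), not seven. c() has its own problem: it coerces a mix like `"character", 0` into one character vector. Empty vectors such as `character()`, `numeric()` and `logical()` avoid these issues, and read.delim() takes the same information as a `colClasses` vector of class names.

```r
# Hypothetical file: two text columns, one numeric, one logical
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore\tflag",
             "a1\tAnn\t1.5\tTRUE",
             "b2\tBob\t2.5\tFALSE"), tmp)

# One list component per column; rep(list(character()), 2) repeats a type
wht <- c(rep(list(character()), 2), list(numeric(), logical()))
cols <- scan(tmp, what = wht, sep = "\t", skip = 1, quiet = TRUE)
str(cols)

# The same column types through read.delim(), by class name
df <- read.delim(tmp, colClasses = c("character", "character",
                                     "numeric", "logical"))
str(df)
```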

Thanks!

-- 
Robin Jeffries
Dr.P.H. Candidate
Department of Biostatistics
UCLA School of Public Health



[R] (Solved) Problems completely reading in a large sized data set

2010-01-20 Thread Robin Jeffries
I'm not quite sure why, but reading in the *sorted* data (imported into
Excel, sorted, written to a text file) worked perfectly fine with
read.delim().


Thanks to those that replied!

-Robin
