[R] SQL Primer for R

2008-08-25 Thread ivo welch
Dear R wizards:

I decided to take the advice in the R data import/export manual and
want to learn how to work with SQL for large data sets.  I am trying
SQLite with the DBI and RSQLite database interfaces.  Speed is nice.
Alas, I am struggling to find a tutorial that is geared for the kind
of standard operations that I would want in R.  Simple things:

*  how to determine the number of rows in a table.  (Of course, I
could select a row of data and then use this.)

*  how to insert a new column into my existing SQL table---say, the
rank of another variable---and save it back.  Am I supposed to create
a new data frame, then save it as a new table, then delete the old SQL
table?

*  how to save a revised version of my table in a different sort order
 (with or without deleting the original table).  <-- I guess this is
not appropriate, as I should think of SQL tables as unordered.

I guess these would make nice little text snippets in the R Data
import/export manual, too.  help appreciated.

regards,

/ivo
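
A minimal sketch of the three operations asked about above, assuming a current DBI and RSQLite and a throwaway in-memory database. The table name `main` and the columns `id` and `x` are made up for illustration.

```r
library(DBI)

# In-memory SQLite database with a small illustrative table (names are hypothetical).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "main", data.frame(id = 1:5, x = c(3.2, 1.5, 4.8, 0.7, 2.9)))

# 1. Number of rows: let the database count instead of fetching data.
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM main")$n

# 2. New column holding the rank of x: compute it in R, then overwrite the table.
d <- dbGetQuery(con, "SELECT * FROM main")
d$xrank <- rank(d$x)
dbWriteTable(con, "main", d, overwrite = TRUE)

# 3. Sort order: tables are unordered, so request an order when reading.
sorted <- dbGetQuery(con, "SELECT * FROM main ORDER BY x")

dbDisconnect(con)
```

Overwriting the table is the simplest route for a table that fits in memory; for tables that do not, the rank would have to be computed inside the database instead.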

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R-ish challenge for dependent ranking

2008-08-25 Thread ivo welch
Dear R wizards:  First, thanks for the notes on SQL.  These pointers
will make it a lot easier to deal with large data sets.   Sorry to
have a second short query the same day.  I have been staring at this
for a while, but I cannot figure out how to do a dependent ranking the
R-ish way.

ds= data.frame( xn=rnorm(32), yn=rnorm(32), zn=rnorm(32) )
# ok, the first set of 8 groups, each with 4 elements
ds$drank1group= as.integer((rank( ds$xn )-1)/4)

## here I want within each drank1group the rank based on yn (from 1 to 4)
ds$drank2.bydrank1group= ???

something like "by(ds,drank1group, rank(ds$yn))".  obviously, this
neither works nor has same dimensional output.

of course, there is a really simple, clever way to do this in
R...except that it totally eludes me.  before I start writing a hand
iterating function, could someone please let me know how to do this?

regards,

/iaw
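
One possible way to do the dependent ranking is base R's `ave()`, which applies a function within groups and returns a vector of the original length. This is a sketch on the data frame from the post, not necessarily what the list suggested.

```r
set.seed(1)
ds <- data.frame(xn = rnorm(32), yn = rnorm(32), zn = rnorm(32))

# First-stage grouping as in the post: 8 groups (0..7) of 4 elements each.
ds$drank1group <- as.integer((rank(ds$xn) - 1) / 4)

# Rank yn separately within each drank1group; the result lines up with the rows of ds.
ds$drank2.bydrank1group <- ave(ds$yn, ds$drank1group, FUN = rank)
```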



Re: [R] R-ish challenge for dependent ranking

2008-08-25 Thread ivo welch
thank you everybody, again.  regards, /iaw



Re: [R] SQL Primer for R

2008-08-26 Thread ivo welch
Sorry, chaps.  I need one more:

> dbDisconnect(con.in)
Error in sqliteCloseConnection(conn, ...) :
  RS-DBI driver: (close the pending result sets before closing this connection)
>

I am pretty sure I have fetched everything there is to be fetched.  I
am not sure what I need to do to say goodbye (or to find out what is
still pending).  ?dbDisconnect doesn't tell me.  PS: the documentation
for dbConnect should probably add "dbDisconnect" to its 'See also'
section.

regards,

/iaw

Really irrelevant PS: the "by" function could keep the number of
observations that go into each category.  I know it can be computed
separately, which is what I am doing now.
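
A sketch of the usual cause of the "pending result sets" message, assuming a current DBI and RSQLite: a result set opened with `dbSendQuery()` stays open until it is cleared, even after all rows have been fetched. `dbFetch()` is the current name for the older `fetch()`; the table here is hypothetical.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "main", data.frame(id = 1:3))   # illustrative table

res  <- dbSendQuery(con, "SELECT * FROM main")    # opens a result set
rows <- dbFetch(res)                              # fetch everything
dbClearResult(res)                                # release the result set explicitly
dbDisconnect(con)                                 # now the connection can be closed
```

Using `dbGetQuery()` instead of the send/fetch pair avoids the issue, because it clears the result set itself.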



[R] error instead of warning?

2008-08-31 Thread ivo welch
dear R experts---is it possible to ask R to abort with an error
instead of just giving a warning when I am mis-assigning vectors (or
other data structures) that are not compatible?  that is, I would like

1: In matrix(value, n, p) ... :
  data length [12] is not a sub-multiple or multiple of the number of
columns [11]

to force an error.  are there any other warnings() that are really
more programming errors that I could also convert into an abort?

sincerely,

/iaw
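
A small sketch of the `warn=2` setting mentioned later in this thread: with `options(warn = 2)` every warning is raised as an error. The recycling example below is chosen for illustration and is not the matrix call from the post.

```r
old <- options(warn = 2)   # from now on, every warning becomes an error

# 1:3 + 1:2 normally only warns about mismatched lengths; here it aborts.
result <- tryCatch(
  1:3 + 1:2,
  error = function(e) conditionMessage(e)
)

options(old)               # restore the previous warning level
```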



Re: [R] SQL Primer for R

2008-08-31 Thread ivo welch
stumped again by SQL...  If I have a table named "main" in an SQLite
data base, how do I get the names of all its columns?  (I have a mysql
book that claims the SHOW command does this sort of thing, but it does
not seem to work on SQLite.)

regards,  /iaw

PS: Thanks for the earlier emails on "warn=2".



Re: [R] SQL Primer for R

2008-09-01 Thread ivo welch
wow!  the answer seems to be "pragma table_info(main);"  thanks, Gabor.
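
For reference, a sketch of running that pragma from R, assuming a current DBI and RSQLite; `dbListFields()` is a DBI alternative that returns only the names. The table contents are made up.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "main", data.frame(id = 1:3, x = c(0.1, 0.2, 0.3)))

info <- dbGetQuery(con, "PRAGMA table_info(main)")  # one row per column
cols_pragma <- info$name
cols_dbi    <- dbListFields(con, "main")            # same names via DBI

dbDisconnect(con)
```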



[R] debugging

2008-09-01 Thread ivo welch
dear R wizards---I am not sure at what point I owe penance for asking
so many questions.  I am now wrangling debugging.  I want to write a
function

assert = function( condition, ... ) {
  if (!condition) {
    cat(...); cat("\n");
    browser();
  }
  stopifnot(condition);
}
assert( nrow(ds)==12, "My data set has ", nrow(ds),
        " rows, which is a bad error.");

(Please ignore my semicolons.)  Then, having invoked "browser()", I
would like by hand to be able to move back up one stack frame so that
I can look at my variables there.  Alas, two "little" problems.
First, the cat() does not seem to work.  It prints the arguments
themselves, rather than 'nrow(ds)'.  Second, I have no idea how to
move up one stack frame to the calling function so that I can examine
better what went wrong.  Is this possible?

As always, advice is highly appreciated.

Regards,

/ivo
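
One way to look at the caller's variables, sketched under the assumption that `parent.frame()` and `recover()` are acceptable substitutes for stepping up by hand: `parent.frame()` gives the calling function's environment, and `recover()` (from utils) offers a menu of all frames on the call stack. This variant of `assert()` is illustrative, not the version from the post.

```r
assert <- function(condition, ...) {
  if (!isTRUE(condition)) {
    caller <- parent.frame()   # environment of the function that called assert()
    message(...)
    message("variables visible in the caller: ",
            paste(ls(caller), collapse = ", "))
    if (interactive()) recover()   # pick any frame on the stack to browse
    stop("assertion failed", call. = FALSE)
  }
  invisible(TRUE)
}

check_rows <- function() {
  ds <- data.frame(a = 1:3)    # hypothetical data set
  assert(nrow(ds) == 12, "My data set has ", nrow(ds), " rows.")
}
```

Setting `options(error = recover)` gives the same frame menu for any uncaught error, without a custom assert.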



[R] email just sent

2008-09-01 Thread ivo welch
please ignore part 1.  of course, the cat works.  my mistake.  I just
need to learn how to step up frames, please.

regards,

/iaw



[R] basic boxplot questions

2009-01-16 Thread ivo welch
dear R experts:

I am playing with boxplots for the first time.  most of it is
intuitive, although there was less info on the web than I had hoped.

alas, for some odd reason, my R boxplots have some fat black dots, not
just the hollow outlier plots.  Is there a description of when R draws
hollow vs. fat dots somewhere?

[and what is the parameter to change just the size of these dots?]

Also, let me show my fundamental ignorance:  I am a little surprised
that the average boxplot would not show the mean and sd, too, at
least optionally.  Is there a common way to accomplish this (e.g., in
a different color), or do I just construct it myself with standard R
graphics lines() commands?

advice appreciated.

regards,

/iaw
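
A sketch of overlaying the mean and one standard deviation on a boxplot with base graphics, and of shrinking the outlier symbols through the `outpch` and `outcex` parameters that `boxplot()` passes on to `bxp()`. The data are simulated for illustration.

```r
pdf(NULL)   # throwaway device for this sketch; drop this line to plot on screen

set.seed(1)
groups <- list(a = rnorm(50), b = rnorm(50, mean = 1), c = rexp(50))

# outpch/outcex control the symbol and size of the outlier points.
boxplot(groups, outpch = 1, outcex = 0.6)

means <- sapply(groups, mean)
sds   <- sapply(groups, sd)
at    <- seq_along(groups)    # boxes are drawn at x = 1, 2, 3

points(at, means, pch = 18, col = "red")                       # means
segments(at, means - sds, at, means + sds, col = "red")        # +/- one sd

dev.off()
```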



[R] R 2.70 + ps2pdf14

2008-05-17 Thread ivo welch
dear R graphics experts---if anyone is running the combination of R
2.7.0 and ghostscript (2.62), could you please run the following and
let me know if you get the same strange symbol size that I do, or if
there is something weird on my system?  regards, /ivo



pdf(file = "testhere.PDF", version="1.4", pointsize=14);

plot(0, xlim=c(0,26), ylim=c(-1,4.5), type="n" );

text(10, -0.5, paste(
  "line 1 is plain, line 2 should be the same, except blue\n",
  "line 3 should be 1.5 times the size of line 1, but otherwise the same\n",
  "line 4 should be like line 3 except blue.\n\n",
  "this gets weird symbol sizes at certain spots after ps2pdf14 is applied\n",
  " --- R output or ps2pdf error?", sep=""), cex=0.5);

points(1:25,rep(1,25), pch=1:25);
points(1:25,rep(2,25), pch=1:25, col="blue");

points(1:25,rep(3,25), pch=1:25, cex=1.5);
points(1:25,rep(4,25), pch=1:25, col="blue", cex=1.5);

text(1, 0.2, "1 != 2", srt=90, cex=0.5);
text(16, 0.2, "1 != 2", srt=90, cex=0.5);
text(19, 0.2, "3 != 4, 1 != 2", srt=90, cex=0.5);
text(20, 0.2, "(1,2) > (3,4)", srt=90, cex=0.5);
#text(20, 0.4, "shrunk on cex=1.5", srt=90);

dev.off();

retcode= system( "ps2pdf14 testhere.PDF" );

#  now look at testhere.PDF.pdf , which is the ps2pdf14 output ;



Re: [R] R 2.70 + ps2pdf14

2008-05-18 Thread ivo welch
thanks, berwin.  yes, I meant ghostscript 8.62, of course.  ps2pdf14
is the equivalent of a distiller and is needed to embed fonts.  (R
does not embed fonts itself afaik.)  if someone knows another way to
embed all the fonts, I would love to know so that I can avoid ps2pdf14
altogether (not just in this example, but generally; I use lucida
fonts most of the time).

if developers from the R graphics group are reading this, given that
this strange output is not just my imagination, maybe it would be
worthwhile to see if the R output pdf could be made more robust to
avoid this "feature."  I stumbled onto it deep in a program, and spent
an afternoon distilling it down to the R script that I posted.  It was
quite puzzling.

PS: semicolons are a hobby, and one explicitly allowed by R.  ;-).

regards,

/ivo


On Sun, May 18, 2008 at 4:18 AM, Berwin A Turlach
<[EMAIL PROTECTED]> wrote:
> G'day Ivo,
>
> On Sat, 17 May 2008 21:33:35 -0400
> "ivo welch" <[EMAIL PROTECTED]> wrote:
>
>> dear R graphics experts---
>
> Not belonging to this group, but can confirm that I can see the same,
> in particular the circles are changing their size.
>
> However, I am a bit surprised that you run ps2pdf14 on a PDF file,
> according to the documentation the input should be a (E)PS file.
>
>> if anyone is running the combination of R 2.7.0 and ghostscript
>> (2.62),
>
> and I guess you mean ghostscript 8.62?  Ghostscript 2.62 would be
> really ancient, probably from before the time that PDF was created... :)
>
>> could you please run the following
>
> I will also leave it to somebody else on the list, specifically to
> people who find such coding particularly ugly if not offensive, to point
> out that semicolons are not needed at the end of lines of R scripts. :)
>
> HTH.
>
> Cheers,
>
>Berwin
>
> === Full address =
> Berwin A Turlach                            Tel.: +65 6515 4416 (secr)
> Dept of Statistics and Applied Probability        +65 6515 6650 (self)
> Faculty of Science                          FAX : +65 6872 3919
> National University of Singapore
> 6 Science Drive 2, Blk S16, Level 7         e-mail: [EMAIL PROTECTED]
> Singapore 117546                            http://www.stat.nus.edu.sg/~statba
>



Re: [R] R 2.70 + ps2pdf14

2008-05-20 Thread ivo welch
thanks.  I am now using R-patched 2008-05-18 r45723 .  This is
probably intended, but if not, I wanted to note it briefly: on the pdf
output device, symbol 1 is always black, no matter what color is
selected.  symbols 10 and 13 contain black.  symbol 19 is the
replacement for symbol 1 that takes on the color.

forgive the semicolons:

pdf.start("test");  # just encapsulates what you would expect.
NM=25;
plot( 0, type="n", ylim=c(0,6), xlim=c(0,NM), xlab="0-8", ylab="0-5"  );
points( 1:NM, rep(1,NM), pch=1:NM, col="black");
points( 1:NM, rep(2,NM), pch=1:NM, col="green");
text( 1:NM, rep(3,NM), 1:NM, col=1:NM, cex=0.75);
pdf.end();


/iaw


On Sun, May 18, 2008 at 12:51 PM, hadley wickham <[EMAIL PROTECTED]> wrote:
>> if developers from the R graphics group are reading this, given that
>> this strange output is not just my imagination, maybe it would be
>> worthwhile to see if the R output pdf could be made more robust to
>> avoid this "feature."  I stumbled onto it deep in a program, and spent
>> an afternoon distilling it down to the R script that I posted.  It was
>> quite puzzling.
>
> Have you tried R-patched?  I think Brian Ripley fixed this some days ago.
>
> Hadley
>
> --
> http://had.co.nz/
>



[R] x86 SSE* Pointer Favors

2008-06-12 Thread ivo welch
Dear Statisticians--- This is not even an R question, so please
forgive me.  I have so much ignorance in this matter that I do not
know where to begin.  I hope someone can point me to documentation
and/or a sample.

I want to compute a covariance as quickly as non-humanly possible on
an Intel core processor (up to SSE4) under linux.  Alas, I have no
idea how to engage CPU vectorization.  Do I need to use special data
types, or is "double" correct?  Does SSE* understand NaN?  Should I
rely on gcc autodetection of the vectorized meaning of my code, or are
there specific libraries that I should call?

What I want to learn about is as simple as it gets:
  typedef double Double;  // or whatever SSE* needs as close equivalent
  Double vector1[N], vector2[N];
  // then fill them with stuff.
  vector3= vector_mult(vector1,vector2, N);
  vector4= sum(vector1, N);

I just need a pointer and/or primer.  PS: If someone knows of a
superfast vectorized implementation of Gentleman's WLS algorithm,
please point me to it, too.  I am still using my old non-vectorized C
routines.

if this email offends as spam, apologies.

regards,

/iaw



[R] recursive beta with cutoffs on large data set

2008-06-15 Thread ivo welch
dear R experts:  I have an academic question that borders on asking
for consulting help, so I hope I am not too imposing.  If I am, please
ignore me.

I have a 100MB data set of daily stock returns.  I want to
compute rolling (recursive?) betas---either bivariate or
multivariate---with respect to some other data time series.  Many of
these regressions are "take away the first observation, add one
observation at the end," which means I really have only about 30,000
unique regressions---still, quite a good number.   Worse, I want to
winsorize the rolling y-vector at different levels (99%&1%, 98%&2%,
...), so I want to repeat this procedure a few hundred times at
different winsorization levels.

The most important version of my task is bivariate regressions, which
may mean that I don't even need MV overhead.

I was even thinking of coding in C rather than R for speed's sake, but I
am now thinking that learning the intricacies of fast vector
processing on x86 processors is so difficult that I would be done running
it in R before I would be done programming it in C.

Has anyone done something like this?  Any recommendations for what
could help give me the high speed that I probably need for a task like
this?  Any thoughts?

(I am right now working on getting blas-atlas to compile on my gentoo
system.  It just died in the compilation over something.)

regards,

/ivo
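
A sketch of one way to get all rolling bivariate betas in a single vectorized pass, using cumulative sums so that each window reuses the previous one. The function name `rolling_beta` and the simulated data are hypothetical, and winsorizing is not handled: it changes y within each window, so it would still need a per-window step.

```r
# Slope of y on x (with intercept) over every window of `width` consecutive rows.
rolling_beta <- function(y, x, width) {
  n <- length(y)
  # Sum of v over each window, via differences of cumulative sums.
  win <- function(v) {
    s <- c(0, cumsum(v))
    s[(width + 1):(n + 1)] - s[1:(n - width + 1)]
  }
  sx <- win(x); sy <- win(y); sxy <- win(x * y); sxx <- win(x * x)
  (sxy - sx * sy / width) / (sxx - sx^2 / width)   # cov(x, y) / var(x) per window
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)
betas <- rolling_beta(y, x, width = 20)   # one beta per window: 31 values
```

On very long series the differences of large cumulative sums can lose precision, so spot-checking a few windows against `lm()` seems prudent.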



[R] very fast OLS regression?

2009-03-25 Thread ivo welch
Dear R experts:

I just tried a simple test that told me that hand computing the OLS
coefficients is about 3-10 times as fast as using the built-in lm()
function.   (code included below.)  Most of the time, I do not care,
because I like the convenience, and I presume some of the time goes
into saving a lot of stuff that I may or may not need.  But when I do
want to learn the properties of an estimator whose input contains a
regression, I do care about speed.

What is the recommended fastest way to get regression coefficients in
R?  (Is Gentleman's weighted-least-squares algorithm implemented in a
low-level C form somewhere?  that one was always lightning fast for
me.)

regards,

/ivo



bybuiltin = function( y, x )   coef(lm( y ~ x -1 ));

byhand = function( y, x ) {
  xy<-t(x)%*%y;
  xxi<- solve(t(x)%*%x)
  b<-as.vector(xxi%*%xy)
  ## I will need these later, too:
  ## res<-y-as.vector(x%*%b)
  ## soa[i]<-b[2]
  ## sigmas[i]<-sd(res)
  b;
}


MC=500;
N=1;


set.seed(0);
x= matrix( rnorm(N*MC), nrow=N, ncol=MC );
y= matrix( rnorm(N*MC), nrow=N, ncol=MC );

ptm = proc.time()
for (mc in 1:MC) byhand(y[,mc],x[,mc]);
cat("By hand took ", proc.time()-ptm, "\n");

ptm = proc.time()
for (mc in 1:MC) bybuiltin(y[,mc],x[,mc]);
cat("By built-in took ", proc.time()-ptm, "\n");



Re: [R] very fast OLS regression?

2009-03-25 Thread ivo welch
thanks, dimitris.  I also added Bill Dunlap's "solve(qr(x),y)"
function as ols5.   here is what I get in terms of speed on a Mac Pro:

      user  system elapsed user.child sys.child
ols1 6.779   3.591   10.37          0         0
ols2 0.515   0.21    0.725          0         0
ols3 0.576   0.403   0.971          0         0
ols4 1.143   1.251   2.395          0         0
ols5 0.683   0.565   1.248          0         0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).

regards,

/iaw
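
On the accuracy question, a sketch that compares the normal-equations solution with a QR-based one on simulated data. Forming t(x) %*% x squares the condition number of the problem, so the QR route is generally the safer one when regressors are nearly collinear; on well-behaved data the two should agree closely. `qr.coef(qr(x), y)` is used here as a stand-in for the QR variant mentioned above.

```r
set.seed(0)
x <- cbind(1, rnorm(200), rnorm(200))        # simulated design matrix
y <- drop(x %*% c(1, 2, 3) + rnorm(200))

ols3 <- function(y, x) drop(solve(crossprod(x), crossprod(x, y)))  # normal equations
ols5 <- function(y, x) drop(qr.coef(qr(x), y))                     # QR decomposition

b3 <- ols3(y, x)
b5 <- ols5(y, x)
b4 <- unname(lm.fit(x, y)$coefficients)      # lm.fit also works through QR

max_diff <- max(abs(b3 - b5), abs(b5 - b4))  # largest disagreement between methods
```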


On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
 wrote:
> check the following options:
>
> ols1 <- function (y, x) {
>    coef(lm(y ~ x - 1))
> }
>
> ols2 <- function (y, x) {
>    xy <- t(x)%*%y
>    xxi <- solve(t(x)%*%x)
>    b <- as.vector(xxi%*%xy)
>    b
> }
>
> ols3 <- function (y, x) {
>    XtX <- crossprod(x)
>    Xty <- crossprod(x, y)
>    solve(XtX, Xty)
> }
>
> ols4 <- function (y, x) {
>    lm.fit(x, y)$coefficients
> }
>



[R] package installation on OSX --- suggestion

2009-03-26 Thread ivo welch
dear R experts:

I am trying to install packages in OSX, R 2.8.1.  Since I do this
about every 2 years, I have completely forgotten it.  However, this
should not be difficult:

   
http://wiki.r-project.org/rwiki/doku.php?id=getting-started:installation:packages

nice document.  beautiful method.  so, I start with

   update.packages()

the final message tells me that it saved all the packages into
/var/folders/Ia/IaQbr8K+GQ8DqdaGMAC18yU/-Tmp-/RtmpjRkMV7/downloaded_packages/
.  not exactly user-friendly.  at this point, I don't know whether
they were also installed or just downloaded.  the same happens when I
do an install.packages("plm", dependencies=T).  would it not make sense
if the package were installed in the standard R library location at
this point, and the final message to tell me that the package was
indeed installed, and not about the temporary directory?

[I suspect that it actually did the install, so this is just a "final
message" issue.]

just a suggestion...

[and thanks everybody for all the help yesterday.  now back to my moments.]

regards,

/ivo
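
A sketch of checking from within R whether a package was actually installed and where, using only base functions. The helper `is_installed` is a made-up name, and "stats" is used only because it ships with every R installation.

```r
lib_dirs <- .libPaths()   # library directories R installs into and searches

# TRUE if the named package is present in one of the library directories.
is_installed <- function(pkg) pkg %in% rownames(installed.packages())

stats_ok  <- is_installed("stats")
stats_dir <- find.package("stats")     # directory the package lives in
stats_ver <- packageVersion("stats")   # installed version
```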



Re: [R] pgmm (blundell-bond) help needed

2009-03-26 Thread ivo welch
I have been playing with more examples, and I now know that with
larger NF's my example code actually produces a result, instead of a
singular matrix error.  interestingly, stata's xtabond2 command seems
ok with these sorts of data sets.  either R has more stringent
requirements, or stata is too casual.  in any case, I find it strange
that Blundell-Bond would not work on data sets in which N=20 and T=10,
and there is only one parameter to estimate.  there should be more
than enough degrees of freedom.

I will experiment more with it.

regards,

/iaw



[R] Hurwicz Bias Correction

2009-04-13 Thread ivo welch
Dear Experts---Sorry, I need some help again.  I need a very fast
estimator for small sample time-series in which the autocoefficient
can be anything between 0 and 2 (i.e., even beyond the unit-root).  I
think this means that I will need to run OLS.  Of course, this means
that I will run into the Hurwicz bias.  So I am wondering whether
there is a reasonably fast approximate correction for the
autocoefficient, presumably as a function of N, Var(x), and estimated
a, b, and Var(e).   Even a function with some reasonable amount of
lookup would be ok.  (I have searched google and found nothing.)
Pointers appreciated.

sincerely,  /iaw
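
Absent a closed-form correction, one fallback is a simulated lookup table of the OLS bias over a grid of coefficients and sample sizes. The sketch below assumes a plain AR(1) with an intercept and standard normal errors; the function names are hypothetical and the grid is kept tiny for speed.

```r
# OLS slope of x[t] on x[t-1], with an intercept.
ar1_ols <- function(x) {
  n <- length(x)
  y <- x[-1]
  z <- x[-n]
  zc <- z - mean(z)
  sum(zc * (y - mean(y))) / sum(zc^2)
}

# Average estimation error of the OLS coefficient for a given rho and length n.
simulate_bias <- function(rho, n, reps = 2000) {
  est <- replicate(reps, {
    e <- rnorm(n)
    x <- numeric(n)
    x[1] <- e[1]
    for (t in 2:n) x[t] <- rho * x[t - 1] + e[t]
    ar1_ols(x)
  })
  mean(est) - rho
}

set.seed(1)
rho_grid   <- c(0.5, 0.9, 1.0)
bias_table <- sapply(rho_grid, simulate_bias, n = 20, reps = 500)
```

A corrected estimate could then interpolate in such a table, at the cost of building it once per sample size.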



[R] static variable?

2009-04-16 Thread ivo welch
dear R experts:

does R have "static" variables that are local to functions?  I know
that they are usually better avoided (although they are better than
globals).

However, I would like to have a function print how often it was
invoked when it is invoked, or at least print its name only once to
STDOUT when it is invoked many times.

possible without <<- ?

sincerely,

/iaw
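
A sketch of a function with private, persistent state and no `<<-`: the state lives in an environment captured by a closure, and environments are modified in place. The names `make_counter` and `state` are illustrative.

```r
make_counter <- function(name) {
  state <- new.env()       # private storage that survives between calls
  state$calls <- 0
  function() {
    state$calls <- state$calls + 1
    if (state$calls == 1) cat("entering", name, "\n")   # printed only once
    invisible(state$calls)
  }
}

counter <- make_counter("myfunction")
counter()
counter()
n_calls <- counter()       # third call
```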



[R] pointers needed to expected values of fractions

2009-05-09 Thread ivo welch
I apologize in advance for a more statistical question.  I am trying
to find out whether a transformation of two random variables X and Y (
z= g(X,Y) ) exists whose expected value is E(X)/E(Y).  obviously, it
ain't E(X/Y).  is there a book or place where I could learn this?
(Also, I would be interested to learn more about the properties of
E(X/Y) if they have been worked out (and not just when X and Y are
independent), so if there is a book for this one, I would again be
quite interested, too.)  this is not an R topic, so please email me
directly if you know where I could look.  thanks in advance.  again,
apologies for the clutter... /iaw



[R] constrOptim parameters

2009-09-11 Thread ivo welch
Dear R wizards:   I am playing (and struggling) with the example in the
constrOptim function.  simple example.  let's say I want to constrain my
variables to be within -1 and 1.  I believe I want a whole lot of
constraints where ci is -1 and ui is either -1 or 1.  That is, I have 2*N
constraints.  Should the following work?

N=10
x= rep(1:N)
ci= rep(-1, 2*N)
ui= c(rep(1, N), rep(-1, N))
constrOptim( x, f, NULL, ui, ci, method="Nelder-Mead");

actually, my suggestions would be to give an example in the constrOptim docs
where the number of constraints is something like this example.  the current
ones have 2*2 constraints, so it is harder to figure out the appropriate
dimensions for different cases by extending the examples.  on another note,
the "non-conformable arguments" error could be a little more informative,
telling the end user what the two incompatible dimensions actually are.
this is not hard to find out by hand, but it would still be useful.

regards,

/iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
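
A sketch of box constraints in the form `constrOptim()` expects: the feasible region is `ui %*% theta - ci >= 0`, so `ui` has to be a matrix with one row per constraint (2N rows, N columns), and the starting value has to lie strictly inside the region. The objective `f` below is made up for illustration.

```r
N <- 10
f <- function(x) sum((x - 2)^2)   # hypothetical objective; its free minimum lies outside the box

# x_i >= -1   ->  row of  diag(N), ci = -1
# x_i <=  1   ->  row of -diag(N), ci = -1   (i.e. -x_i >= -1)
ui <- rbind(diag(N), -diag(N))    # 2N x N
ci <- rep(-1, 2 * N)

start <- rep(0, N)                # strictly inside the box, unlike 1:N

fit <- constrOptim(start, f, grad = NULL, ui = ui, ci = ci,
                   method = "Nelder-Mead")
```

With `ui` given as a plain vector, as in the post, the dimensions do not match the parameter vector, which is one plausible source of the "non-conformable arguments" error.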



[R] 64-bit OSX binary for 2.9.2

2009-09-14 Thread ivo welch
dear R wizards:  I am looking for a binary package distribution of R 2.9.2
for OSX .  Looking at http://r.research.att.com/ , there seems to be only a
binary for 2.9.0 .  is the 2.9.2 version binary package available
somewhere?   (at this point, would it make sense to elevate the 64-bit
version to a "standard recommended" rather than just a "boutique" version?)

sincerely, /iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



[R] Location of Packages?

2009-09-14 Thread ivo welch
Sorry, one more: on OSX, I deleted my old 2.9.2 R.app, and installed the 64
bit version of 2.9.0.  I then did an "install.packages("car")" under my new
2.9.0.  It seems to have worked, but alas, I still get an error that package
'car' was built under R version 2.9.2 .  Where exactly does R under OSX
install its packages?  (is it a bug that another car is loaded?)

PS: do I need to install the car packages under the 64-bit version, or will
it be seen by the 64 bit version if I do a 32-bit install?  Or do I need to
do a double install?  for safety, I did it under the command line version,
which I presume is still 32-bit, and the 64 bit GUI.

PPS: how do I learn which version of R is running?

regards, /iaw

-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



Re: [R] Location of Packages?

2009-09-14 Thread ivo welch
thanks, everyone.  I was a bit confused.

I now think that the "car" error was because the package on the cran website
itself was built under 2.9.2, not because I had an old version lying around,
which my package continued to use instead of a newer version that I would
just have installed.

I was also ambiguous about asking about version.  Sorry.  I did not mean
version number of R, but the bit version.  the answer is that the two
methods are sessionInfo() and bit64= ifelse(.Machine$sizeof.pointer == 8, T,
F) .  It would be nice if the standard R startup message would state whether
the version is 64bit or 32bit, but this is just a suggestion.

now, all I need is a more recent packaged R version than 2.9.0.  can I ask
who maintains http://r.research.att.com/ , so I send a short email to this
person?

regards,

/iaw
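
A short sketch of the bitness check with base R only: the pointer size is 8 bytes on a 64-bit build and 4 on a 32-bit one.

```r
is_64bit  <- .Machine$sizeof.pointer == 8   # TRUE on a 64-bit build of R
r_version <- R.version.string               # full version string of the running R
arch      <- R.version$arch                 # architecture the build targets
```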



[R] fastest OLS w/ NA's and need for SE's

2009-09-14 Thread ivo welch
dear R wizards:  apologies for two queries in one day.   I have a long form
data set, which identifies about 5,000 regressions, each with about 1,000
observations.

unit   date       y   x
1      20060101
1      20060102
...
5000   20081230
5000   20081231

I need to run such regressions many many times, because they are part of an
optimization.  thus, getting my code to be fast is paramount.   I will need
to pick off the 5,000 coefficients on x (i.e., b) and the standard errors of
b's.  I can ignore the 5,000 intercepts.

by( dataset, as.factor(dataset$unit), function(x) coef(lm( y ~ x,
data=x)) )
gives me the coefficients.  of course, I could use the summary method to lm
to pick off the coefficient standard errors, too.  my guess is that this
would be slow.

I think the alternative would be to delete all NAs first, and then use a
building block function (such as lm.fit(), or solve(qr(),y)).  this would be
fast for getting the coefficients, but I wonder whether there is a *FAST*
way to obtain the standard error of b.  (I do know slow ways, but this would
defeat the purpose.)  is this the right idea?  or will I just end up with
more code but not more speed than I would with summary(lm())?  can someone
tell me the "fastest" way to generate b and se(b)?

is there anything else that comes to mind as a recommended way to speed this
up in R, short of writing everything in C?

as always, advice highly appreciated.

/iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
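
Because each regression has a single regressor plus intercept, the slope and its standard error have closed forms in terms of group-wise sums, which `rowsum()` can compute for all units at once. The sketch below assumes that setup; `group_ols` and the simulated data are hypothetical.

```r
# Slope b and se(b) of y ~ x for every group in g, after dropping rows with NAs.
group_ols <- function(y, x, g) {
  ok <- complete.cases(y, x, g)
  y <- y[ok]; x <- x[ok]; g <- g[ok]
  key <- as.character(g)

  n  <- drop(rowsum(rep(1, length(g)), g))   # observations per group
  mx <- drop(rowsum(x, g)) / n               # group means
  my <- drop(rowsum(y, g)) / n
  xc <- x - mx[key]                          # deviations from group means
  yc <- y - my[key]

  sxx <- drop(rowsum(xc * xc, g))
  sxy <- drop(rowsum(xc * yc, g))
  syy <- drop(rowsum(yc * yc, g))

  b  <- sxy / sxx
  s2 <- (syy - b * sxy) / (n - 2)            # residual variance
  data.frame(unit = names(b), b = b, se = sqrt(s2 / sxx), n = n,
             row.names = NULL)
}

set.seed(1)
d <- data.frame(unit = rep(1:3, each = 50), x = rnorm(150))
d$y <- 1 + 2 * d$x + rnorm(150)
d$y[c(5, 70)] <- NA
res <- group_ols(d$y, d$x, d$unit)
```

Whether this beats `lm.fit()` per group on the real data would need timing; the point is only that no per-group model object is needed for b and se(b).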



[R] why is nrow() so slow?

2009-09-15 Thread ivo welch
dear R wizards:  here is the strange question for the day.  It seems to me
that nrow() is very slow.  Let me explain what I mean:

ds= data.frame( NA, x=rnorm(10000) )   ##  a sample data set

## doing nothing takes virtually no time
> system.time( { for (i in 1:10000) NA } )
   user  system elapsed
  0.000   0.000   0.001

## this is something that should take time; we need to add 10,000 values
## 10,000 times
> system.time( { for (i in 1:10000) mean(ds$x) } )
   user  system elapsed
  0.416   0.001   0.416

## alas, this should be very fast.  it is just reading off an attribute of
## ds.  it takes almost a quarter of the time of mean()!
> system.time( { for (i in 1:10000) nrow(ds) } )
   user  system elapsed
  0.124   0.001   0.125

## here is an alternative way to implement nrow(), which is already much
## faster:
> system.time( { for (i in 1:10000) length(ds$x) } )
   user  system elapsed
  0.041   0.000   0.041

is there a faster way to learn how big a data frame is?  I know this sounds
silly, but this is inside a "by" statement, where I figure out how many
observations are in each subset.  strangely, this takes a whole lot of
time.  I don't believe it is possible to ask "by" to attach an attribute to
the data frame that stores the number of observations that it is actually
passing.

pointers appreciated.

regards,

/iaw
-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)



Re: [R] why is nrow() so slow?

2009-09-15 Thread ivo welch
hi david---no, this time I actually know what I was asking ( ;-) ).   I do
need speed computed on many data sets, each of which is created by a "by"
statement.  so, no iterative programming on my side.

thanks, hadley, for the pointer to .row_names_info() in lieu of dim() or
nrow().  I don't seem to understand the second (type) argument, despite
reading the docs, but all of them are giving the same answer in my data
frames.  so, I guess I will stick to "2" for the time being.
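
For reference, a small sketch of what the type argument does, going by the base R documentation of .row_names_info(): type 2 gives the number of rows, type 1 gives the same number with a negative sign when the row names are "automatic", and type 0 returns the internal row.names attribute itself.  The two tiny data frames are made up for illustration.

```r
# Hypothetical data frame with automatic row names (1, 2, ..., 5)
ds <- data.frame(x = rnorm(5))

n2 <- .row_names_info(ds, 2L)   # number of rows
n1 <- .row_names_info(ds, 1L)   # negated because the row names are automatic
r0 <- .row_names_info(ds, 0L)   # the internal row.names attribute

# The same data with explicit (character) row names
ds.named <- ds
rownames(ds.named) <- letters[1:5]
m1 <- .row_names_info(ds.named, 1L)   # positive: row names are not automatic

stopifnot(n2 == 5L, n1 == -5L, m1 == 5L, n2 == nrow(ds))
```

Type 2 is therefore the variant that always matches nrow(), so sticking with "2" is the safe choice.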

regards,

/iaw



Re: [R] why is nrow() so slow?

2009-09-15 Thread ivo welch
interestingly, in my case the opposite seems to hold: data frames seem
faster than matrices when it comes to "by" computations (which is where
most of my calculations are):

### here is my data frame and some information about it
> dim(rets.subset)
[1] 132508      3
> names(rets.subset)
[1] "PERMNO" "RET"    "mdate"
> length(unique(as.factor(rets.subset$PERMNO)))
[1] 6832
> length((as.factor(rets.subset$PERMNO)))
[1] 132508

### calculation using data frame
> system.time( { by( rets.subset, as.factor(rets.subset$PERMNO), mean) } )
   user  system elapsed
  3.295   2.798   6.095

### same as matrix
> m=as.matrix(rets.subset)
> system.time( { a=by( m, as.factor(m[,1]), mean) } )
   user  system elapsed
  5.371   5.557  10.928

PS: Any speed suggestions are appreciated.  This is "experimenting time" for
me.
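
One possible speed-up, offered as a sketch rather than a benchmark: in current R, by() on a matrix first converts it to a data frame, which would account for the extra time above.  For a simple per-group statistic, "by" can be skipped altogether by working on the column vector with tapply() or rowsum().  The data below are simulated, with column names borrowed from rets.subset; unlike the calls above, the sketch averages RET only rather than applying mean() to the whole subset.

```r
set.seed(42)
# Simulated stand-in for rets.subset: 2000 rows, up to 50 distinct PERMNOs
rets <- data.frame(PERMNO = sample(1:50, 2000, replace = TRUE),
                   RET    = rnorm(2000))

# Reference: by() builds one data frame per group
m.by     <- by(rets, rets$PERMNO, function(d) mean(d$RET))

# tapply() works on the vector directly
m.tapply <- tapply(rets$RET, rets$PERMNO, mean)

# rowsum() computes all group sums in one pass; divide by the group sizes
sums     <- rowsum(rets$RET, rets$PERMNO)
m.rowsum <- sums[, 1] / as.vector(table(rets$PERMNO))

# All three agree (groups come out in sorted PERMNO order in each case)
stopifnot(isTRUE(all.equal(as.numeric(m.by), as.numeric(m.tapply))),
          isTRUE(all.equal(as.numeric(m.tapply), as.numeric(m.rowsum))))
```

Wrapping each variant in system.time() on the real data would show which one pays off for a given number of groups.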


> One note:  if you're worried about speed, it almost always makes sense to
> use matrices rather than dataframes.  If you've got mixed types this is
> tedious and error-prone (each type needs to be in a separate matrix), but if
> your data is all numeric, it's very simple, and will make things a lot
> faster.
>
> Duncan Murdoch



-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
CV Starr Professor of Economics (Finance), Brown University
http://welch.econ.brown.edu/



[R] cluster-lite

2009-09-15 Thread ivo welch
I am about to write a "cluster-lite" R solution for myself.  I wanted to
know whether it already exists.  If not, I will probably write up how I do
this, and I will make the code available.

Background: we have various linux and OSX systems, which are networked, but
not set up as a cluster.  I have no one here to set up a cluster, so I need
a "hack" that facilitates parallel programming on standard networked
machines.   I have accounts on all the machines, ssh access (of course
password-less), and networked file directory access.

what I am ultimately trying to accomplish is built around a "simple"
function, that my master program would invoke:

master.R:
   multisystem( c("Rscript slv.R 1 20 file1.out", "Rscript slv.R 21 40 file2.out",
                  "ssh anotherhost Rscript slv.R 41 80 file3.out"), announce=300)

multisystem() should submit all jobs simultaneously and continue only after
all are completed.  it should also tell me every 300 seconds what jobs it is
still waiting for, and which have completed.

with basically no logic in the cluster, my master and slv programs have to
make up for it.  master.R must have the smarts to know where it can spawn
jobs and how big each job should be.  slv.R must have the smarts to place
its outputs into the marked files on the networked file directory.  master.R
needs the smarts to combine the outputs of all jobs, and to resubmit jobs
that did not complete successfully.  again, the main reason for doing all of
this is to avoid setting up a cluster across OSX and linux systems, and still
to make parallel processing across linux/osx as easy as possible.  I don't
think it gets much simpler than this.

now, I know how to write the multisystem() in perl, but not in R.  so, if I
roll it myself, I will probably rely on a mixed R/perl system here.  This is
not desirable, but it is the only way I know how to do this.  if something
like multisystem() already exists in R native, please let me know and save
me from reinventing the wheel.  if it does not, some perl/R combo for this
soon will.
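
A rough sketch of how multisystem() could be written in base R alone, assuming a Unix-like shell (linux/OSX).  It is not an existing R function; the name comes from the description above.  The poll argument and the marker-file scheme are assumptions of this sketch: each job runs in a background subshell that writes its exit status to a temporary file, and the master polls for those files.

```r
# Hypothetical helper, Unix-only: run all commands at once, wait for all.
multisystem <- function(cmds, announce = 300, poll = 1) {
  n <- length(cmds)
  # One "done" marker file per job (names are illustrative)
  markers <- tempfile(paste0("multisystem-", seq_len(n), "-"))
  for (i in seq_len(n)) {
    tmp <- paste0(markers[i], ".tmp")
    # Run the job in its own subshell, record its exit status, then rename
    # the status file into place so a marker never appears half-written.
    wrapped <- sprintf("( ( %s ) ; echo $? > %s ; mv %s %s )",
                       cmds[i], shQuote(tmp), shQuote(tmp), shQuote(markers[i]))
    system(wrapped, wait = FALSE)   # returns at once; job runs in background
  }
  last.report <- Sys.time()
  repeat {
    done <- file.exists(markers)
    if (all(done)) break
    waited <- as.numeric(difftime(Sys.time(), last.report, units = "secs"))
    if (waited >= announce) {       # periodic status report
      message("completed:   ", paste(cmds[done], collapse = " | "))
      message("waiting for: ", paste(cmds[!done], collapse = " | "))
      last.report <- Sys.time()
    }
    Sys.sleep(poll)
  }
  # Collect the exit status of every job, named by its command
  status <- vapply(markers, function(f) as.integer(readLines(f)), integer(1))
  unlink(markers)
  setNames(status, cmds)
}

# Usage with harmless shell commands standing in for the real jobs
codes <- multisystem(c("true", "false", "sleep 1"), announce = 300, poll = 0.1)
```

The returned exit codes (0 for success) give master.R what it needs to decide which jobs to resubmit.  For an ssh job, the code is whatever ssh reports back to the local shell.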

regards,

/iaw


-- 
Ivo Welch (ivo.we...@brown.edu, ivo.we...@gmail.com)
