[Rd] calculating means per group

2008-09-04 Thread RFTW

Hi all
I have a very basic question, yet I have not found how to do it.

Suppose my dataset looks like this:

Year Area value
1   a   20
1   a   25
1   a   28
1   a   31
1   a   23
1   b   25
1   b   28
1   b   23
1   b   19
2   a   25
2   a   23
2   a   24
2   a   26
2   b   27
2   b   28
2   b   20
2   b   25
2   b   28


Now, I want to calculate a MEAN per year per area. How do I do that?

With mean(value) I calculate the mean of all values, of course. I just need
to know how to group year and area correctly.

I assume that I can use this grouping in other calculations too, right?


Cheers,

Luc
-- 
View this message in context: 
http://www.nabble.com/calculating-means-per-group-tp19271479p19271479.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] calculating means per group

2008-09-04 Thread ONKELINX, Thierry
Dear Luc,

You should send this kind of question to the general mailing list
([EMAIL PROTECTED]) instead of the developer list.

To answer your question: have a look at ?by and ?aggregate

HTH,

Thierry 
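The two help pages mentioned above can be tried directly on the data from the question. The following is only a sketch: the data frame name `d` and the column names are assumptions made for illustration, and the formula interface to aggregate() may not be available in every R version.

```r
# Rebuild the example data from the question as a data frame
# (the names d, year, area and value are chosen for this sketch).
d <- data.frame(
  year  = factor(rep(c(1, 2), each = 9)),
  area  = factor(c(rep("a", 5), rep("b", 4), rep("a", 4), rep("b", 5))),
  value = c(20, 25, 28, 31, 23, 25, 28, 23, 19,
            25, 23, 24, 26, 27, 28, 20, 25, 28)
)

# aggregate(): returns a data frame with one row per (year, area) group.
means_df <- aggregate(value ~ year + area, data = d, FUN = mean)
print(means_df)

# by(): applies a function to the rows belonging to each (year, area) group.
means_by <- by(d, list(year = d$year, area = d$area),
               function(g) mean(g$value))
print(means_by)
```

aggregate() is convenient when the result should itself be a data frame; by() is more general because the function receives the whole subset, not just one column.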




ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
[EMAIL PROTECTED] 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey


The views expressed in this message and any annex are purely those of the
writer and may not be regarded as stating an official position of INBO, as
long as the message is not confirmed by a duly signed document.



Re: [Rd] suggestion of new API function for embedded programming.

2008-09-04 Thread EBo

I stumbled onto a near trivial solution... here is some example code:

  EBo --

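/* The header names were lost in the archive; these four cover every call below. */
#include <stdio.h>
#include <Rdefines.h>      /* NEW_NUMERIC, NUMERIC_DATA, IS_NUMERIC (includes Rinternals.h) */
#include <Rembedded.h>     /* Rf_initEmbeddedR, Rf_endEmbeddedR */
#include <R_ext/Parse.h>   /* R_ParseVector, ParseStatus */


/* Forward declaration: LineEval() calls this before it is defined. */
SEXP R_tryLineEval (char *cmd, SEXP rho, int *error);


/* Evaluate one line of R code in the global environment. */
SEXP
LineEval (char *cmd)
{
    SEXP ans;
    int error;

    ans = R_tryLineEval (cmd, R_GlobalEnv, &error);

    if (error)
    {
        fprintf (stderr, "Error evaluating line \"%s\"\n", cmd);
        return R_NilValue;
    }

    return ans;
}


/* Parse cmd and evaluate each parsed expression in rho.  Returns the value
   of the last expression; *error is set to non-zero on failure. */
SEXP
R_tryLineEval (char *cmd, SEXP rho, int *error)
{
    SEXP cmdSexp, cmdexpr, ans = R_NilValue;
    int i;
    ParseStatus status;

    *error = 0;

    // parse the R expression
    PROTECT(cmdSexp = allocVector(STRSXP, 1));
    SET_STRING_ELT(cmdSexp, 0, mkChar(cmd));
    cmdexpr = PROTECT(R_ParseVector(cmdSexp, -1, &status, R_NilValue));
    if (status != PARSE_OK) {
        *error = 1;    // report parse failures to the caller as well
        UNPROTECT(2);
        return R_NilValue;
    }
    // A loop is needed here as the EXPRSXP may be of length > 1
    for(i = 0; i < length(cmdexpr); i++)
    {
        // evaluate in rho (the original ignored rho and used R_GlobalEnv)
        ans = R_tryEval(VECTOR_ELT(cmdexpr, i), rho, error);
        if (*error) {
            UNPROTECT(2);
            return R_NilValue;
        }
    }
    UNPROTECT(2);

    return ans;
}


int
main (int argc, char *argv[])
{
    char *cmd[] =
        {"t.test(x,conf.level=0.67)",
         "t.test(x,conf.level=0.67)$conf.int[2]",
         "xyz=c(9,8,7,6); xyz=median(x)",
         "print(x)",
         NULL
        };
    SEXP ans;
    int  i;

    char *args[] = {"bla", "--gui=none", "--silent", "--no-save"};
    Rf_initEmbeddedR (4, args);

    // set the variable "x" to the dataset.
    SEXP value;
    PROTECT(value = NEW_NUMERIC(11));
    for (i=10; 0<=i; i--)
        NUMERIC_DATA(value)[i] = i;
    setVar(install("x"), value, R_GlobalEnv);

    // spin through several R expressions and evaluate each.
    for (i=0; cmd[i]; i++)
    {
        ans = LineEval (cmd[i]);
        if (R_NilValue != ans)
        {
            printf ("cmd = \"%s\"\n", cmd[i]);
            if (IS_NUMERIC(ans))
                printf ("  ans is %f\n", REAL(ans)[0]);
            else
                PrintValue(ans);
            printf ("#\n\n");
        }
    }

    UNPROTECT(1);          // value
    Rf_endEmbeddedR (0);   // shut the embedded R session down cleanly

    return 0;
}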
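For readers more at home at the R prompt, the C routine above behaves roughly like parse(text = ) followed by eval() on each parsed expression. The sketch below is an R-level analogue only, not part of the embedding API; the function name `line_eval` and the environment `env` are made up for illustration.

```r
# R-level analogue of R_tryLineEval(): parse a string, evaluate each parsed
# expression in turn, and return the value of the last one.
# Returns NULL on a parse or evaluation error (a simplification).
line_eval <- function(cmd, envir = globalenv()) {
  exprs <- try(parse(text = cmd), silent = TRUE)
  if (inherits(exprs, "try-error")) return(NULL)      # parse failure
  ans <- NULL
  for (i in seq_along(exprs)) {                       # may be more than one
    ans <- try(eval(exprs[[i]], envir), silent = TRUE)
    if (inherits(ans, "try-error")) return(NULL)      # evaluation failure
  }
  ans
}

env <- new.env()
assign("x", 0:10, envir = env)

# One string can hold several expressions, hence the loop.
n_exprs <- length(parse(text = "xyz=c(9,8,7,6); xyz=median(x)"))
result  <- line_eval("xyz=c(9,8,7,6); xyz=median(x)", env)
print(n_exprs)
print(result)
```

The string with a semicolon parses into two expressions, which is why a single eval() call on the parse result is not enough.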






EBo <[EMAIL PROTECTED]> said:

> Luke Tierney <[EMAIL PROTECTED]> said:
> 
> > On Wed, 3 Sep 2008, EBo wrote:
> > 
> > > Luke Tierney <[EMAIL PROTECTED]> said:
> > >
> > >> ...
> > >>> do something like the following:
> > >>>
> > >>>  R_Expr = R_Parse1Buffer(&R_ConsoleIob, 0, &status);
> > >>>  if (PARSE_OK==status) {
> > >>>...
> > >>>value = eval(R_CurrentExpr, rho);
> > >>>...
> > >>>  }
> > >>
> > >> We definitely do NOT want this frozen into the public API.
> > >
> > > What is your objection with making something like this a part of the
> > > public API?  I understand that having to use the IOBuffer seems a bit
> > > much, but I do not understand your concern.
> > 
> > We need the freedom to completely change these internals if doing so
> > proves useful.
> 
> Ah, that makes perfect sense.
> 
> Thanks,
> 
>   EBo --
> 





Re: [Rd] calculating means per group

2008-09-04 Thread Kjell Konis

Hi Luc,

First of all, questions like this should really be asked on the R-help  
mailing list.


The tapply function does what you want:

> year
 [1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2
Levels: 1 2
> area
 [1] a a a a a b b b b a a a a b b b b b
Levels: a b
> value
 [1] 20 25 28 31 23 25 28 23 19 25 23 24 26 27 28 20 25 28

Note that both year and area are factors.

Get the mean for each area:

> tapply(value, area, mean)
   a    b
25.0 24.8

If you make the second argument a list then you can group by both
factor columns at once:


> tapply(value, list(year, area), mean)
     a     b
1 25.4 23.75
2 24.5 25.60

Kjell




Re: [Rd] calculating means per group

2008-09-04 Thread Prof Brian Ripley

On Thu, 4 Sep 2008, ONKELINX, Thierry wrote:


> Dear Luc,
>
> You should send this kind of question to the general mailing list
> ([EMAIL PROTECTED]) instead of the developer list.
>
> To answer your question: have a look at ?by and ?aggregate

And ?ave.

> HTH,
>
> Thierry
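Unlike the functions that return one value per group, ave() returns a vector as long as its input, which makes it handy for using the grouping in further calculations. A minimal sketch follows; the data frame `d` and the new column names are assumptions for illustration.

```r
# Example data from the question (names chosen for this sketch).
d <- data.frame(
  year  = factor(rep(c(1, 2), each = 9)),
  area  = factor(c(rep("a", 5), rep("b", 4), rep("a", 4), rep("b", 5))),
  value = c(20, 25, 28, 31, 23, 25, 28, 23, 19,
            25, 23, 24, 26, 27, 28, 20, 25, 28)
)

# ave() repeats the group mean on every row of its group (FUN defaults to mean).
d$group_mean <- ave(d$value, d$year, d$area)

# The per-row group mean can then feed other calculations,
# e.g. centring each value on its own group mean.
d$centered <- d$value - d$group_mean
print(head(d))
```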






--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] PDFs on R-devel ...

2008-09-04 Thread Martin Maechler
> "OlegS" == Sklyar, Oleg \(London\) <[EMAIL PROTECTED]>
> on Wed, 3 Sep 2008 13:47:56 +0100 writes:

[.]

OlegS> I attached three PDFs ..

OlegS> Just in case PDFs are removed by the mail server here
OlegS> is the description of the problem:  

[..]

and the PDFs *were* removed ... for a good reason :

Attachments are allowed if they have the correct 'MIME type'
("google" if you don't know what that is).
Your attached files had  content type  'application/octet-stream'
which basically means "unknown binary format"
i.e. the same format that any executable virus / trojan /
... would also come in.

You may need a smarter e-mail client (than Microsoft's) which
uses the correct type, or one where you can specify the MIME
type of an attachment explicitly.

Martin Maechler, ETH Zurich,
Maintainer of the @r-project.org mailing lists.



[Rd] "incorrect" ticks in plot(with xlim/ylim) and matplot in R2.7.2, R2.8.0

2008-09-04 Thread Sklyar, Oleg (London)
Dear Martin,

I understand the reasons behind PDF removal, but I actually added a
description of the problem with R2.7+ before... (I cannot select which
email client I use in the office).

Now the problem remains and here is the illustration. The reason for
setting xlim beyond the data range can be e.g. that more data are added
afterwards:

plot(c(-5,5),1:2, xlim=c(-10,10))

R2.6.1 outputs X axis ticks correctly as in:

+--- (-10) - (-5) - (0) - (5) - (10) ---+

R2.8.0 and R2.7.2 patched output X-axis ticks "incorrectly" as in the
following illustration (well, the ticks are correct, but the plots are
ugly because ticks do not cover the whole range requested):

+- (-6) - (-4) - (-2) - (0) - (2) - (4) - (6) --+

Also, try the following code in R2.6.1 and R2.7+:

m = matrix(c(-0.033, 0.009, 0.064, 0.050, 0.097,
-0.008, 0.037, 0.070, 0.060, 0.077,
-0.027, 0.051, 0.060, 0.106, 0.049,
-0.068, -0.009, 0.095, 0.091, 0.125,
-0.065, 0.013, 0.062, 0.111, 0.080), ncol=5, byrow=TRUE)

plot(c(-5,5),c(0,10),xlim=c(-10,10),ylim=c(-5,15))
x11(); matplot(m)
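The two tick sets in the illustrations above can be reproduced without plotting: they coincide with what pretty() returns for the requested limits and for the data range respectively. Whether that is actually the mechanism behind the change is an assumption of this sketch, not something established in the thread. The second half shows a possible workaround that places the ticks explicitly; pdf(NULL) is used only so the code runs without opening a window.

```r
# Tick candidates for the requested limits and for the data range.
ticks_xlim <- pretty(c(-10, 10))
ticks_data <- pretty(c(-5, 5))
print(ticks_xlim)
print(ticks_data)

# Possible workaround: suppress the default x axis and place ticks by hand.
pdf(NULL)                                  # null device, no file is written
plot(c(-5, 5), 1:2, xlim = c(-10, 10), xaxt = "n")
axis(1, at = ticks_xlim)                   # ticks spanning the requested range
invisible(dev.off())
```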


Here are the sessionInfo() outputs:

-
R version 2.8.0 Under development (unstable) (2008-08-05 r46234) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

-
R version 2.7.2 Patched (2008-08-26 r46442) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

-
R version 2.6.1 (2007-11-26) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base


loaded via a namespace (and not attached):
[1] rcompgen_0.1-17


Dr Oleg Sklyar
Technology Group
Man Investments Ltd
+44 (0)20 7144 3803
[EMAIL PROTECTED] 





Re: [Rd] suggestion of new API function for embedded programming.

2008-09-04 Thread Simon Urbanek


On Sep 3, 2008, at 9:51 , EBo wrote:



> While doing some embedded programming and trying to figure out how to
> generate a hand coded SEXP equivalent of the line
> "t.test(x,conf.level=(1-p))$conf.int[2]" I had an idea for an addition
> to the embedded API.
>
> There are a number of hidden or static parse functions (R_ParseBuffer,
> R_Parse1Buffer, etc.) which take an IoBuffer* and return a parsed
> tree.  If one or more of these functions were exported to the
> Rembedded.h API we could do something like the following:
>
>  R_Expr = R_Parse1Buffer(&R_ConsoleIob, 0, &status);
>  if (PARSE_OK==status) {
>    ...
>    value = eval(R_CurrentExpr, rho);
>    ...
>  }
>
> or possibly simplifying the interface to take the CMDL string:
>
>  R_Expr = R_Parse1Line("t.test(x,conf.level=(1-p))$conf.int[2]", &status);

Why do you think R_ParseVector is not sufficient for this? That is
what most of us use to achieve exactly what you describe...
For something that even mimics the continuation behavior of the R
console, have a look at the parseString function in Rserve.

Cheers,
Simon

> I think this would be a useful addition to the embedding interface, and
> hopefully not difficult to incorporate by someone more experienced with
> the internals than I currently am.  I took a few hours to look into
> adding this interface, but will not have time to try to do so for a few
> months -- I have a couple of hard and fast deadlines over the next
> couple of weeks.  So, I would like to make the suggestion and
> participate in the dialog.
>
>  Best regards,
>
>  EBo --
>
> ps: if someone can suggest how to hand code
> "t.test(x,conf.level=(1-p))$conf.int[2]" so I can embed it I would be
> most appreciative.



[Rd] patch for graphics/R/plot.R that fixes incorrect tick positions

2008-09-04 Thread Sklyar, Oleg (London)
As I haven't got any replies to my earlier posts about incorrect tick
positions in plot and matplot, here is the simplest patch to correct
this issue (it fixes both plot with xlim/ylim and matplot). The plot.R
was unchanged between 2.7 and current R-devel. It would be great if the
patch could be (tested and) applied to both the current patched and the
current devel versions.

70,71c70,71
<   localAxis(if(is.null(y)) xy$x else x, side = 1, ...)
<   localAxis(if(is.null(y))  x   else y, side = 2, ...)
---
> localAxis(xlim, side = 1, ...)
> localAxis(ylim, side = 2, ...)

It works fine with y given or y NULL, which was an issue before when the
above is.null test was introduced about half a year ago to avoid
conversion to double and thus dropping custom Axis methods. The patched
version was tested with custom time/date Axis methods and works fine,
and it also works fine with POSIXct objects.
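The remark about conversion to double dropping custom Axis methods rests on how S3 dispatch works: Axis() picks its method from the class of the object it is handed, so anything that strips the class falls back to the default. A minimal sketch, using a hypothetical class "myunits" whose method only reports that it was called (it draws nothing):

```r
# Hypothetical class and method, for illustration only.
Axis.myunits <- function(x = NULL, at = NULL, ..., side, labels = NULL) {
  "custom method"          # a real method would call axis() with custom labels
}

x <- structure(c(1, 5, 10), class = "myunits")

# Dispatch follows the class of x ...
dispatched <- graphics::Axis(x, side = 1)
print(dispatched)

# ... and as.double() strips that class, so the custom method would be lost.
print(class(as.double(x)))
```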

Thanks,
Oleg

Dr Oleg Sklyar
Technology Group
Man Investments Ltd
+44 (0)20 7144 3803
[EMAIL PROTECTED]




[Rd] Request for consideration of small change to R documentation file names

2008-09-04 Thread Don MacQueen

I hesitate to spend R-core's time with such a small request, but here goes.

On the Manuals page of CRAN one can  download the primary R 
documentation in PDF format. All of the files but one have names 
beginning with "R-". The exception is the R Reference Index 
(fullrefman.pdf). This means that after downloading, all of the files 
but one appear together in a directory listing. It would be 
convenient if *all* of them appeared together.


Hence, I'm requesting that R-core consider changing the name 
"fullrefman.pdf" to "R-fullrefman.pdf".


In the interest of saving time, I don't need an email response to 
this request. The decision will become apparent next time I download 
the documentation, which I do after every update to R.


Thank you,
-Don
--
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062



Re: [Rd] suggestion of new API function for embedded programming.

2008-09-04 Thread EBo
Simon Urbanek <[EMAIL PROTECTED]> said:

> Why do you think is R_ParseVector not sufficient for this? That is  
> what most of use use to achieve exactly what you describe...
> For something that even mimics the continuation behavior of the R  
> console have a look at parseString function in Rserve.

Thank you for the reply.  

I am new to R, and struggling to learn most of its aspects and embedding
issues in particular.  When I first posted my questions and comments on the
IRC channel and here I asked how I would go about achieving this
functionality.  Last night I stumbled onto an example of what R_ParseVector
actually does, and I was finally able to get it working.  Part of my post was
intended to say I have found a solution, and part of it to show how my
original thought would be done.  The only reason to add the API is
code clarity and convenience.  Actually, a few additional sentences in the
embedded documentation better explaining what R_ParseVector does would
have kept me from creating this thread and saved a couple of days of pain.
As for parseString, this is the first I have read of it, so I will now check
into it.

Thanks and best regards,

  EBo --



Re: [Rd] suggestion of new API function for embedded programming.

2008-09-04 Thread Jeffrey Horner

Also, study the source code in the littler project. As it is a simple 
command line alternative to the R shell script and executable, it may 
bring you up to speed on simple embedding and parsing; another example 
at least.


http://biostat.mc.vanderbilt.edu/LittleR


Jeff






--
http://biostat.mc.vanderbilt.edu/JeffreyHorner



Re: [Rd] suggestion of new API function for embedded programming.

2008-09-04 Thread EBo
Jeffrey Horner <[EMAIL PROTECTED]> said:

> Also, study the source code in the littler project. As it is a simple 
> command line alternative to the R shell script and executable, it may 
> bring you up to speed on simple embedding and parsing; another example 
> at least.
> 
> http://biostat.mc.vanderbilt.edu/LittleR


Jeff,

Thank you for the pointer.  This is exactly the kind of example I had
originally asked for and had hoped to find.

  EBo --



[Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread Andrew Piskorski
I see about 7 different R packages for multi-process parallel
programming.  Which do you think is the best, most complete, and most
robust to pick for general purpose Erlang-style message-passing
programming in R, and why?

First here's my use case, and then my analysis so far.  I often have
code whose basic organization looks something like this:

1. Fetch step: For each date, gather up or pre-process a bunch of
   data.  Return a big list of data, one item on the list for each date.
2. Compute step: For each date on the big list of data, do a bunch of
   computations.

Of course, when the number of dates is large, it's pretty annoying to
wait for all the fetches to complete before starting the compute step.
(Especially when the compute step then hits a bug on the very first
date.)  So in practice, I end up breaking things apart to fetch and
then compute one date at a time, etc.

However, instead of completely serializing everything the way I do
now, it would be nice to have 2 concurrent threads of control
(processes, threads, coroutines, or whatever) which talk to each
other.  Then the compute thread can just periodically say to the fetch
thread, "Give me the next date's worth of data, please."  And usually
the fetch thread will already have that data fetched and ready to go.

Also, sometimes my "compute step" is slow, and has lots of readily
parallelizable work, so it would be even better if I can optionally
run things across multiple physical machines in a cluster.

How to do it?  R is single-threaded and not thread safe, so threads
are out.  Coroutines are also probably out.  The obvious approach is
to use multiple R processes which talk to each other via some message
passing library.
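The fetch-ahead pattern described above can be sketched with two processes. This uses mcparallel() and mccollect() from the `parallel` package, which is not one of the packages discussed in this thread and relies on forking, so it is not available on Windows. The functions `fetch_one` and `compute_one` and the dates are placeholders invented for illustration.

```r
library(parallel)

dates <- as.Date("2008-09-01") + 0:3

# Placeholder fetch step: pretend to gather one date's worth of data.
fetch_one <- function(d) {
  list(date = d, data = seq_len(as.integer(format(d, "%d"))))
}

# Placeholder compute step.
compute_one <- function(item) sum(item$data)

results <- numeric(length(dates))

# Start fetching the first date in a child process.
job <- mcparallel(fetch_one(dates[1]))

for (i in seq_along(dates)) {
  item <- mccollect(job)[[1]]              # wait for the pending fetch
  if (i < length(dates)) {
    # Kick off the next fetch before computing on the current item.
    job <- mcparallel(fetch_one(dates[i + 1]))
  }
  results[i] <- compute_one(item)          # overlaps with the next fetch
}

print(results)
```

This only overlaps one fetch with one computation on a single machine; it is not a general message-passing setup and says nothing about spreading work over a cluster.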

Fortunately, R has a plethora of such packages.  My question is, which
is the best choice for this sort of use?  From reading their API docs,
here are my brief thoughts on each so far:

- papply:  Not suitable, no bi-directional communication.  Slave
  processes return values when the papply() call completes, that's it.

- biopara:  Not suitable, simple one-way master/slave communication
  only, just like papply.

- snow:  Not directly suitable, the supported communication is intended
  to be very simple.  But since it runs on top of Rmpi, perhaps its
  utility code would be useful in conjunction with Rmpi?

- taskPR:  Sounds equivalent to snow.  Also uses MPI underneath.

- Rmpi:  Probably.  Should definitely work for my needs, only question
  is if it's the best choice.  Is it stable, complete, robust, etc.?

- rpvm:  Maybe.  Should be equivalent to Rmpi, but MPI is much more
  popular on clusters than PVM these days.

- NetWorkSpaces:  Maybe.  This looks like a rather mature and
  well-supported multi-language TupleSpace implementation, so it could
  certainly be made to work.

  Passing all my large R data objects back and forth solely as strings
  seems very unappealing, but the docs hint that it includes direct
  (or at least transparent) support for binary R objects.  I need to
  start up and run an explicit NetWorkSpaces Python/Twisted server.

  Also, TupleSpace programming sounds somewhat more limiting than
  Erlang-style message passing (although I definitely do not know that
  for sure!).  On the other hand, the TupleSpace APIs sound a lot
  simpler than MPI.

Since I've never done MPI programming before, I'm also curious about
some of the practical semantics of Rmpi.  E.g., is it possible to send
a message to a busy R process that says, "Stop what you're doing right
now!" and have it obeyed immediately?  Probably not, as I think that
would require either multiple threads or an active event loop
somewhere in either R or the MPI stack.

Finally, here are links and some notes on each of the above 7 packages
(converted from HTML with 'lynx -dump'):

* [1]Rmpi ([2]CRAN, [3]tutorial), [4]rpvm ([5]CRAN). 
* [6]SNOW ([7]CRAN) - Simple Network of Workstations for R, high 
  level interface for parallel R on clusters, uses sockets, MPI, or 
  PVM underneath. Reportedly intended for "embarrassingly parallel" 
  not closely coupled problems. 

* [8]papply ([9]CRAN) 
* The [10]Parallel-R project provides both [11]RScaLAPACK ([12]CRAN) 
  and [13]taskPR ([14]old), using MPI. 
* [15]biopara - One-way master/slave communication, much like papply 
  or taskPR. Uses R sockets, no MPI or PVM underneath. 

* [16]NetWorkSpaces for R ([17]article, [18]FAQ) from [19]SCAI is a 
  [20]dual licenced (GPL and commercial) Linda/tuplespace 
  implementation. Also, some aspects sound similar to the [21]data 
  flow variables in [22]Van Roy's [23]CTM and [24]Mozart/Oz. 
 
References 
   1. http://www.stats.uwo.ca/faculty/yu/Rmpi/ 
   2. http://cran.us.r-project.org/src/contrib/Descriptions/Rmpi.html 
   3. http://ace.acadiau.ca/math/ACMMaC/Rmpi/ 
   4. http://www.analytics.washington.edu/statcomp/projects/rhpc/rpvm/ 
   5. http://cran.us.r-project.org/src/contrib/Descriptions/rpvm.html 
   6. http://www.stat.uiowa.edu/~luke/R/cluster/clus

Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread David Bauer
> - taskPR:  Sounds equivalent to snow.  Also uses MPI underneath.

Actually, it is very different from snow.  taskPR was an attempt to get 'free' 
parallelism out of already existing programs by using simple data dependencies 
to figure out which individual statements in a program can be run in parallel.  
The name comes from the description of the program as exploiting task-level 
parallelism.  Compare this to snow which uses data-level parallelism 
(performing the same operation on many pieces of data at once).  Additionally, 
MPI is optional, and only used for the initial setup of processes.
(If anybody actually uses or has successfully used this package, I would love 
to hear about it, btw.  While the package *does* work, there are probably few 
cases where it is worth it.)


David Bauer



[Rd] lapply(NULL, ...) returns empty list

2008-09-04 Thread Vadim Organovich
Dear R-devel,

Is there a reason that lapply(NULL, ...) returns the empty list, rather than 
NULL? It seems intuitive to expect the latter, and rather counterintuitive that 
lapply(list(), ... ) returns the same value as lapply(NULL, ...).

> lapply(list(), function(x) 1)
list()
>  lapply(NULL, function(x) 1)
list()
> version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          7.1
year           2008
month          06
day            23
svn rev        45970
language       R
version.string R version 2.7.1 (2008-06-23)
>

Regards,
Vadim



[Rd] rgui.exe memory used (PR#12697)

2008-09-04 Thread cedric . carpentier
Full_Name: cedric Carpentier
Version: 2.6.1
OS: windows 2000
Submission from: (NULL) (212.11.24.48)


Hello,

I will try to be thorough; please excuse my approximate English.

I use R to extract data from Factset and then upload it to a personal
database. After storing the data I create several matrices to do some
calculations.

The problem is that when I open Rgui.exe it takes only 24 Mb of memory,
but after the matrices are created it takes more than 500 Mb.

I use rm(list = ls()) followed by gc() to clear all the R objects. R then
holds no more objects (ls() returns nothing), but Rgui.exe still shows
400 Mb of memory in use in Windows.

In short, I cannot get back to the original memory use of 24 Mb.

I have tried to see whether some environment was growing abnormally, but
found nothing.

The further my script progresses, the more the memory grows, until R
fails.

Is there something I can do?

Please note that I use the RODBC package.

Many thanks
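As a way of checking what R itself considers to be in use, the matrix returned by gc() can be compared before and after removing a large object. This sketch shows only R's own accounting; whether the figure Windows reports for Rgui.exe shrinks as well is a separate question that depends on how freed memory is handed back to the operating system, and the sketch does not address it. The object name `big` is arbitrary.

```r
big <- matrix(0, 2000, 2000)          # about 32 Mb of doubles

before <- gc()                        # memory in use with 'big' present
rm(big)
after <- gc()                         # memory in use once 'big' is gone

# Vector cells (Vcells) in use, before and after.
print(before["Vcells", "used"])
print(after["Vcells", "used"])
```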



Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread Andrew Piskorski
On Thu, Sep 04, 2008 at 04:06:31PM -0400, David Bauer wrote:

> taskPR was an attempt to get 'free' parallelism out of already
> existing programs by using simple data dependencies to figure out
> which individual statements in a program can be run in parallel.
> The name comes from the description of the program as exploiting
> task-level parallelism.

Ah, and thus your reference to Tomasulo's algorithm, interesting.
Thanks for straightening me out there.

  http://users.ece.gatech.edu/~gte810u/Parallel-R/

> (If anybody actually uses or has successfully used this package, I
> would love to hear about it, btw.  While the package *does* work,
> there are probably few cases where it is worth it.)

What would you say typically limits taskPR's approach, not finding
enough instruction-level parallelism at the R script level, or the
communications overhead (probably latency) of trying to make use of
it?

If latency, then perhaps taskPR would work better in a multi-threaded
R interpreter, rather than across a TCP/IP network fabric.  To roughly
test that empirically (assuming you are in fact using MPI for the
communications), I suppose you could start up your several R processes
on a single fat SMP node, and use an MPI that sends messages through
fast shared memory.  That's probably still slower than
thread-to-thread communications, but it should be much lower latency
than TCP/IP.  Maybe you already tried something like that?

-- 
Andrew Piskorski <[EMAIL PROTECTED]>
http://www.piskorski.com/



Re: [Rd] lapply(NULL, ...) returns empty list

2008-09-04 Thread hadley wickham
On Thu, Sep 4, 2008 at 4:01 PM, Vadim Organovich
<[EMAIL PROTECTED]> wrote:
> Dear R-devel,
>
> Is there a reason that lapply(NULL, ...) returns the empty list, rather than 
> NULL? It seems intuitive to expect the latter, and rather counterintuitive 
> that lapply(list(), ... ) returns the same value as lapply(NULL, ...).


  X: a vector (atomic or list) or an expressions vector.  Other
  objects (including classed objects) will be coerced by
  'as.list'.

> as.list(NULL)
list()
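The coercion quoted above can be checked directly. The last function, `lapply_or_null`, is a hypothetical helper for callers who do want NULL back for NULL input; it is not part of R.

```r
# NULL is coerced by as.list() to an empty list ...
coerced <- as.list(NULL)
print(identical(coerced, list()))

# ... so lapply() sees the same input in both cases.
same <- identical(lapply(NULL, function(x) 1),
                  lapply(list(), function(x) 1))
print(same)

# Hypothetical helper: pass NULL through, otherwise defer to lapply().
lapply_or_null <- function(X, FUN, ...) {
  if (is.null(X)) NULL else lapply(X, FUN, ...)
}
print(lapply_or_null(NULL, function(x) 1))
```

One consequence of the current behaviour is that the result of lapply() is always a list, so code that consumes it does not need a special case for empty input.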

Hadley

-- 
http://had.co.nz/



Re: [Rd] Erlang-style message-passing in R: Rmpi, Snow, NetWorkSpaces, etc.

2008-09-04 Thread David Bauer

> What would you say typically limits taskPR's approach, not finding
> enough instruction-level parallelism at the R script level, or the
> communications overhead (probably latency) of trying to make use of
> it?


Depends on the specific function.  The communication cost is 
significant, especially serialization and deserialization.  (Since I 
finally found the right way to force a flush of the TCP data, the actual 
network cost isn't a problem for moderate sized data.)  For reasons of 
simplicity of implementation and ease of correctness, a lot of the R 
environment is serialized and sent over with *each* operation.
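To get a feel for the serialization cost mentioned above, an object can be round-tripped through serialize() and unserialize() in a single process. This is a rough sketch with made-up data; it measures only the conversion to and from bytes, not taskPR itself or any network transfer.

```r
# Made-up payload: a numeric matrix and a small data frame.
x <- list(m  = matrix(as.numeric(1:250000), 500, 500),
          df = data.frame(id = 1:1000, v = sqrt(1:1000)))

# Convert to a raw vector (connection = NULL) and back again.
t_ser   <- system.time(raw_bytes <- serialize(x, connection = NULL))
t_unser <- system.time(y <- unserialize(raw_bytes))

print(length(raw_bytes))        # bytes that would have to be sent
print(t_ser["elapsed"])
print(t_unser["elapsed"])
print(identical(x, y))
```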


In terms of the instruction-level parallelism available, code that is a 
performance bottleneck is usually re-written in C or Fortran and called 
in large blocks.  So now the program is trying to find parallelism in 
the large blocks, which it usually can't.


I didn't have a lot of suitable code to try, and so the best example 
program was one that did a complex calculation followed by an accumulate 
operation in a loop.  Parallel-R/taskPR dynamically unrolled the loop 
(just like Tomasulo's algorithm does on a processor) and got a 
reasonable speedup (about half of linear).  Unfortunately, I don't even 
have that code example any more.




> If latency, then perhaps taskPR would work better in a multi-threaded
> R interpreter, rather than across a TCP/IP network fabric.


Yes, most especially if serialization and deserialization could be 
avoided.  However, I don't believe R is thread-safe?  (Using shared 
memory, but between multiple R processes, was on the TODO list when the 
project ended.)


I was fortunate to have access to a very large NUMA machine at the time 
that I was originally working on this project, so the network itself 
wasn't a limiting factor.  (The network stack turned out to be a 
problem, though.)



David Bauer
