Re: [R] Aggregating data using external aggregation rules

2008-06-06 Thread Gabor Grothendieck
Use aggregate() for aggregation and use indexing or subset() for selection.
Alternatively, try the sqldf package: http://sqldf.googlecode.com
which allows one to perform SQL operations on data frames.
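For this question, the indexing route might look like the following minimal sketch. It reuses the names DF, RULE_v1, COD_IN and COD_OUT from the original post, but the data are made up. It also assumes each COD_IN appears only once in the rule, i.e. a many-to-one mapping such as 820 codes onto 7:

```r
# Hypothetical toy data standing in for DF (only 2 of the 7 variables).
DF <- data.frame(
  v1 = c("a", "b", "c", "a", "z"),
  v2 = c("x", "x", "x", "y", "y"),
  measure = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)
# Hypothetical many-to-one rule for v1 (each COD_IN listed once).
RULE_v1 <- data.frame(
  COD_IN  = c("a", "b", "c"),
  COD_OUT = c("G1", "G1", "G2"),
  stringsAsFactors = FALSE
)

# Indexing: look up each v1 among the COD_IN codes and take the
# corresponding COD_OUT (NA where v1 has no rule, e.g. "z").
DF$v1 <- RULE_v1$COD_OUT[match(DF$v1, RULE_v1$COD_IN)]

# Selection: keep only rows that found a rule, like the SQL WHERE clause.
DF <- subset(DF, !is.na(v1))

# Aggregation: sum the measure over the recoded v1 and the other variables.
result <- aggregate(measure ~ v1 + v2, data = DF, FUN = sum)
result
```

match() returns only the first matching COD_IN, so a rule that sends one code to several outputs needs merge() instead.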

On Fri, Jun 6, 2008 at 6:12 AM,  <[EMAIL PROTECTED]> wrote:

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Aggregating data using external aggregation rules

2008-06-06 Thread ANGELO.LINARDI
Dear R experts,

I am currently facing a tricky problem that I have read a lot about on
the various R mailing lists without finding exactly what I need.
I have a big data frame DF (about 2,000,000 rows) with 7 columns that are
variables and 1 that is a measure (using the reshape package's terminology).
There are no "duplicates" in it.
For each of the variables I have some "rules" to apply, where COD_IN is the
value of the variable in DF and COD_OUT is the value it is transformed to;
once DF contains the new codes, I have to aggregate the new DF
(for example, by summing the measure).
Usually the whole transformation (merge + aggregate) greatly reduces the
number of rows in the data frame, but sometimes the number can grow,
depending on the rule. Just to give an idea, the first "rule" for v1 maps
820 different values onto 7.
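Whether the merge shrinks or grows the data depends only on the rule, so it can be checked before merging. A minimal sketch, with a made-up rule and made-up values of v1; the names n_in, n_out, outs_per_in and merged_rows are illustrative only:

```r
# Hypothetical rule in which "c" maps to two output codes.
RULE_v1 <- data.frame(
  COD_IN  = c("a", "b", "c", "c"),
  COD_OUT = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)
# Hypothetical values of v1 in DF; "d" has no rule.
v1 <- c("a", "b", "c", "c", "d")

# Distinct codes on each side of the rule
# (820 and 7 for the first rule on v1 described above).
n_in  <- length(unique(RULE_v1$COD_IN))
n_out <- length(unique(RULE_v1$COD_OUT))

# Number of rule entries per COD_IN; a count above 1 means the merge
# can return more rows than it was given.
outs_per_in <- table(RULE_v1$COD_IN)
one_to_many <- any(outs_per_in > 1)

# Rows an inner join will produce: each value of v1 is repeated once per
# matching rule entry, and values without a rule (NA here) drop out.
counts <- as.vector(outs_per_in)[match(v1, names(outs_per_in))]
merged_rows <- sum(counts, na.rm = TRUE)
```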
Using SQL and a database, the whole transformation can be done in a very
straightforward way (for example, on the variable v1):

Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
From DF, RULE_v1
Where v1 = COD_IN
Group by COD_OUT, v2, v3, v4, v5, v6, v7
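A base-R counterpart of this query might merge() and then aggregate(). The sketch wraps that in a hypothetical helper, apply_rule() (not part of R or of any package), which joins on the variable, replaces it with COD_OUT and sums the measure over all remaining columns. The data and the second rule RULE_v2 are made up for illustration:

```r
# Hypothetical helper (not from the thread): recode column `var` of `df`
# through `rule` (columns COD_IN, COD_OUT) and sum the measure over all
# the other columns.
apply_rule <- function(df, rule, var, measure = "measure") {
  # Inner join on var == COD_IN, like the WHERE clause of the query.
  # A COD_IN listed with several COD_OUT values duplicates the row.
  joined <- merge(df, rule, by.x = var, by.y = "COD_IN")
  # The new code replaces the old one in the same column.
  joined[[var]] <- joined$COD_OUT
  joined$COD_OUT <- NULL
  # Group by every column except the measure, like the GROUP BY list.
  keys <- setdiff(names(df), measure)
  aggregate(joined[measure], by = joined[keys], FUN = sum)
}

# Hypothetical toy data and rules; "c" maps to two output codes.
DF <- data.frame(
  v1 = c("a", "b", "c"),
  v2 = c("x", "y", "y"),
  measure = c(1, 2, 3),
  stringsAsFactors = FALSE
)
RULE_v1 <- data.frame(COD_IN  = c("a", "b", "c", "c"),
                      COD_OUT = c("A", "A", "B", "C"),
                      stringsAsFactors = FALSE)
RULE_v2 <- data.frame(COD_IN  = c("x", "y"),
                      COD_OUT = c("Z", "Z"),
                      stringsAsFactors = FALSE)

# Rules for different variables can be applied one after another.
step1 <- apply_rule(DF, RULE_v1, "v1")     # 4 rows: the "c" row became two
step2 <- apply_rule(step1, RULE_v2, "v2")  # 3 rows after recoding v2
```

Because merge() keeps every matching rule entry, this follows the SQL semantics in the case where the number of rows grows.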

So my first option would be to use a database; the second would be to
split the data frame and then combine the results.
Is there any other way to do the merge and the aggregation that follows it?
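The split-and-combine option works for sums because adding up per-chunk totals gives the same totals as aggregating everything at once. A minimal sketch, assuming made-up data and an arbitrary number of chunks; merge_aggregate() and n_chunks are hypothetical names:

```r
# Hypothetical toy data standing in for DF, plus a hypothetical rule for v1.
DF <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), times = 5),
  v2 = rep(c("x", "y"), each = 10),
  measure = 1:20,
  stringsAsFactors = FALSE
)
RULE_v1 <- data.frame(
  COD_IN  = c("a", "b", "c", "d"),
  COD_OUT = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Merge one piece of DF with the rule and sum the measure within it.
# Assumes every piece has at least one row with a matching COD_IN.
merge_aggregate <- function(piece) {
  joined <- merge(piece, RULE_v1, by.x = "v1", by.y = "COD_IN")
  aggregate(measure ~ COD_OUT + v2, data = joined, FUN = sum)
}

# Split DF into n_chunks contiguous blocks of rows (an arbitrary choice).
n_chunks <- 4
chunk_id <- cut(seq_len(nrow(DF)), n_chunks, labels = FALSE)
partial <- lapply(split(DF, chunk_id), merge_aggregate)

# Stack the partial results and aggregate again: for a sum, adding up
# the per-chunk totals gives the same totals as the whole data.
combined <- do.call(rbind, partial)
final <- aggregate(measure ~ COD_OUT + v2, data = combined, FUN = sum)

# Reference result computed in one pass, for comparison.
whole <- merge_aggregate(DF)
```

Only summaries that combine this way (sums, counts, minima, maxima) can be re-aggregated directly; a mean would need the per-chunk sums and counts kept separately.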

Thank you in advance

Angelo Linardi



** E-mails from the Bank of Italy are sent in good faith but they are neither
binding on the Bank nor to be understood as creating any obligation on its part
except where provided for in a written agreement. This e-mail is confidential.
If you have received it by mistake, please inform the sender by reply e-mail
and delete it from your system. Please also note that the unauthorized
disclosure or use of the message or any attachments could be an offence.
Thank you for your cooperation. **
