Re: [R] aggregating along bins and bin-quantiles

2008-10-22 Thread Ivan Alves

Dear Mark and all interested,

Unfortunately the code provided by Mark does not work: there is a
syntax error when it is run as provided. I looked at possibly solving the
problem, but without much knowledge of the output of split (it looks
like a list of lists, and not a list of data frames), it is difficult
to identify where in the call to lapply the problem arises. The
problem both in Mark's code and in my original (with tapply) lies in the
format of the output of the call to an implicit loop. In fact I find
this area of R one of the most obscure to my simplistic way of
thinking (I would expect the output to have the same format as the
input, data.frame to data.frame, but I am certain there must be good
reasons for the way implicit loop functions return what they do).
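
For readers puzzled by the same point, here is a minimal sketch of what split() and lapply() hand back. It uses a made-up data frame called `toy` (an assumption for illustration, not the example.csv from this thread). A data frame is itself stored as a list of columns, which may be why the str() output looks like a list of lists.

```r
# Toy data (hypothetical): two bins with three values each
toy <- data.frame(bin = rep(c("a", "b"), each = 3), value = 1:6)

# split() on a data frame returns a named list with one data frame per bin
pieces <- split(toy, toy$bin)
class(pieces)            # "list"
sapply(pieces, class)    # every element is a "data.frame"

# lapply() always returns a list, here a list of one-row data frames
sums <- lapply(pieces, function(d) {
  data.frame(bin = d$bin[1], total = sum(d$value))
})

# do.call(rbind, ...) stacks the pieces back into a single data frame
result <- do.call(rbind, sums)
print(result)
```

So the data.frame-to-data.frame round trip is possible, but the recombination step has to be written out explicitly.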


Any further help would be appreciated, as I may have to resort to some  
(less elegant) loop...


Kind regards,
Ivan

On 22 Oct 2008, at 00:22, [EMAIL PROTECTED] wrote:
Hi Ivan: I think I understand better now, so below is some new code, but
I'm still not totally sure that it's what you want. If not, then I
think it brings you closer anyway. The split function is very
useful and I think that's what you need. Let me know if the code below is
what you needed.
If it's close but not quite right, I can look at it again; it's not
a problem. If I'm totally off, maybe you should resend to the list,
because that means I probably can't fix it.


#======================================================================
a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv",
              colClasses = c("Date", "numeric")) # beware of the path

# SPLIT BY DATE
# TO CREATE A LIST OF
# DATAFRAMES
DFlist <- split(a, a$Date)
str(DFlist)

# USE LAPPLY TO CALL cut AND
# THEN aggregate ON EACH COMPONENT
# DATAFRAME IN THE LIST
tempresult <- lapply(DFlist, function(.df) {
  .df$quantile <- cut(.df$value,
                      breaks = quantile(.df$value, probs = seq(0, 1, 0.1), na.rm = TRUE))
  aggregate(.df$value, list(DATE = .df$Date, QUANTILE = .df$quantile), sum)
})

# CHECK IF IT WORKED
print(tempresult)

# RBIND EVERYTHING BACK TOGETHER
# SO THAT IT'S ONE DATAFRAME
finalresult <- do.call(rbind, tempresult)
print(finalresult)
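
One detail of the cut() call above may be worth checking. With its default arguments, cut() builds intervals that are open on the left, so the smallest value in each group matches no interval and comes back as NA. The sketch below uses a made-up vector `v` (an assumption for illustration) to show the effect and how include.lowest = TRUE changes it.

```r
# Hypothetical values standing in for one date's worth of data
v <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
breaks <- quantile(v, probs = seq(0, 1, 0.1), na.rm = TRUE)

# Default: intervals are (lower, upper], so the minimum is left out
default_bins <- cut(v, breaks = breaks)
sum(is.na(default_bins))

# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(v, breaks = breaks, include.lowest = TRUE)
sum(is.na(closed_bins))
```

Since aggregate() drops rows whose grouping value is NA, the minimum of each date would be missing from the sums unless include.lowest = TRUE is passed.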




On Tue, Oct 21, 2008 at  5:47 PM, Ivan Alves wrote:


Hello Mark,
Many thanks for the reply. Your suggestion is essentially
equivalent to my first attempt: the quantiles are estimated for the
WHOLE of the a$value column. What I would need is to
first break down the value column into bins determined by the
a$Date column and THEN estimate the quantiles within each bin. You
see, I would need the quantiles for each date, not for all
the entries together; thus if there are 12 dates (or bins), then I would
need 12 x 10 deciles, not just 10.
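
To make the count concrete, here is a small sketch that computes a separate set of decile break points for each of 12 bins. The data frame `toy`, its month labels, and the random values are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical data: 12 bins (labelled by month) with 50 values each
toy <- data.frame(bin = rep(month.abb, each = 50), value = rnorm(600))

probs <- seq(0, 1, 0.1)

# One column of break points per bin: 11 break points delimit 10 deciles
breaks_by_bin <- sapply(split(toy$value, toy$bin), quantile, probs = probs)
dim(breaks_by_bin)
```

Each column holds the break points of one bin, so the deciles of one date never depend on the values of another.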

Kind regards,
Ivan

On 21 Oct 2008, at 22:20, [EMAIL PROTECTED] wrote:

Hi: I still wasn't very clear on what you wanted, but that might be
because I didn't save your original email. I doubt that the code below
helps. I used cut instead of cut2 because I didn't have Hmisc
loaded and I think cut does what you want. Jim will probably
reply later with a better answer.
He's the real expert with this type of thing. I just like to
practice.


a <- read.csv(file = "/opt/mark/research/equity/projects/R_mails/example.csv",
              colClasses = c("Date", "numeric"))
a$quantile <- cut(a$value, breaks = quantile(a$value, probs = seq(0, 1, 0.1), na.rm = TRUE))

aggregate(a$value, list(DATE = a$Date, QUANTILE = a$quantile), sum)



Re: [R] aggregating along bins and bin-quantiles

2008-10-21 Thread Ivan Alves

Dear all,

Thanks to Jim and Mark for suggesting that I include reproducible
code. Please note that the enclosed file needs to go into the
home folder, or else the path for reading the CSV file must be changed. I
hope no encoding issues emerge when reading it.


And the code

library(Hmisc) # need the cut2 function to mark the quantile a given line belongs to
a <- read.csv(file = "~/example.csv", colClasses = c("Date", "numeric")) # beware of the path

dim(a) # should give [1] 5076 2

# should give the sum by year but on the quantiles for the whole population
aggregate(a$value, list(Date = a[, "Date"], Quantile = cut2(a$value, g = 10)), sum)

# gives error mentioned below
aggregate(a$value, list(Date = a[, "Date"], Quantile = tapply(a$value, a$Date, cut2, g = 10)), sum)


Once again, many thanks for any help
Ivan

On 21 Oct 2008, at 02:40, jim holtman wrote:


PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

You need to at least post a subset of your data so that we can
understand the data structures that you are using.  'dput' will create
an easily readable format for posting your data (much easier than if
you post the listing of a table).  Usually it is some 'type mismatch'
which says you really have to have the data to run the script against.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




[R] aggregating along bins and bin-quantiles

2008-10-20 Thread Ivan Alves
Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one  
for the bins, say factors, and one for the values) along bins and  
quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$bin, g = 10)), sum)

but then the quantiles apply to the population as a whole and not the  
individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
   'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I
believe the error stems either from (a) the output of tapply being a
list with one element per bin, and not a vector of the same
length as the values, or (b) aggregate somehow not
liking that the second list (the quantiles within the bins) is not
sorted nicely.
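
Possibility (a) can be checked on a small scale. The sketch below uses a made-up data frame `toy` and base cut() in place of Hmisc's cut2 (both are assumptions for illustration, chosen so that no extra package is needed). It shows that tapply() returns one list element per bin rather than one value per row.

```r
# Hypothetical data: three bins with four values each
toy <- data.frame(bin = rep(c("a", "b", "c"), each = 4),
                  value = c(1:4, 11:14, 21:24))

# The function returns several values per bin, so tapply() cannot
# simplify the result and falls back to a list with one element per bin
per_bin <- tapply(toy$value, toy$bin, cut, breaks = 2, labels = FALSE)

is.list(per_bin)   # the result is a list, not an atomic vector
length(per_bin)    # one element per bin
nrow(toy)          # whereas aggregate() expects one grouping value per row
```

A grouping variable passed to aggregate() needs one atomic value per row, which would be consistent with the "'x' must be atomic" message quoted above.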

1. Do you have a reference for doing the summation on both bins and  
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong  
and how I can solve the sort/list issue?
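
As one possible way around the sort/list issue, the within-bin decile can be computed per row with ave(), which keeps the result aligned with the original rows, and the outcome can then be passed to aggregate(). This is only a sketch under assumptions: the data frame `toy`, the helper `decile_within_bin`, and the use of base cut() instead of cut2 are all illustrative choices, not something taken from the thread.

```r
set.seed(42)

# Hypothetical data: two bins with 100 values each
toy <- data.frame(bin = rep(c("2007", "2008"), each = 100),
                  value = runif(200))

# Illustrative helper: decile number (1 to 10) of each value within its group
decile_within_bin <- function(v) {
  cut(v,
      breaks = quantile(v, probs = seq(0, 1, 0.1), na.rm = TRUE),
      labels = FALSE, include.lowest = TRUE)
}

# ave() applies the helper bin by bin and returns one value per row
toy$decile <- ave(toy$value, toy$bin, FUN = decile_within_bin)

# Sum the values by bin and by decile within the bin
sums <- aggregate(toy$value, list(bin = toy$bin, decile = toy$decile), sum)
nrow(sums)
```

With two bins this yields one row per bin and decile, and the column of sums adds up to the total of all values. Ties or heavily repeated values could produce duplicate break points, in which case cut() would stop with an error, so real data may need extra care.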

Any help would be greatly appreciated

Kind regards,

Ivan Alves




Re: [R] aggregating along bins and bin-quantiles

2008-10-20 Thread Ivan Alves
Apologies, just a typo in the first instruction (when translating the  
names), the question is still valid


On 21 Oct 2008, at 00:38, Ivan Alves wrote:


Dear all,

I would like to aggregate a data frame (consisting of 2 columns - one
for the bins, say factors, and one for the values) along bins and
quantiles within the bins.

I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = cut2(data.frame$values, g = 10)), sum)

but then the quantiles apply to the population as a whole and not the
individual bins. Upon this realisation I have tried

aggregate(data.frame$values, list(bin = data.frame$bin, Quantile = tapply(data.frame$values, data.frame$bin, cut2, g = 10)), sum)

which gives the following error:

Error in sort.list(unique.default(x), na.last = TRUE) :
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Clearly I am doing something wrong, but I cannot figure out what. I
believe the error stems either from (a) the output of tapply being a
list with one element per bin, and not a vector of the same
length as the values, or (b) aggregate somehow not
liking the second list (the quantiles within the bins, which do
not appear to be sorted nicely).

1. Do you have a reference for doing the summation on both bins and
quantiles within the bins?
2. If not, can you give me some guidance as to what I am doing wrong
and how I can solve the sort/list issue?

Any help would be greatly appreciated

Kind regards,

Ivan Alves

