Re: [R] do I need plyr, apply or something else?

2012-07-12 Thread Russell Bowdrey

Michael, Mikhail

Many thanks for your helpful comments. My faith in community support continues 
to grow.

Michael: I'm looking to use some sort of flexible spline-like fit 
(smooth.spline, lowess etc).


Many thanks for sharing your expertise. I actually cross-posted this to the 
"manipulatr" Google group; here is the response from Peter Meilstrup:

"For (1), you might want to take a look at rollapply() and related functions in 
the zoo package.
For (2), don't put the different samples of your curve fit into different 
columns. Instead, imagine generating a data frame with three columns:

base.date (the date each fit is based around)
prediction.date (the date you are extrapolating to)
prediction (the fitted value)

So if you have 100 dates and generate a 7-point curve from each date, you end 
up with 700 rows."
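
A minimal sketch of the long layout Peter describes, in base R. The toy data, the column names Date, Term and Fitted, and the 0.5-year grid are assumptions for illustration; smooth.spline() stands in for whichever smoother is finally chosen. Each date contributes one row per curve point, so the result has (number of dates) x (points per curve) rows:

```r
set.seed(7)

# Toy stand-in for the bonds that pass the qualification criteria.
dates <- as.Date(c("2012-07-09", "2012-07-10", "2012-07-11"))
dfQual <- data.frame(
  Date          = rep(dates, each = 40),
  Term2Maturity = runif(120, min = 0.5, max = 10)
)
dfQual$Var1 <- 1 + 0.3 * sqrt(dfQual$Term2Maturity) + rnorm(120, sd = 0.05)

# Curve points every 0.5 years (7 points, an arbitrary choice).
grid <- seq(1, 4, by = 0.5)

# Fit one smoothing spline for a single date and return the curve
# in long form: one row per grid point.
fit_curve <- function(d) {
  fit <- smooth.spline(d$Term2Maturity, d$Var1)
  data.frame(
    Date   = d$Date[1],
    Term   = grid,
    Fitted = predict(fit, x = grid)$y
  )
}

# One curve per date, stacked into a single data frame.
dfCurves <- do.call(rbind, lapply(split(dfQual, dfQual$Date), fit_curve))
```

With 3 dates and a 7-point grid this gives 21 rows, and subsetting by Date recovers a single curve. lapply() still visits each date in turn, but the per-date work is one function call and the output is a single data frame.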

As ever, time pressures dictate that I start from what I know. I have only 
basic database skills at the moment, so I will try zoo/TTR first and turn to 
PostgreSQL if that isn't satisfactory.

-----Original Message-----
From: Mikhail Titov [mailto:m...@gmx.us] 
Sent: 12 July 2012 00:22
To: R. Michael Weylandt
Cc: Russell Bowdrey; r-help@r-project.org
Subject: Re: do I need plyr, apply or something else?

"R. Michael Weylandt"  writes:

> On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey wrote:
>>
>> Dear all,
>>
>> This is what I'd like to do (I have an implementation using for 
>> loops, which I designed before I realised just how slow R is at 
>> executing them - this process currently takes days to run).
>>
>> I have a large dataframe containing corporate bond data, columns are:
>> BondID
>> Date (goes back 5 years)
>> Var1
>> Var2
>> Term2Maturity
>>
>> What I want to do is this:
>>
>> 1)  For each bond, at each given date, look back over 1 year and append 
>> some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)
>>
>
> Look at the TTR package and its various run* functions (runSD, runCor
> and so on). They are much faster than explicit loops.
>
>> a.  It seems I might be able to use ddply for this, but I can't work 
>> out how to code the stats function to only look back over one year, 
>> rather than the full data range
>>
>> b.  For example: dfBondsWithCorr <- ddply(dfBonds, .(BondID),
>> transform, corr = cor(Var1, Var2), .progress = "text")
>> returns a data frame in which each bond has the same corr on every
>> date
>>
>> 2) On each date, subset dfBondsWithCorr by certain qualification
>> criteria, then to the qualifiers fit a regression through Var1 and
>> Term2Maturity, and output the regression as a df of curves (say, for
>> each date, a curve represented by points every 0.5 years)
>>
>> a.  I can do this pretty efficiently for a single date (and I suppose 
>> I could wrap that in a function) , but can't quite see how to do the 
>> filtering and spitting out of curves over multiple dates without 
>> using for loops
>>
>
> This one's harder. For simple linear regressions, you can solve the 
> regression analytically (e.g., slope = runCov / runVar and mean
> similarly) but doing it for more complicated regressions will pretty 
> much require a for loop of one sort or another. Can you say what sort 
> of model you are looking to use?
>
>> Would appreciate any thoughts, many thanks in advance

I feel like PostgreSQL will do the work better. It has support for basic 
statistics [1] and you can use window functions [2] to limit the scope for last 
year only. Then you get your data with RODBC or something.

I suspect you have your data in some sort of DB in the first place. Perhaps it 
has similar features.

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
[2] http://www.postgresql.org/docs/9.1/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

--
Mikhail



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] do I need plyr, apply or something else?

2012-07-11 Thread Mikhail Titov
"R. Michael Weylandt"  writes:

> On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey wrote:
>>
>> Dear all,
>>
>> This is what I'd like to do (I have an implementation using for
>> loops, which I designed before I realised just how slow R is at
>> executing them - this process currently takes days to run).
>>
>> I have a large dataframe containing corporate bond data, columns are:
>> BondID
>> Date (goes back 5 years)
>> Var1
>> Var2
>> Term2Maturity
>>
>> What I want to do is this:
>>
>> 1)  For each bond, at each given date, look back over 1 year and append 
>> some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)
>>
>
> Look at the TTR package and its various run* functions (runSD, runCor
> and so on). They are much faster than explicit loops.
>
>> a.  It seems I might be able to use ddply for this, but I can't work
>> out how to code the stats function to only look back over one year,
>> rather than the full data range
>>
>> b.  For example: dfBondsWithCorr <- ddply(dfBonds, .(BondID),
>> transform, corr = cor(Var1, Var2), .progress = "text")
>> returns a data frame in which each bond has the same corr on every date
>>
>> 2) On each date, subset dfBondsWithCorr by certain qualification
>> criteria, then to the qualifiers fit a regression through Var1 and
>> Term2Maturity, and output the regression as a df of curves (say, for
>> each date, a curve represented by points every 0.5 years)
>>
>> a.  I can do this pretty efficiently for a single date (and I
>> suppose I could wrap that in a function) , but can't quite see how
>> to do the filtering and spitting out of curves over multiple dates
>> without using for loops
>>
>
> This one's harder. For simple linear regressions, you can solve the
> regression analytically (e.g., slope = runCov / runVar and mean
> similarly) but doing it for more complicated regressions will pretty
> much require a for loop of one sort or another. Can you say what sort
> of model you are looking to use?
>
>> Would appreciate any thoughts, many thanks in advance

I feel like PostgreSQL will do the work better. It has support for basic
statistics [1] and you can use window functions [2] to limit the scope
for last year only. Then you get your data with RODBC or something.

I suspect you have your data in some sort of DB in the first
place. Perhaps it has similar features.

[1] http://www.postgresql.org/docs/9.1/static/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
[2] http://www.postgresql.org/docs/9.1/interactive/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

-- 
Mikhail



Re: [R] do I need plyr, apply or something else?

2012-07-11 Thread R. Michael Weylandt
On Wed, Jul 11, 2012 at 10:05 AM, Russell Bowdrey wrote:
>
> Dear all,
>
> This is what I'd like to do (I have an implementation using for loops, which 
> I designed before I realised just how slow R is at executing them - this 
> process currently takes days to run).
>
> I have a large dataframe containing corporate bond data, columns are:
> BondID
> Date (goes back 5 years)
> Var1
> Var2
> Term2Maturity
>
> What I want to do is this:
>
> 1)  For each bond, at each given date, look back over 1 year and append 
> some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)
>

Look at the TTR package and its various run* functions (runSD, runCor
and so on). They are much faster than explicit loops.

> a.   It seems I might be able to use ddply for this, but I can't work out 
> how to code the stats function to only look back over one year, rather than 
> the full data range
>
> b.  For example: dfBondsWithCorr <- ddply(dfBonds, .(BondID),
> transform, corr = cor(Var1, Var2), .progress = "text")
> returns a data frame in which each bond has the same corr on every date
>
> 2)  On each date, subset dfBondsWithCorr by certain qualification
> criteria, then to the qualifiers fit a regression through Var1 and
> Term2Maturity, and output the regression as a df of curves (say, for each
> date, a curve represented by points every 0.5 years)
>
> a.   I can do this pretty efficiently for a single date (and I suppose I 
> could wrap that in a function) , but can't quite see how to do the filtering 
> and spitting out of curves over multiple dates without using for loops
>

This one's harder. For simple linear regressions, you can solve the
regression analytically (e.g., slope = runCov / runVar and mean
similarly) but doing it for more complicated regressions will pretty
much require a for loop of one sort or another. Can you say what sort
of model you are looking to use?
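
A sketch of both suggestions using TTR, assuming the package is installed. The toy data, the 250-observation window (roughly one year of trading days) and the choice of Var2 regressed on Var1 are assumptions for illustration, not something stated in the thread. Note that the run* functions work on a fixed number of observations, not a calendar year, so each bond needs regularly spaced rows and at least n of them.

```r
library(TTR)  # assumed installed; provides runSD, runCor, runCov, runVar, runMean

set.seed(1)
n_obs <- 300
dfBonds <- data.frame(
  BondID = rep(c("A", "B"), each = n_obs),
  Date   = rep(seq(as.Date("2011-01-03"), by = "day", length.out = n_obs),
               times = 2),
  Var1   = rnorm(2 * n_obs),
  Var2   = rnorm(2 * n_obs)
)

n <- 250  # hypothetical window length in observations

# Append trailing-window statistics to the rows of a single bond.
add_rolling <- function(d) {
  d <- d[order(d$Date), ]
  d$sd1  <- runSD(d$Var1, n = n)
  d$corr <- runCor(d$Var1, d$Var2, n = n)
  # Rolling simple regression of Var2 on Var1, solved analytically:
  # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
  # n is passed by name because the second argument of runVar() is y.
  d$slope     <- runCov(d$Var1, d$Var2, n = n) / runVar(d$Var1, n = n)
  d$intercept <- runMean(d$Var2, n = n) - d$slope * runMean(d$Var1, n = n)
  d
}

# Apply per bond and stack the pieces back together.
dfRolling <- do.call(rbind, lapply(split(dfBonds, dfBonds$BondID), add_rolling))
```

The first n - 1 rows of each bond are NA because the window is not yet full. The slope and intercept on any later row should agree with lm(Var2 ~ Var1) fitted to the same n rows.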

Best,
Michael

> Would appreciate any thoughts, many thanks in advance
>
>
> Russ


[R] do I need plyr, apply or something else?

2012-07-11 Thread Russell Bowdrey

Dear all,

This is what I'd like to do (I have an implementation using for loops, which I 
designed before I realised just how slow R is at executing them - this process 
currently takes days to run).

I have a large dataframe containing corporate bond data, columns are:
BondID
Date (goes back 5 years)
Var1
Var2
Term2Maturity

What I want to do is this:

1)  For each bond, at each given date, look back over 1 year and append 
some statistics to each row ( sd(Var1), cor(Var1,Var2) over that year etc)

a.   It seems I might be able to use ddply for this, but I can't work out 
how to code the stats function to only look back over one year, rather than the 
full data range

b.  For example: dfBondsWithCorr <- ddply(dfBonds, .(BondID),
transform, corr = cor(Var1, Var2), .progress = "text")
returns a data frame in which each bond has the same corr on every date

2)  On each date, subset dfBondsWithCorr by certain qualification criteria, 
then to the qualifiers fit a regression through Var1 and Term2Maturity, and 
output the regression as a df of curves (say, for each date, a curve represented 
by points every 0.5 years)

a.   I can do this pretty efficiently for a single date (and I suppose I 
could wrap that in a function) , but can't quite see how to do the filtering 
and spitting out of curves over multiple dates without using for loops
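
On point 1b: cor(Var1, Var2) returns one number per bond, and transform recycles it down every row of the group, which is why each date shows the same value. A base R sketch of a calendar-based look-back follows. The toy data, the helper name lookback_stats, the 365-day window and the minimum of 3 observations are assumptions for illustration. It still iterates over rows inside sapply(), so it is not claimed to be fast; it only shows the windowing logic.

```r
set.seed(42)

# Toy data: two bonds observed weekly.
dates <- seq(as.Date("2010-01-01"), by = "week", length.out = 120)
dfBonds <- data.frame(
  BondID = rep(c("A", "B"), each = length(dates)),
  Date   = rep(dates, times = 2),
  Var1   = rnorm(2 * length(dates)),
  Var2   = rnorm(2 * length(dates))
)

# For each row of one bond, compute statistics over the trailing
# `days` calendar days, ending at (and including) that row's date.
lookback_stats <- function(d, days = 365) {
  d <- d[order(d$Date), ]
  stats <- t(sapply(seq_len(nrow(d)), function(i) {
    in_window <- d$Date > d$Date[i] - days & d$Date <= d$Date[i]
    # Too few points for a meaningful sd or correlation.
    if (sum(in_window) < 3) return(c(sd1 = NA_real_, corr = NA_real_))
    c(sd1  = sd(d$Var1[in_window]),
      corr = cor(d$Var1[in_window], d$Var2[in_window]))
  }))
  cbind(d, stats)
}

# Apply per bond and stack the results.
dfBondsWithCorr <- do.call(
  rbind,
  lapply(split(dfBonds, dfBonds$BondID), lookback_stats)
)
```

Each row now carries its own sd1 and corr, computed only from that bond's observations in the preceding year.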

Would appreciate any thoughts, many thanks in advance


Russ


