Re: [R] translating SQL statements into data.table operations

2010-03-25 Thread Matthew Dowle

Nick,

Good question,  but just sent to the wrong place. The posting guide asks you 
to contact the package maintainer first before posting to r-help only if you 
don't hear back. I guess one reason for that is that if questions about all 
2000+ packages were sent to r-help, then r-help's traffic could go through 
the roof.  Another reason could be that some (i.e. maybe many, maybe few) 
package maintainers don't actually monitor r-help and might miss any 
messages you post here.  I only saw this one thanks to google alerts.

Since I'm writing anyway ... are you using the latest version on r-forge 
which has the very fast grouping? Have you set multi-column keys on both edt 
and cdt and tried edt[cdt,roll=TRUE] syntax ?  We'll help you off list to 
climb the learning curve quickly. We are working on FAQs and a vignette and 
they should be ready soon too.

Please do follow up with us (myself and Tom Short cc'd are the main 
developers) off list and one of us will be happy to help further.

Matthew


Nick Switanek nswita...@gmail.com wrote in message 
news:772ec1011003241351v6a3f36efqb0b0787564691...@mail.gmail.com...
 I've recently stumbled across data.table, Matthew Dowle's package. I'm
 impressed by the speed of the package in handling operations with large
 data.frames, but am a bit overwhelmed with the syntax. I'd like to express
 the SQL statement below using data.table operations rather than sqldf 
 (which
 was incredibly slow for a small subset of my financial data) or
 import/export with a DBMS, but I haven't been able to figure out how to do
 it. I would be grateful for your suggestions.

 nick



 My aim is to join events (trades) from two datasets (edt and cdt) 
 where,
 for the same stock, the events in one dataset occur between 15 and 75 days
 before the other, and within the same time window. I can only see how to
 express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm 
 also
 at a loss at whether I can express the remainder using data.table's
 %between% operator or not.

 ctqm - sqldf(SELECT e.*,
 c.DATE 'DATEctrl',
 c.TIME 'TIMEctrl',
 c.PRICE 'PRICEctrl',
 c.SIZE 'SIZEctrl'

 FROM edt e, ctq c

 WHERE e.SYMBOL = c.SYMBOL AND
   julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
 75 AND
   strftime('%H:%M:%S',c.TIME) BETWEEN
 strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME))

 [[alternative HTML version deleted]]


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] translating SQL statements into data.table operations

2010-03-24 Thread Nick Switanek
I've recently stumbled across data.table, Matthew Dowle's package. I'm
impressed by the speed of the package in handling operations with large
data.frames, but am a bit overwhelmed with the syntax. I'd like to express
the SQL statement below using data.table operations rather than sqldf (which
was incredibly slow for a small subset of my financial data) or
import/export with a DBMS, but I haven't been able to figure out how to do
it. I would be grateful for your suggestions.

nick



My aim is to join events (trades) from two datasets (edt and cdt) where,
for the same stock, the events in one dataset occur between 15 and 75 days
before the other, and within the same time window. I can only see how to
express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also
at a loss at whether I can express the remainder using data.table's
%between% operator or not.

ctqm - sqldf(SELECT e.*,
 c.DATE 'DATEctrl',
 c.TIME 'TIMEctrl',
 c.PRICE 'PRICEctrl',
 c.SIZE 'SIZEctrl'

 FROM edt e, ctq c

 WHERE e.SYMBOL = c.SYMBOL AND
   julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
75 AND
   strftime('%H:%M:%S',c.TIME) BETWEEN
strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME))

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] translating SQL statements into data.table operations

2010-03-24 Thread Gabor Grothendieck
Note that, in general, you can speed up joins, even within sqldf, by
adding indexes to your tables and ensuring that your select statement
is written in such a way that the indexes are used.  See example 4i on
the sqldf home page.

On Wed, Mar 24, 2010 at 4:51 PM, Nick Switanek nswita...@gmail.com wrote:
 I've recently stumbled across data.table, Matthew Dowle's package. I'm
 impressed by the speed of the package in handling operations with large
 data.frames, but am a bit overwhelmed with the syntax. I'd like to express
 the SQL statement below using data.table operations rather than sqldf (which
 was incredibly slow for a small subset of my financial data) or
 import/export with a DBMS, but I haven't been able to figure out how to do
 it. I would be grateful for your suggestions.

 nick



 My aim is to join events (trades) from two datasets (edt and cdt) where,
 for the same stock, the events in one dataset occur between 15 and 75 days
 before the other, and within the same time window. I can only see how to
 express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also
 at a loss at whether I can express the remainder using data.table's
 %between% operator or not.

 ctqm - sqldf(SELECT e.*,
                 c.DATE 'DATEctrl',
                 c.TIME 'TIMEctrl',
                 c.PRICE 'PRICEctrl',
                 c.SIZE 'SIZEctrl'

                 FROM edt e, ctq c

                 WHERE e.SYMBOL = c.SYMBOL AND
                       julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
 75 AND
                       strftime('%H:%M:%S',c.TIME) BETWEEN
 strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME))

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.