Re: [R] translating SQL statements into data.table operations
Nick, Good question, but just sent to the wrong place. The posting guide asks you to contact the package maintainer first before posting to r-help only if you don't hear back. I guess one reason for that is that if questions about all 2000+ packages were sent to r-help, then r-help's traffic could go through the roof. Another reason could be that some (i.e. maybe many, maybe few) package maintainers don't actually monitor r-help and might miss any messages you post here. I only saw this one thanks to google alerts. Since I'm writing anyway ... are you using the latest version on r-forge which has the very fast grouping? Have you set multi-column keys on both edt and cdt and tried edt[cdt,roll=TRUE] syntax ? We'll help you off list to climb the learning curve quickly. We are working on FAQs and a vignette and they should be ready soon too. Please do follow up with us (myself and Tom Short cc'd are the main developers) off list and one of us will be happy to help further. Matthew Nick Switanek nswita...@gmail.com wrote in message news:772ec1011003241351v6a3f36efqb0b0787564691...@mail.gmail.com... I've recently stumbled across data.table, Matthew Dowle's package. I'm impressed by the speed of the package in handling operations with large data.frames, but am a bit overwhelmed with the syntax. I'd like to express the SQL statement below using data.table operations rather than sqldf (which was incredibly slow for a small subset of my financial data) or import/export with a DBMS, but I haven't been able to figure out how to do it. I would be grateful for your suggestions. nick My aim is to join events (trades) from two datasets (edt and cdt) where, for the same stock, the events in one dataset occur between 15 and 75 days before the other, and within the same time window. I can only see how to express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also at a loss at whether I can express the remainder using data.table's %between% operator or not. ctqm - sqldf(SELECT e.*, c.DATE 'DATEctrl', c.TIME 'TIMEctrl', c.PRICE 'PRICEctrl', c.SIZE 'SIZEctrl' FROM edt e, ctq c WHERE e.SYMBOL = c.SYMBOL AND julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND 75 AND strftime('%H:%M:%S',c.TIME) BETWEEN strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] translating SQL statements into data.table operations
I've recently stumbled across data.table, Matthew Dowle's package. I'm impressed by the speed of the package in handling operations with large data.frames, but am a bit overwhelmed with the syntax. I'd like to express the SQL statement below using data.table operations rather than sqldf (which was incredibly slow for a small subset of my financial data) or import/export with a DBMS, but I haven't been able to figure out how to do it. I would be grateful for your suggestions. nick My aim is to join events (trades) from two datasets (edt and cdt) where, for the same stock, the events in one dataset occur between 15 and 75 days before the other, and within the same time window. I can only see how to express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also at a loss at whether I can express the remainder using data.table's %between% operator or not. ctqm - sqldf(SELECT e.*, c.DATE 'DATEctrl', c.TIME 'TIMEctrl', c.PRICE 'PRICEctrl', c.SIZE 'SIZEctrl' FROM edt e, ctq c WHERE e.SYMBOL = c.SYMBOL AND julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND 75 AND strftime('%H:%M:%S',c.TIME) BETWEEN strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] translating SQL statements into data.table operations
Note that, in general, you can speed up joins, even within sqldf, by adding indexes to your tables and ensuring that your select statement is written in such a way that the indexes are used. See example 4i on the sqldf home page. On Wed, Mar 24, 2010 at 4:51 PM, Nick Switanek nswita...@gmail.com wrote: I've recently stumbled across data.table, Matthew Dowle's package. I'm impressed by the speed of the package in handling operations with large data.frames, but am a bit overwhelmed with the syntax. I'd like to express the SQL statement below using data.table operations rather than sqldf (which was incredibly slow for a small subset of my financial data) or import/export with a DBMS, but I haven't been able to figure out how to do it. I would be grateful for your suggestions. nick My aim is to join events (trades) from two datasets (edt and cdt) where, for the same stock, the events in one dataset occur between 15 and 75 days before the other, and within the same time window. I can only see how to express the WHERE e.SYMBOL = c.SYMBOL part in data.table syntax. I'm also at a loss at whether I can express the remainder using data.table's %between% operator or not. ctqm - sqldf(SELECT e.*, c.DATE 'DATEctrl', c.TIME 'TIMEctrl', c.PRICE 'PRICEctrl', c.SIZE 'SIZEctrl' FROM edt e, ctq c WHERE e.SYMBOL = c.SYMBOL AND julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND 75 AND strftime('%H:%M:%S',c.TIME) BETWEEN strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.