Re: [sqlite] use of parentheses when not arithmetically or logically required
Simon,

Thank you for the optimization. I'll switch to using BETWEEN. This particular database is used for generating some statistics and reports, so performance is not that big an issue. When you talk about "clever use of an index", are you referring to a combined index, or merely separate indices on each column?

Thanks for your input!
--
View this message in context: http://www.nabble.com/use-of-parentheses-when-not-arithmetically-or-logically-required-tp25033803p25035025.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
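For what it's worth, the two points in this reply can be sketched together: the BETWEEN rewrite of the range conditions, and a combined (composite) index rather than one index per column. This is only a sketch using the `log` table from the original question; the index name and the sample data are invented.

```python
import sqlite3

# Sketch only: table and column names come from the thread; the
# index name and rows are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (response INTEGER, port INTEGER)")
con.executemany("INSERT INTO log VALUES (?, ?)",
                [(200, 8444), (207, 8450), (301, 8444), (204, 9000)])

# A combined index on both columns can serve both range conditions;
# with separate single-column indexes, SQLite typically uses at most
# one index per table in a simple query like this.
con.execute("CREATE INDEX idx_log_response_port ON log(response, port)")

# BETWEEN is inclusive on both ends, so for integer responses
# "BETWEEN 200 AND 299" matches "response >= 200 and response < 300".
rows = con.execute("""
    SELECT response, port FROM log
    WHERE response BETWEEN 200 AND 299
      AND port BETWEEN 8444 AND 8459
""").fetchall()
print(sorted(rows))
```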
[sqlite] use of parentheses when not arithmetically or logically required
I have searched the list and understand the need for parentheses when required by logical or arithmetic operations or subselects. My question is whether it makes a difference to use parentheses when they are not logically required, as in the following example, where the only logical operator used is "and":

select * from log
where (response >= 200 and response < 300)
  and (port >= 8444 and port <= 8459)

Is there any difference in execution or performance if the parentheses are present or not? I'm guessing not, but would like confirmation.

select * from log
where response >= 200 and response < 300
  and port >= 8444 and port <= 8459

Thanks!
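One way to check this directly, rather than guessing, is to compare the query plans SQLite produces for the two forms with EXPLAIN QUERY PLAN. A minimal sketch against a throwaway in-memory table (the table and columns follow the question; the exact plan text varies by SQLite version, but the comparison doesn't depend on it):

```python
import sqlite3

# Sketch: compare the plans for the parenthesized and unparenthesized
# WHERE clauses. Since AND is associative, SQLite flattens both into
# the same set of conditions, so the plans should match.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (response INTEGER, port INTEGER)")

with_parens = """SELECT * FROM log
                 WHERE (response >= 200 AND response < 300)
                   AND (port >= 8444 AND port <= 8459)"""
without_parens = """SELECT * FROM log
                    WHERE response >= 200 AND response < 300
                      AND port >= 8444 AND port <= 8459"""

plan_a = con.execute("EXPLAIN QUERY PLAN " + with_parens).fetchall()
plan_b = con.execute("EXPLAIN QUERY PLAN " + without_parens).fetchall()
print(plan_a == plan_b)
```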
[sqlite] 3 million rows, query speeds, and returning zero for rows that don't exist
I'm using sqlite to do some analysis on very large web application log files - approaching 3 million lines per day. And what a wonderful tool it is! It has saved me from writing lots of custom scripts. I have a perl script that parses an rsynced copy of this huge log file, munges, converts, and adds new entries to my database.

The schema currently looks something like this - some of the columns are there as an aid to development/debugging so I can see what's going on:

CREATE TABLE 'log' (
  ip VARCHAR,
  datetimestring VARCHAR COLLATE NOCASE,
  uid VARCHAR COLLATE NOCASE,
  timestamp INTEGER,
  operation VARCHAR COLLATE NOCASE,
  target VARCHAR,
  response INTEGER,
  client VARCHAR COLLATE NOCASE,
  completion VARCHAR COLLATE NOCASE,
  port INTEGER,
  connections INTEGER,
  date INTEGER,
  hour INTEGER,
  min INTEGER,
  sec INTEGER
)

I've been experimenting with indices to find where the tradeoff in size and speed of adding data is offset sufficiently by how fast I need to get the results. For testing purposes, I created indexes for all the columns used in the selects below. Most of the queries I've done so far have been pretty straightforward and have worked very well. But now I need to do one that's taking too long, and there's probably a better way than the one I'm using.

The problem is that I need to produce information for the timeslots that are missing as well as those that are there. For example, I need to be able to evaluate the effect of different load balancers, so I want to compare the numbers of transactions on all the ports. I need to do so on a second-by-second basis so I can see the most detail and not an average. I thought one way to do this would be with a self-join, because there are so many transactions that I'm pretty sure all seconds will be present in the table. I also considered creating a second table, derived from the first whenever it's updated, that would have a unix epoch entry for each second within the log file.
Here's what I tried for the first approach:

select distinct t.timestamp, t.port, count(u.timestamp)
from log t left join log u
  on t.timestamp = u.timestamp
  and t.timestamp >= 1246266000
  and u.uid != "-"
  and (u.response = 207 or u.response = 200)
  and u.port = 8444;

The purpose of the join is to give me all of the timestamps - one for each second - even those seconds that had no activity on that port - and then use those for the query. But this query is taking a very, very, very long time. I'm probably making some newbie mistake, because that's what I am! Is this a valid approach? Is there a better solution?

I am able to get results extremely quickly by using something like this:

select date, hour, min, sec, count(port)
from log
where uid != "-"
  and (response = 207 or response = 200)
  and port = 8444
group by timestamp
order by timestamp;

but the problem is that I also need to know when the ports are not busy and there are no log entries. Here are the last few lines of the result:

2009-06-29|17|42|0|2
2009-06-29|17|42|7|1
2009-06-29|17|42|8|4

What I need to have in the results are entries for seconds 1-6 with a value of 0, like this:

2009-06-29|17|42|0|2
2009-06-29|17|42|1|0
2009-06-29|17|42|2|0
2009-06-29|17|42|3|0
2009-06-29|17|42|4|0
2009-06-29|17|42|5|0
2009-06-29|17|42|6|0
2009-06-29|17|42|7|1
2009-06-29|17|42|8|4

Am I on the right path with the self-join? I also considered adding a new table to the database that contained all the possible time values in epoch seconds - I could easily do so with the script that creates the database and adds the data. I could then left join this table with the data table. But would doing so be substantially faster than the self-join? The examples above are for only one port - I need to do the same for 16 ports. I wrote a perl script to do queries for each second, but it was taking about 1 minute to do queries for the 16 ports for a 1 second interval - way too slow to be used to produce results for 24 hour periods.
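The second idea mentioned above - a separate table with one row per epoch second, LEFT JOINed against the log - can be sketched in miniature like this. The table and column names follow the post; the sample rows and time window are invented for illustration:

```python
import sqlite3

# Sketch of the "separate seconds table" idea: one row per epoch
# second, LEFT JOIN the log against it so silent seconds come back
# with a count of 0. Sample data is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE log
               (timestamp INTEGER, uid VARCHAR,
                response INTEGER, port INTEGER)""")
con.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    (1246266000, "alice", 200, 8444),
    (1246266000, "bob",   207, 8444),
    (1246266003, "carol", 200, 8444),   # seconds 1 and 2 are silent
])

# One row per second in the window of interest.
con.execute("CREATE TABLE seconds (ts INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO seconds VALUES (?)",
                [(t,) for t in range(1246266000, 1246266004)])

# The filters live in the ON clause, not WHERE: a WHERE condition on
# the log columns would discard the NULL-extended rows and turn the
# LEFT JOIN back into an inner join, losing the zero-count seconds.
rows = con.execute("""
    SELECT s.ts, count(l.timestamp)
    FROM seconds s
    LEFT JOIN log l
      ON l.timestamp = s.ts
     AND l.uid != '-'
     AND (l.response = 207 OR l.response = 200)
     AND l.port = 8444
    GROUP BY s.ts
    ORDER BY s.ts
""").fetchall()
for ts, n in rows:
    print(ts, n)
```

With an index on log(timestamp) - or better, a combined index covering port and timestamp - each second's group becomes an index lookup rather than a table scan, which is where the speedup over the self-join approach would come from.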
Thanks for any advice! I learn a lot from this list.

Leo
[sqlite] how do I use more than one column as a "combined key"?
SQL newbie here. Sqlite has been a fantastic tool for analyzing, comparing and correcting lots of account information. But I've hit a roadblock because I don't know how to treat multiple columns as a kind of combined key. I need to know how to relate two tables on multiple columns. It's been easy when I only had to relate using a single column. The tables are pretty big - 20,000+ entries.

User Accounts
  name
  uid
  server
  login
  . . .

Calendar Accounts
  server
  login
  firstname
  lastname
  . . .

What I need to be able to do is to check that the server and login information in the first table matches one and only one of the accounts in the second table. What I'm having trouble figuring out is how to use the server and login as a combined key. When I have a single unique value that I can relate, like a DS ID, I know how to do that.

How can I query to find out which user accounts don't match up with one and only one calendar account? And which calendar accounts are not associated with a single user account?

Thanks for any help you can provide. Sqlite is the perfect tool for this kind of stuff!
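One common shape for this kind of check is a LEFT JOIN on both columns at once, grouped by the user, so you can count how many calendar rows each (server, login) pair matched. A minimal sketch - the table and column names are invented based on the descriptions in the post:

```python
import sqlite3

# Sketch with hypothetical table/column names: find user accounts
# whose (server, login) pair matches anything other than exactly one
# calendar account, by joining on both columns together.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE users
               (name VARCHAR, uid VARCHAR, server VARCHAR, login VARCHAR)""")
con.execute("""CREATE TABLE calendars
               (server VARCHAR, login VARCHAR,
                firstname VARCHAR, lastname VARCHAR)""")
con.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    ("Ann", "u1", "cal1", "ann"),   # exactly one calendar match
    ("Bob", "u2", "cal1", "bob"),   # no calendar match
    ("Cid", "u3", "cal2", "cid"),   # two calendar matches
])
con.executemany("INSERT INTO calendars VALUES (?, ?, ?, ?)", [
    ("cal1", "ann", "Ann", "A"),
    ("cal2", "cid", "Cid", "C"),
    ("cal2", "cid", "Cid", "C2"),
])

# The combined key is just multiple equality conditions ANDed in the
# join; counting a calendar column distinguishes 0 matches (NULL row
# from the LEFT JOIN) from 1 or many.
rows = con.execute("""
    SELECT u.uid, count(c.login) AS matches
    FROM users u
    LEFT JOIN calendars c
      ON c.server = u.server AND c.login = u.login
    GROUP BY u.uid
    HAVING matches != 1
""").fetchall()
print(sorted(rows))   # accounts with zero or multiple matches
```

The reverse question - calendar accounts not associated with exactly one user account - is the same query with the two tables swapped.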
[sqlite] How do I do this join on multiple columns?
SQL newbie here. I have data in a couple of tables that I need to relate. But I don't know how to relate on more than one column. I'll not go into detail about why the data is formed the way it is - it comes from other programs.

For example, let's say there is a table with these columns:

server
login
name
last_login
email
...

and we have a second table that contains columns for server and login:

server
login

How can I, for example, select rows of the first table where their server and login, combined, are NOT in the second table? It's sorta like a composite key. Do I do it with a couple of joins? Could I see an example? If it were only a single column I could use a "not in" subselect, but I'm not sure how to do it when it involves more than one column.

Thanks for any help!
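The two-column analogue of a "not in" subselect is a correlated NOT EXISTS that compares both columns inside the subquery. A sketch with hypothetical table names based on the column lists above:

```python
import sqlite3

# Sketch: select rows of the first table whose (server, login) pair,
# taken together, does not appear in the second table. Table names
# and sample data are invented.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE accounts
               (server VARCHAR, login VARCHAR,
                name VARCHAR, email VARCHAR)""")
con.execute("CREATE TABLE known (server VARCHAR, login VARCHAR)")
con.executemany("INSERT INTO accounts VALUES (?, ?, ?, ?)", [
    ("s1", "ann", "Ann",  "ann@example.com"),
    ("s1", "bob", "Bob",  "bob@example.com"),
    ("s2", "ann", "Ann2", "ann2@example.com"),
])
# Only ("s1", "ann") is known; ("s2", "ann") is not, so matching on
# login alone would give the wrong answer - the pair matters.
con.executemany("INSERT INTO known VALUES (?, ?)", [("s1", "ann")])

rows = con.execute("""
    SELECT a.server, a.login
    FROM accounts a
    WHERE NOT EXISTS (
        SELECT 1 FROM known k
        WHERE k.server = a.server AND k.login = a.login
    )
""").fetchall()
print(sorted(rows))
```

Recent SQLite versions (3.15 and later) also accept row values, so the same query can be written as `WHERE (server, login) NOT IN (SELECT server, login FROM known)`, but NOT EXISTS works on any version.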