Re: [sqlite] Optimization of equality comparison when NULL involved
--- Sean Cunningham <[EMAIL PROTECTED]> wrote: > I have very large datasets and have found that the built in union, > intersect, and except operations do not seem to use indices > (would be happy to be proven wrong here). As such, they > are not very speedy with large large data sets. A patch to speed up queries on a view (or a subquery) with compound selects: http://www.sqlite.org/cvstrac/tktview?tn=1924 http://marc.info/?l=sqlite-users&m=117958960408282 Need a vacation? Get great deals to amazing places on Yahoo! Travel. http://travel.yahoo.com/ - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Optimization of equality comparison when NULL involved
On 6/18/07, Sean Cunningham <[EMAIL PROTECTED]> wrote: There was talk in the mailing list a while back about creating a new operator that would act as a superset of '==' which would treat NULL==NULL as True. I have seen this in some other database. Anybody know if this is on the roadmap? It wouldn't be hard to write a user-defined function to accomplish this. It naturally wouldn't be an infix operator. -scott - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Optimization of equality comparison when NULL involved
On Jun 15, 2007, at 5:27 PM, Scott Hess wrote: select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value union select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS NULL; I think I oversimplified the problem a little bit. In the example I gave, there were just two columns. In the problem I have to solve, there are n columns, and any item in any column might be NULL. The above approach will work well with just two columns, but with multiple columns you would have to take an iterative approach; building intermediate results and intersecting with the first column. What I am trying to do is build efficient set operations. I have very large datasets and have found that the built in union, intersect, and except operations do not seem to use indices (would be happy to be proven wrong here). As such, they are not very speedy with large large data sets. If you really can treat null as '', then you might be better off defining the column as NOT NULL DEFAULT ''. I am leaning toward avoid NULL and using some other token which I can interpret as "no data". Not an ideal solution, but should scale. There was talk in the mailing list a while back about creating a new operator that would act as a superset of '==' which would treat NULL==NULL as True. I have seen this in some other database. Anybody know if this is on the roadmap? - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Optimization of equality comparison when NULL involved
> select tableA.path, tableA.value from tableA,tableB where > tableA.path=tableB.path and (tableA.value=tableB.value or > (tableA.value IS NULL AND tableB.value IS NULL)); > > It's possible that won't use an index, either, due to the OR, in which > case you could try a union between a select with is-null only, and > another with equality, something like: > > select tableA.path, tableA.value from tableA,tableB where > tableA.path=tableB.path and tableA.value=tableB.value union > select tableA.path, tableA.value from tableA,tableB where > tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS > NULL; Even though this query has no OR operation: select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value it has to do a complete pass over one of the tables anyway since it does not have a constrained value (or at least a full pass over an index). So I think in this case the first query with the OR listed above is more efficient since it would do just a single pass over one of the tables instead of 2 passes (one for each SELECT in the UNION). I've been thinking of heuristics to rewrite SELECTs with ORs using UNIONs, which is pretty straightforward, but knowing when not to perform this optimization is the tricky part. Yahoo! oneSearch: Finally, mobile search that gives answers, not web links. http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Optimization of equality comparison when NULL involved
You can use something like: select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and (tableA.value=tableB.value or (tableA.value IS NULL AND tableB.value IS NULL)); It's possible that won't use an index, either, due to the OR, in which case you could try a union between a select with is-null only, and another with equality, something like: select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value union select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS NULL; If you really can treat null as '', then you might be better off defining the column as NOT NULL DEFAULT ''. -scott On 6/15/07, Sean Cunningham <[EMAIL PROTECTED]> wrote: I am hoping there is an obvious answer to this that I've overlooked. I have two tables: create table tableA (path TEXT, value TEXT); create index myIndexA on tableA (path, value); create table tableB(path TEXT, value TEXT); create index myIndexB on tableB (path, value); Now some simple inserts: insert into tableA(path,value) values('alpha', '1'); insert into tableA(path,value) values('beta', '2'); insert into tableB(path,value) values('alpha', '1'); insert into tableB(path,value) values('beta', '999'); The following select statement gets what you'd expect, and it uses the index. sqlite> select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value; alpha|1 gamma|3 However, if your data has NULL's: insert into tableA(path,value) values('gamma', NULL); insert into tableB(path,value) values('gamma', NULL); The same select above neglects to report the tuple ('gamma',NULL) as the equal. This is correct given SQL's treatment of NULL, and is easily fixed: sqlite> select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); alpha|1 gamma|3 gamma| However, the above statement has the unfortunate side effect of bypassing the index: sqlite> explain select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); 0|Goto|0|35| 1|Integer|0|0| 2|OpenRead|0|2| 3|SetNumColumns|0|2| 4|Integer|0|0| 5|OpenRead|1|4| 6|SetNumColumns|1|2| 7|Rewind|0|32| 8|Rewind|1|31| 9|Column|0|0| 10|String8|0|0| 11|CollSeq|0|0|collseq(BINARY) 12|Function|2|2|ifnull(2) 13|Column|1|0| 14|String8|0|0| 15|CollSeq|0|0|collseq(BINARY) 16|Function|2|2|ifnull(2) 17|Ne|28417|30| 18|Column|0|1| 19|String8|0|0| 20|CollSeq|0|0|collseq(BINARY) 21|Function|2|2|ifnull(2) 22|Column|1|1| 23|String8|0|0| 24|CollSeq|0|0|collseq(BINARY) 25|Function|2|2|ifnull(2) 26|Ne|28417|30| 27|Column|0|0| 28|Column|0|1| 29|Callback|2|0| 30|Next|1|9| 31|Next|0|8| 32|Close|0|0| 33|Close|1|0| 34|Halt|0|0| 35|Transaction|0|0| 36|VerifyCookie|0|4| 37|Goto|0|1| 38|Noop|0|0| My question is:Is there another way to write such a select statement which can solve both problems of treating NULL==NULL and using the index. Thanks, Sean - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Optimization of equality comparison when NULL involved
In SQLite null is not equal to anything, including null. I'm not sure what the best solution for your application is. With help from the others here, I have learned to use coalesce() to convert nulls into a value, and to not allow null in key fields. select tableA.path, tableA.value from tableA, tableB where tableA.path = tableB.path and coalesce(tableA.value, '' ) = coalesce(tableB.value, '' ); yields alpha|1 gamma| And it appears to use the myIndexB index for the join. -Clark - Original Message From: Sean Cunningham <[EMAIL PROTECTED]> To: sqlite-users@sqlite.org Sent: Friday, June 15, 2007 1:26:49 PM Subject: [sqlite] Optimization of equality comparison when NULL involved I am hoping there is an obvious answer to this that I've overlooked. I have two tables: create table tableA (path TEXT, value TEXT); create index myIndexA on tableA (path, value); create table tableB(path TEXT, value TEXT); create index myIndexB on tableB (path, value); Now some simple inserts: insert into tableA(path,value) values('alpha', '1'); insert into tableA(path,value) values('beta', '2'); insert into tableB(path,value) values('alpha', '1'); insert into tableB(path,value) values('beta', '999'); The following select statement gets what you'd expect, and it uses the index. sqlite> select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value; alpha|1 gamma|3 However, if your data has NULL's: insert into tableA(path,value) values('gamma', NULL); insert into tableB(path,value) values('gamma', NULL); The same select above neglects to report the tuple ('gamma',NULL) as the equal. This is correct given SQL's treatment of NULL, and is easily fixed: sqlite> select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); alpha|1 gamma|3 gamma| However, the above statement has the unfortunate side effect of bypassing the index: sqlite> explain select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); 0|Goto|0|35| 1|Integer|0|0| 2|OpenRead|0|2| 3|SetNumColumns|0|2| 4|Integer|0|0| 5|OpenRead|1|4| 6|SetNumColumns|1|2| 7|Rewind|0|32| 8|Rewind|1|31| 9|Column|0|0| 10|String8|0|0| 11|CollSeq|0|0|collseq(BINARY) 12|Function|2|2|ifnull(2) 13|Column|1|0| 14|String8|0|0| 15|CollSeq|0|0|collseq(BINARY) 16|Function|2|2|ifnull(2) 17|Ne|28417|30| 18|Column|0|1| 19|String8|0|0| 20|CollSeq|0|0|collseq(BINARY) 21|Function|2|2|ifnull(2) 22|Column|1|1| 23|String8|0|0| 24|CollSeq|0|0|collseq(BINARY) 25|Function|2|2|ifnull(2) 26|Ne|28417|30| 27|Column|0|0| 28|Column|0|1| 29|Callback|2|0| 30|Next|1|9| 31|Next|0|8| 32|Close|0|0| 33|Close|1|0| 34|Halt|0|0| 35|Transaction|0|0| 36|VerifyCookie|0|4| 37|Goto|0|1| 38|Noop|0|0| My question is:Is there another way to write such a select statement which can solve both problems of treating NULL==NULL and using the index. Thanks, Sean - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] Optimization of equality comparison when NULL involved
I am hoping there is an obvious answer to this that I've overlooked. I have two tables: create table tableA (path TEXT, value TEXT); create index myIndexA on tableA (path, value); create table tableB(path TEXT, value TEXT); create index myIndexB on tableB (path, value); Now some simple inserts: insert into tableA(path,value) values('alpha', '1'); insert into tableA(path,value) values('beta', '2'); insert into tableB(path,value) values('alpha', '1'); insert into tableB(path,value) values('beta', '999'); The following select statement gets what you'd expect, and it uses the index. sqlite> select tableA.path, tableA.value from tableA,tableB where tableA.path=tableB.path and tableA.value=tableB.value; alpha|1 gamma|3 However, if your data has NULL's: insert into tableA(path,value) values('gamma', NULL); insert into tableB(path,value) values('gamma', NULL); The same select above neglects to report the tuple ('gamma',NULL) as the equal. This is correct given SQL's treatment of NULL, and is easily fixed: sqlite> select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); alpha|1 gamma|3 gamma| However, the above statement has the unfortunate side effect of bypassing the index: sqlite> explain select tableA.path, tableA.value from tableA,tableB where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull (tableA.value,'')=ifnull(tableB.value,''); 0|Goto|0|35| 1|Integer|0|0| 2|OpenRead|0|2| 3|SetNumColumns|0|2| 4|Integer|0|0| 5|OpenRead|1|4| 6|SetNumColumns|1|2| 7|Rewind|0|32| 8|Rewind|1|31| 9|Column|0|0| 10|String8|0|0| 11|CollSeq|0|0|collseq(BINARY) 12|Function|2|2|ifnull(2) 13|Column|1|0| 14|String8|0|0| 15|CollSeq|0|0|collseq(BINARY) 16|Function|2|2|ifnull(2) 17|Ne|28417|30| 18|Column|0|1| 19|String8|0|0| 20|CollSeq|0|0|collseq(BINARY) 21|Function|2|2|ifnull(2) 22|Column|1|1| 23|String8|0|0| 24|CollSeq|0|0|collseq(BINARY) 25|Function|2|2|ifnull(2) 26|Ne|28417|30| 27|Column|0|0| 28|Column|0|1| 29|Callback|2|0| 30|Next|1|9| 31|Next|0|8| 32|Close|0|0| 33|Close|1|0| 34|Halt|0|0| 35|Transaction|0|0| 36|VerifyCookie|0|4| 37|Goto|0|1| 38|Noop|0|0| My question is:Is there another way to write such a select statement which can solve both problems of treating NULL==NULL and using the index. Thanks, Sean - To unsubscribe, send email to [EMAIL PROTECTED] -