Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness
On 03/23/2011 01:07 AM, Jay A. Kreibich wrote: > On Tue, Mar 22, 2011 at 06:25:04PM +0700, Dan Kennedy scratched on the wall: > >> SQLite assumes that the result of each expression in the WHERE >> clause depends only on its inputs. If the input arguments are >> the same, the output should be do. Since random() has no inputs, >> SQLite figures that it must always return the same value. > >To what degree? And expression like "...WHERE 20<= (random()%100)" >has no "inputs" other than constants, but is still evaluated once per >row. Or is it just raw functions and column references, and not the >expression as a whole? I think once you are trying to predict how many times or exactly when a user function will be called for a given SQL statement you are technically into the realms of undefined behaviour. And again, technically, SQLite assumes that the value returned by a user-defined function are a function of its inputs. Once instance of where this assumption is used is with virtual tables. If you do: SELECT * FROM vtab WHERE col = userfunction(); and the xBestIndex() method says it can handle "col = ?" but does not set the corresponding "aConstraintUsage[x].omit" flag, SQLite will evaluate userfunction() once to pass to the xFilter method, and then again for each row visited by the virtual table cursor. If the result of userfunction() is not stable, the query could return difficult to explain results. I think there might be other such examples too. Left joins. Where clauses that include OR operators. That sort of thing. That said, we're aware of the way random() and user-functions with side-effects are often used. I don't think it's something that would get changed capriciously. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness
On Tue, Mar 22, 2011 at 06:25:04PM +0700, Dan Kennedy scratched on the wall: > SQLite assumes that the result of each expression in the WHERE > clause depends only on its inputs. If the input arguments are > the same, the output should be do. Since random() has no inputs, > SQLite figures that it must always return the same value. To what degree? And expression like "...WHERE 20 <= (random()%100)" has no "inputs" other than constants, but is still evaluated once per row. Or is it just raw functions and column references, and not the expression as a whole? Either way, it would seem a function with no inputs should always be considered non-constant. Unless someone writes a function that boils down to func(){return VALUE;}, it is very likely the function references some external value or state, and is unlikely to return the same value. I realize that most systems will only evaluates random() once, even in a larger expression, but I've always found it nice that SQLite did not in expressions like the one I gave. It makes it much easier to sample data. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness
On Tue, Mar 22, 2011 at 2:25 PM, Dan Kennedywrote: > On 03/22/2011 04:26 PM, Max Vlasov wrote: > > Hi, > > > > recently I finally started experimenting with virtual tables and there's > at > > least one thing I can not understand. > > > > As I see xBestIndex/xFilter were developed to allow fast searching if the > > implementation is able to do this. But there's also sql language that > allows > > very exotic queries. Some of them may be recognized by the > implementation, > > some not. If the former, one just can rely on sqlite double checking and > > just do full scan. But there are also cases when it looks like > recognition > > is not possible. For example > > > > SELECT * FROM vtest where id> random() > > > > in this case xBestIndex just assumes some constant as the expression, so > the > > one who implements just can't detect probably unresolved query and thinks > > that it can search quickly (binary search, for example). The call to > xFilter > > just passes first random value and sqlite will never call it again for > the > > same enumeration. So xFilter thinks this is the constant value used in > the > > query and jumps to the first correct row row never planning to jump back. > > But this is actually a misleading action since in real world sqlite calls > > random on every row and the rows bypassed are actually important and can > be > > evaluated to true. I mentioned random(), but there may be other cases, > for > > example when other fields are part of expressions. > > SQLite assumes that the result of each expression in the WHERE > clause depends only on its inputs. If the input arguments are > the same, the output should be do. Since random() has no inputs, > SQLite figures that it must always return the same value. > > You can see a similar effect with: > > CREATE TABLE t1(a PRIMARY KEY, b); > SELECT * FROM t1 WHERE a > random(); -- random() evaluated once. > SELECT * FROM t1 WHERE +a > random(); -- random() evaluated many times > Dan, thanks, I double-checked your information and (ironically) I see that the problem is with "the double check" :) As I see now, sqlite does a great job that probably won't require any additional steps for the problem I posted. So if the expression is not "simple" in the terms I used, it just won't supply any constraint to xBestIndex so automatically forcing full-scan. But if the double-check is on, sqlite seems like actually checks random() for every result row and this actually can give non-correct result. Although I can not confirm the assumption with the numbers, but I also checked this hypothesis with another "dynamic" expression using milliseconds SELECT * FROM vtest WHERE (id = cast(strftime('%f','now')*1000 as integer)) and for a comparatively large dataset the value passed in xFilter is always different to one returned if I just use full scan and double-checking (for example 15719 vs 18984). So it seems like virtual tables double checker always evaluates the expression used for every row. One can live with that just by disabling double-checking or not using such dynamics at all. I'm not sure whether such a minor thing should be fixed in the core. Thanks Max Vlasov. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness
On 03/22/2011 04:26 PM, Max Vlasov wrote: > Hi, > > recently I finally started experimenting with virtual tables and there's at > least one thing I can not understand. > > As I see xBestIndex/xFilter were developed to allow fast searching if the > implementation is able to do this. But there's also sql language that allows > very exotic queries. Some of them may be recognized by the implementation, > some not. If the former, one just can rely on sqlite double checking and > just do full scan. But there are also cases when it looks like recognition > is not possible. For example > > SELECT * FROM vtest where id> random() > > in this case xBestIndex just assumes some constant as the expression, so the > one who implements just can't detect probably unresolved query and thinks > that it can search quickly (binary search, for example). The call to xFilter > just passes first random value and sqlite will never call it again for the > same enumeration. So xFilter thinks this is the constant value used in the > query and jumps to the first correct row row never planning to jump back. > But this is actually a misleading action since in real world sqlite calls > random on every row and the rows bypassed are actually important and can be > evaluated to true. I mentioned random(), but there may be other cases, for > example when other fields are part of expressions. SQLite assumes that the result of each expression in the WHERE clause depends only on its inputs. If the input arguments are the same, the output should be do. Since random() has no inputs, SQLite figures that it must always return the same value. You can see a similar effect with: CREATE TABLE t1(a PRIMARY KEY, b); SELECT * FROM t1 WHERE a > random(); -- random() evaluated once. SELECT * FROM t1 WHERE +a > random(); -- random() evaluated many times ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Best approach for xBestIndex/xFilter effectiveness
Hi, recently I finally started experimenting with virtual tables and there's at least one thing I can not understand. As I see xBestIndex/xFilter were developed to allow fast searching if the implementation is able to do this. But there's also sql language that allows very exotic queries. Some of them may be recognized by the implementation, some not. If the former, one just can rely on sqlite double checking and just do full scan. But there are also cases when it looks like recognition is not possible. For example SELECT * FROM vtest where id > random() in this case xBestIndex just assumes some constant as the expression, so the one who implements just can't detect probably unresolved query and thinks that it can search quickly (binary search, for example). The call to xFilter just passes first random value and sqlite will never call it again for the same enumeration. So xFilter thinks this is the constant value used in the query and jumps to the first correct row row never planning to jump back. But this is actually a misleading action since in real world sqlite calls random on every row and the rows bypassed are actually important and can be evaluated to true. I mentioned random(), but there may be other cases, for example when other fields are part of expressions. So, the main question: is it possible to detect simple expressions that can be correctly resolved by quick searching? I know that I can always rely on sqlite double-checking and always do full scan. But theoretically for large datasets one should at least think about some optimization. Thanks, Max Vlasov ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users