Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness

2011-03-22 Thread Dan Kennedy
On 03/23/2011 01:07 AM, Jay A. Kreibich wrote:
> On Tue, Mar 22, 2011 at 06:25:04PM +0700, Dan Kennedy scratched on the wall:
>
>> SQLite assumes that the result of each expression in the WHERE
>> clause depends only on its inputs. If the input arguments are
>> the same, the output should be do. Since random() has no inputs,
>> SQLite figures that it must always return the same value.
>
>To what degree?  And expression like "...WHERE 20<= (random()%100)"
>has no "inputs" other than constants, but is still evaluated once per
>row.  Or is it just raw functions and column references, and not the
>expression as a whole?


I think once you are trying to predict how many times or exactly
when a user function will be called for a given SQL statement you
are technically into the realms of undefined behaviour.

And again, technically, SQLite assumes that the value returned by
a user-defined function are a function of its inputs. Once instance
of where this assumption is used is with virtual tables. If you do:

   SELECT * FROM vtab WHERE col = userfunction();

and the xBestIndex() method says it can handle "col = ?" but does
not set the corresponding "aConstraintUsage[x].omit" flag, SQLite
will evaluate userfunction() once to pass to the xFilter
method, and then again for each row visited by the virtual table
cursor. If the result of userfunction() is not
stable, the query could return difficult to explain results.

I think there might be other such examples too. Left joins. Where
clauses that include OR operators. That sort of thing.

That said, we're aware of the way random() and user-functions with
side-effects are often used. I don't think it's something that
would get changed capriciously.

Dan.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness

2011-03-22 Thread Jay A. Kreibich
On Tue, Mar 22, 2011 at 06:25:04PM +0700, Dan Kennedy scratched on the wall:

> SQLite assumes that the result of each expression in the WHERE
> clause depends only on its inputs. If the input arguments are
> the same, the output should be do. Since random() has no inputs,
> SQLite figures that it must always return the same value.

  To what degree?  And expression like "...WHERE 20 <= (random()%100)"
  has no "inputs" other than constants, but is still evaluated once per
  row.  Or is it just raw functions and column references, and not the
  expression as a whole?

  Either way, it would seem a function with no inputs should always be
  considered non-constant.  Unless someone writes a function that boils
  down to func(){return VALUE;}, it is very likely the function
  references some external value or state, and is unlikely to return
  the same value.

  I realize that most systems will only evaluates random() once, even
  in a larger expression, but I've always found it nice that SQLite did
  not in expressions like the one I gave.  It makes it much easier to
  sample data.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness

2011-03-22 Thread Max Vlasov
On Tue, Mar 22, 2011 at 2:25 PM, Dan Kennedy  wrote:

> On 03/22/2011 04:26 PM, Max Vlasov wrote:
> > Hi,
> >
> > recently I finally started experimenting with virtual tables and there's
> at
> > least one thing I can not understand.
> >
> > As I see xBestIndex/xFilter were developed to allow fast searching if the
> > implementation is able to do this. But there's also sql language that
> allows
> > very exotic queries. Some of them may be recognized by the
> implementation,
> > some not. If the former, one just can rely on sqlite double checking and
> > just do full scan. But there are also cases when it looks like
> recognition
> > is not possible. For example
> >
> > SELECT * FROM vtest where id>  random()
> >
> > in this case xBestIndex just assumes some constant as the expression, so
> the
> > one who implements just can't detect probably unresolved query and thinks
> > that it can search quickly (binary search, for example). The call to
> xFilter
> > just passes first random value and sqlite will never call it again for
> the
> > same enumeration. So xFilter thinks this is the constant value used in
> the
> > query and jumps to the first correct row row never planning to jump back.
> > But this is actually a misleading action since in real world sqlite calls
> > random on every row and the rows bypassed are actually important and can
> be
> > evaluated to true. I mentioned random(), but there may be other cases,
> for
> > example when other fields are part of expressions.
>
> SQLite assumes that the result of each expression in the WHERE
> clause depends only on its inputs. If the input arguments are
> the same, the output should be do. Since random() has no inputs,
> SQLite figures that it must always return the same value.
>
> You can see a similar effect with:
>
>   CREATE TABLE t1(a PRIMARY KEY, b);
>   SELECT * FROM t1 WHERE a > random();  -- random() evaluated once.
>   SELECT * FROM t1 WHERE +a > random(); -- random() evaluated many times
>


Dan, thanks, I double-checked your information and (ironically) I see that
the problem is with "the double check" :)

As I see now, sqlite does a great job that probably won't require any
additional steps for the problem I posted. So if the expression is not
"simple" in the terms I used, it just won't supply any constraint to
xBestIndex so automatically forcing full-scan. But if the double-check is
on, sqlite seems like actually checks random() for every result row and this
actually can give non-correct result.

Although I can not confirm the assumption with the numbers, but I also
checked this hypothesis with another "dynamic" expression using milliseconds

SELECT * FROM vtest WHERE (id = cast(strftime('%f','now')*1000 as integer))

and for a comparatively large dataset the value passed in xFilter is always
different to one returned if I just use full scan and double-checking (for
example 15719 vs 18984).

So it seems like virtual tables double checker always evaluates the
expression used for every row. One can live with that just by disabling
double-checking or not using such dynamics at all. I'm not sure whether such
a minor thing should be fixed in the core.

Thanks

Max Vlasov.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Best approach for xBestIndex/xFilter effectiveness

2011-03-22 Thread Dan Kennedy
On 03/22/2011 04:26 PM, Max Vlasov wrote:
> Hi,
>
> recently I finally started experimenting with virtual tables and there's at
> least one thing I can not understand.
>
> As I see xBestIndex/xFilter were developed to allow fast searching if the
> implementation is able to do this. But there's also sql language that allows
> very exotic queries. Some of them may be recognized by the implementation,
> some not. If the former, one just can rely on sqlite double checking and
> just do full scan. But there are also cases when it looks like recognition
> is not possible. For example
>
> SELECT * FROM vtest where id>  random()
>
> in this case xBestIndex just assumes some constant as the expression, so the
> one who implements just can't detect probably unresolved query and thinks
> that it can search quickly (binary search, for example). The call to xFilter
> just passes first random value and sqlite will never call it again for the
> same enumeration. So xFilter thinks this is the constant value used in the
> query and jumps to the first correct row row never planning to jump back.
> But this is actually a misleading action since in real world sqlite calls
> random on every row and the rows bypassed are actually important and can be
> evaluated to true. I mentioned random(), but there may be other cases, for
> example when other fields are part of expressions.

SQLite assumes that the result of each expression in the WHERE
clause depends only on its inputs. If the input arguments are
the same, the output should be do. Since random() has no inputs,
SQLite figures that it must always return the same value.

You can see a similar effect with:

   CREATE TABLE t1(a PRIMARY KEY, b);
   SELECT * FROM t1 WHERE a > random();  -- random() evaluated once.
   SELECT * FROM t1 WHERE +a > random(); -- random() evaluated many times
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Best approach for xBestIndex/xFilter effectiveness

2011-03-22 Thread Max Vlasov
Hi,

recently I finally started experimenting with virtual tables and there's at
least one thing I can not understand.

As I see xBestIndex/xFilter were developed to allow fast searching if the
implementation is able to do this. But there's also sql language that allows
very exotic queries. Some of them may be recognized by the implementation,
some not. If the former, one just can rely on sqlite double checking and
just do full scan. But there are also cases when it looks like recognition
is not possible. For example

SELECT * FROM vtest where id > random()

in this case xBestIndex just assumes some constant as the expression, so the
one who implements just can't detect probably unresolved query and thinks
that it can search quickly (binary search, for example). The call to xFilter
just passes first random value and sqlite will never call it again for the
same enumeration. So xFilter thinks this is the constant value used in the
query and jumps to the first correct row row never planning to jump back.
But this is actually a misleading action since in real world sqlite calls
random on every row and the rows bypassed are actually important and can be
evaluated to true. I mentioned random(), but there may be other cases, for
example when other fields are part of expressions.

So, the main question: is it possible to detect simple expressions that can
be correctly resolved by quick searching? I know that I can always rely on
sqlite double-checking and always do full scan. But theoretically for large
datasets one should at least think about some optimization.

Thanks,

Max Vlasov
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users