Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jay A. Kreibich
On Mon, Mar 01, 2010 at 11:44:51PM +0100, Jean-Christophe Deschamps scratched 
on the wall:
>
>>   So indexes are not used for NOT conditions, as NOT conditions
>>   generally require a full scan, regardless.  Yes, it is a simple
>>   reverse of a binary test, but the reverse of a specific indexed
>>   lookup of a known value is a table scan to gather all the unknown
>>   values.
>
> I understand how your argument goes, but I still have difficulty buying  
> it.  Why not negate the condition itself and then only proceed?

  That seems to be mixing concepts a bit.
  
  Yes, you can try to use algebra to negate the base of a "NOT..."
  condition and transform it into an equivalent condition that lacks a
  secondary negation, like transforming "NOT A > B" into "A <= B".
  That new condition can them be applied to the query, and the query
  engine may then choose to use an index, because the condition now
  defines the set of rows it is looking for, rather than the set of rows
  it is not looking for.

  But, I'd argue that's not really the point, for several reasons.
  First and foremost, the original conversation was about why indexes
  aren't used with "NOT..." conditions.  If you go through this process,
  by the time you actually pass off the expression to the database
  engine, you no longer have a "NOT..." condition.  You may have an
  expression that can be transformed (back) into a "NOT..." condition,
  but that's not the same at the database level.

  Fine, you say, but shouldn't the query optimizer be able to produce
  that transformation?  So the transform is made between the input level
  and the database level?

  Well, I suppose it is possible for simple expressions, but I'd argue
  that really isn't the business of the query optimizer.  At least not
  the one in SQLite.  For one thing, as we've already seen, 3LV and
  those pesky NULLs make this kind of transformation a lot more complex
  than it first looks.  But more to the point, it is only likely to
  work for fairly simple expressions.   For a large number of expressions
  a transformed expression simply does not exist.

  Take, for example, one of the previous expressions you gave:
  "NOT col1 LIKE 'abc%'".  The LIKE operator is defined by the SQL
  function like(), which offers no inverse function.  The optimizations
  that use (or don't) an index are deep inside the code that implements
  the like() function.  You can't simply flip them, you need to start
  over.  And yes, you might be able to devise a new function based off
  the default implementation of like(), maybe even one that pays
  attention to the case_sensitive_like pragma.  You might even be able
  to implement it so that it uses all the same index aware optimizations
  the default like() does.  And what you end up with is an unlike()
  function that you feed into the database engine as a non-"NOT..."
  expression, putting us back to the argument that by the time you go
  through all this, you're not using an index for a "NOT..." expression,
  you're using it for an equivalent, but different expression.

  Perhaps more importantly, the transform isn't systematic, but
  predefined by swapping one function for another that has been
  pre-defined to have an inverse meaning.  And, of course, it can't be
  a simple inverse at a lower layer either.  For an unlike() function
  to have similar efficiency and use an index when possible, the whole
  function needs to be re-written at a much deeper level so that
  similar index aware optimizations can be used and still answer the
  question of like() or unlike().  That's far from simple, even for one
  specific and isolated case.

  Then there's the nasty bit about user-define over-rides for like(),
  or any expression that happens to use a function.  You can't exactly
  expect everyone that writes a function to also write an inverse function.
  (Of course, I'll admit that the majority of such functions can't use
  an index anyways...)

  So the only question left is why the optimizer doesn't do it for
  the simple cases.  And I'd argue that, "Why don't *you* do it?"  It
  hardly seems worth the very complex code (and code size, and runtime
  cost to check and test all this, not to mention development time,
  plus the huge amount of testing to verify it works correctly) just
  to optimize the extremely simple cases on the order of "NOT col1 > 3."
  If performance is a big concern the developer should just write
  different queries (and, very likely, come to appreciate that when
  NULL handling is important, this isn't as easy as it sounds).  It
  might be nice if the optimizer could do it for us, but I'd guess that
  the optimizer would only be able to bite off very simple cases that
  most of us could do in our heads in just a few seconds.
 


  Then the only issue is the knowledge and understanding that a 
  "NOT..." condition may be significantly more expensive than an
  equivalent "base" condition.



  And that's a point worth 

Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

>I haven't been able to think of how it would preclude using the index,
>but I suspect it's more a matter of needing a similar-but-different
>codepath to optimize for the NOT case, rather than a simple "invert
>this" codepath relying on the existing case.  Which is really just
>another way of stating Jay's point, perhaps.

I understand.  Perhaps I'm biaised by using functional languages where 
such operators as NOT in our case are applied as deep as possible in 
the process of any evaluation.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Scott Hess
On Mon, Mar 1, 2010 at 3:12 PM, Jean-Christophe Deschamps  
wrote:
>>NULL = 12345 is NULL, NOT NULL is NULL, so subset N is not part of NOT
>>(col = 12345).
>
> You're right of course!  (and I was even saying about nulls treated apart)
>
> But, in your view, that the set can be non-contiguous for
> negative/negated conditions would it explain that current code can't
> make use of index?

I haven't been able to think of how it would preclude using the index,
but I suspect it's more a matter of needing a similar-but-different
codepath to optimize for the NOT case, rather than a simple "invert
this" codepath relying on the existing case.  Which is really just
another way of stating Jay's point, perhaps.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

>NULL = 12345 is NULL, NOT NULL is NULL, so subset N is not part of NOT
>(col = 12345).

You're right of course!  (and I was even saying about nulls treated apart)

But, in your view, that the set can be non-contiguous for 
negative/negated conditions would it explain that current code can't 
make use of index?



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Scott Hess
On Mon, Mar 1, 2010 at 2:44 PM, Jean-Christophe Deschamps  
wrote:
> The actual reason for the way NOT works as for now may be due to the
> fact that negating a condition may cause the resulting set to be in
> fact itself the union of two subsets.
> Say the "where" condition K is "col = 12345".  We can see the index
> split into not less than four subsets:
>   o) N = {rows with col is null}
>   o) X = {rows with col < 12345}
>   o) Y = {rows with col = 12345}
>   o) Z = {rows with col > 12345}
>
> For the "positive" condition K, the resulset is Y.
> For the negated condition K' (col <> 12345) the resulset is N u X u Z,
> made of two (or three ?) distinct blocks of indexed rowids.  A possible
> explanation is that the current code may be unable to cope with
> situations where the resultset is not index-wise contiguous.

NULL = 12345 is NULL, NOT NULL is NULL, so subset N is not part of NOT
(col = 12345).

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

>   So indexes are not used for NOT conditions, as NOT conditions
>   generally require a full scan, regardless.  Yes, it is a simple
>   reverse of a binary test, but the reverse of a specific indexed
>   lookup of a known value is a table scan to gather all the unknown
>   values.

Jay,

I understand how your argument goes, but I still have difficulty buying 
it.  Why not negate the condition itself and then only proceed?

Under any condition K, a table T is the union of the subset X of 
elements that match the condition K and the subset Y of elements that 
don't.  In other words, Y is the subset whose elements match [not 
K].  In SQL there is of course the special case for nulls.

You're saying that selecting all rows in T that match K is OK for 
indexing, while selecting rows in T which match the negated condition 
K' = not K can only be done with a full scan.

That is true iff the complementation is done afterwards or on the fly 
(full scan).  My point is that NOT seems to be acting as a 
complementation operator: select X then full scan and check every row 
if it belongs to X or not.

I'd rather see the negated condition K' (= not K) as the condition used 
for indexed selection, again when using the 'complementary' condition 
makes sense.

Pavel modified simple example works well here:

condition  " ... >  3" OK for index search
condition  " ... <= 3" OK for index search
condition  "NOT  ... >  3" only full scan: why?

I still believe that it's possible that many simple conditions be 
negated and used negated for index search.

Now, that it would easy to implement in the current SQLite code is 
another matter entirely and I never pretended it could be done within 
minutes.

The actual reason for the way NOT works as for now may be due to the 
fact that negating a condition may cause the resulting set to be in 
fact itself the union of two subsets.
Say the "where" condition K is "col = 12345".  We can see the index 
split into not less than four subsets:
   o) N = {rows with col is null}
   o) X = {rows with col < 12345}
   o) Y = {rows with col = 12345}
   o) Z = {rows with col > 12345}

For the "positive" condition K, the resulset is Y.
For the negated condition K' (col <> 12345) the resulset is N u X u Z, 
made of two (or three ?) distinct blocks of indexed rowids.  A possible 
explanation is that the current code may be unable to cope with 
situations where the resultset is not index-wise contiguous.


The OP request can be written in a number of ways to make use of the 
index, even:
select count(*) from mybigtable where mytextcolumn < '' or 
mytextcolumn >= '';
so the question is not if SQLite can process equivalent count/querries 
efficiently or not.  I nonetheless think "is not null" is much clearer 
than the above workaround. 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jay A. Kreibich
On Mon, Mar 01, 2010 at 06:03:49PM +0100, Jean-Christophe Deschamps scratched 
on the wall:
> 
> >I totally disagree with you. Let's say you have 1,000,000 rows and 100
> >of them contain NULL. In this situation selecting NOT NULL will select
> >almost all rows which means that using index in this case doesn't give
> >any performance boost. So here using full scan for NOT NULL condition
> >is better and for NULL condition using index is better. Everything is
> >completely legitimate.

> The number of rows concerned either way is not the point. 

  Unless you have stats to prove otherwise, the optimizer will
  generally assume you're trying to select a smaller subset.

  And SQLite's stats get weird when a lot of NULLs are involved, as it
  sees NULLs as unique values.  That's correct for 3VL, but not so much
  for calculating the uniqueness of a column.

> Also the 
> NULL condition could be almost anything else.  Situations where the 1M 
> and 100 figures are reversed would invalidate your point, so?

  No, you'd still need/want a table scan (see below), but for different
  reasons.

> select * from T where col1 like 'abc%' and col2 between 3 and 18;
> uses the index on T
> 
> select * from T where col1 NOT like 'abc%' and col2 between 1 and 24;
> doesn't, due to the use of NOT.
> 
> That in some particular situation using an index could be better or 
> worse is again not the point.  The point was that NOT  is 
> simply reversing the issue of a binary test, in fine, and that seems 
> essentially independant of the use of an index for determining .

  You can't use an index to efficiently find something that "isn't."

  Indexes are optimized to find specific values.  You have to know what
  you're looking for before you start.  If you need to do a full scan,
  you should just use the regular table structure, not the index.
  Most indexes have a very high overhead in returning rows.  That's why
  they're only useful if they target a very small percentage of rows
  (typically less than 15%).

  In the case of "not" operators, you don't know what you're looking
  for.  In the case of NOTs, you can't use an index to find what you want,
  only to find what you don't want, so that's not useful.

  For example, if I hand you a phone book and tell you to find all the
  phone numbers for everyone with a lastname that *doesn't* start with
  'X', you can use the phone book "index" to quickly find those numbers
  you don't want.

  But that isn't useful unless you do a full scan to generate the pool
  of numbers you might want, and then subtract out the numbers you
  don't want.  In other words, you need to do a full data scan.  Once
  that's the case, it is faster to generate the list of return values
  with a full scan, testing and rejecting as you do the scan.

  So indexes are not used for NOT conditions, as NOT conditions
  generally require a full scan, regardless.  Yes, it is a simple
  reverse of a binary test, but the reverse of a specific indexed
  lookup of a known value is a table scan to gather all the unknown
  values.

> >BTW, when SQLite uses index on "text" field it needs to load full
> >values of "text" field anyway.
> 
> Do you mean that
>  select * from T where rowid = 1;
> needs loading the _entire_ index when T has rowids in 1..1000 ?

  No.  
  
  But a row is a row and a value is a value.  It does not matter if it
  is a NULL or a TEXT data type.  If you want to test a column value,
  you need to load it.  So the question of "Why load a TEXT value to
  test for NULL?" is an odd one, since you clearly can't test something
  you don't know.

  Along with that, SQLite indexes use full data copies.  So an index of
  a column full of TEXT values has a full copy of the TEXT values.  So
  loading is still loading.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

>maybe NOT is implemented the same way as any other
>function and so it cannot be optimized using index.

That's possible, but other logical operators don't exhibit the same 
bahavior and will not prevent the use of indexes.  That NOT is not 
being handled at the same _logical_ level than AND and OR is a bit 
disappointing.


>Better example will be "NOT int_val < 3" versus "int_val >= 3".

You're right, that just got "out of the keyboard" by itself ;-)


>No, I didn't mean that. Apparently we have different understanding of
>words "full values". :)

OK then I follow you, but that only needs loading / searching some 
O(log N) pages from the b-tree, which hold actual column entries (full 
values).





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Pavel Ivanov
> The point was that NOT  is
> simply reversing the issue of a binary test, in fine, and that seems
> essentially independant of the use of an index for determining .

I agree with the point that NOT prevents using an index for some
reason. I'm not sure but probably that's because of the same issue
with functions: maybe NOT is implemented the same way as any other
function and so it cannot be optimized using index. But I find your
example with like inappropriate because it's much harder to implement
NOT LIKE 'abc%' using index than LIKE 'abc%' - it needs double
entrance to the b-tree structure. Better example will be "NOT int_val
< 3" versus "int_val >= 3".

>>BTW, when SQLite uses index on "text" field it needs to load full
>>values of "text" field anyway.
>
> Do you mean that
>     select * from T where rowid = 1;
> needs loading the _entire_ index when T has rowids in 1..1000 ?

No, I didn't mean that. Apparently we have different understanding of
words "full values". :)


Pavel

On Mon, Mar 1, 2010 at 12:03 PM, Jean-Christophe Deschamps
 wrote:
>
>>I totally disagree with you. Let's say you have 1,000,000 rows and 100
>>of them contain NULL. In this situation selecting NOT NULL will select
>>almost all rows which means that using index in this case doesn't give
>>any performance boost. So here using full scan for NOT NULL condition
>>is better and for NULL condition using index is better. Everything is
>>completely legitimate.
>
> The number of rows concerned either way is not the point.  Also the
> NULL condition could be almost anything else.  Situations where the 1M
> and 100 figures are reversed would invalidate your point, so?
>
> select * from T where col1 like 'abc%' and col2 between 3 and 18;
> uses the index on T
>
> select * from T where col1 NOT like 'abc%' and col2 between 1 and 24;
> doesn't, due to the use of NOT.
>
> That in some particular situation using an index could be better or
> worse is again not the point.  The point was that NOT  is
> simply reversing the issue of a binary test, in fine, and that seems
> essentially independant of the use of an index for determining .
>
>
>>BTW, when SQLite uses index on "text" field it needs to load full
>>values of "text" field anyway.
>
> Do you mean that
>     select * from T where rowid = 1;
> needs loading the _entire_ index when T has rowids in 1..1000 ?
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

>I totally disagree with you. Let's say you have 1,000,000 rows and 100
>of them contain NULL. In this situation selecting NOT NULL will select
>almost all rows which means that using index in this case doesn't give
>any performance boost. So here using full scan for NOT NULL condition
>is better and for NULL condition using index is better. Everything is
>completely legitimate.

The number of rows concerned either way is not the point.  Also the 
NULL condition could be almost anything else.  Situations where the 1M 
and 100 figures are reversed would invalidate your point, so?

select * from T where col1 like 'abc%' and col2 between 3 and 18;
uses the index on T

select * from T where col1 NOT like 'abc%' and col2 between 1 and 24;
doesn't, due to the use of NOT.

That in some particular situation using an index could be better or 
worse is again not the point.  The point was that NOT  is 
simply reversing the issue of a binary test, in fine, and that seems 
essentially independant of the use of an index for determining .


>BTW, when SQLite uses index on "text" field it needs to load full
>values of "text" field anyway.

Do you mean that
 select * from T where rowid = 1;
needs loading the _entire_ index when T has rowids in 1..1000 ?



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Pavel Ivanov
> Anyway, it seems the OP has a point in saying that it would be nice
> --and I would say 'natural'-- to have the optimizer enhanced to handle
> "NOT " as efficiently as it handles ", provided
> such enhancement can be done with only little changes.

I totally disagree with you. Let's say you have 1,000,000 rows and 100
of them contain NULL. In this situation selecting NOT NULL will select
almost all rows which means that using index in this case doesn't give
any performance boost. So here using full scan for NOT NULL condition
is better and for NULL condition using index is better. Everything is
completely legitimate.

The only bad thing here is that functions like max(), min() and
count() are not special functions in SQLite like it is in some other
RDBMS. Because of this fact SQLite core doesn't know that for count()
it doesn't need the full row and if the table contains a lot of other
fields along with "text" then it will be indeed faster to use index on
"text" even for full scan as in example above. It's a historical
feature in SQLite and I doubt it can be easily changed anywhere in
near future.

BTW, when SQLite uses index on "text" field it needs to load full
values of "text" field anyway. So maybe OP meant something else?


Pavel

On Mon, Mar 1, 2010 at 11:24 AM, Jean-Christophe Deschamps
 wrote:
>
>> > "SELECT count(*) WHERE NOT text IS NULL"
>> >
>> > requires that the complete text column is loaded. With a stored LOB
>> > this results in crazy performance.
>>
>>How did you find that? What do you mean by "requires loading of the
>>whole text column"? It pretty much can require even loading of text
>>columns that shouldn't be counted at all just because one database
>>page is the minimum storage entity loaded from disk.
>
> Hi Pavel,
>
> I believe the OP was intrigued/upset by the fact that
>
>     SELECT count(*) WHERE NOT text IS NULL;
> or (equivalent)
>     SELECT count(*) WHERE text IS NOT NULL
>
> does not use an index in the text column, while
>
>     SELECT count(*) WHERE text IS NULL
>
> does use the index.
>
> I've shown a (trivial) way to achieve the same 'not null' count using
> the index.
>
> Anyway, it seems the OP has a point in saying that it would be nice
> --and I would say 'natural'-- to have the optimizer enhanced to handle
> "NOT " as efficiently as it handles ", provided
> such enhancement can be done with only little changes.
>
> The optimizer is smart enough to handle multiple conditions connected
> by AND and OR and use index for every condition (when they are
> available, of course), but it reverts to full scan for any NOT
> , whatever condition is (simple or complex).
>
> I'm certainly not in a position to dictate how should the optimizer
> should evolve but, as a mere user, I feel that situation a little less
> than satisfactory.
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Jean-Christophe Deschamps

> > "SELECT count(*) WHERE NOT text IS NULL"
> >
> > requires that the complete text column is loaded. With a stored LOB
> > this results in crazy performance.
>
>How did you find that? What do you mean by "requires loading of the
>whole text column"? It pretty much can require even loading of text
>columns that shouldn't be counted at all just because one database
>page is the minimum storage entity loaded from disk.

Hi Pavel,

I believe the OP was intrigued/upset by the fact that

 SELECT count(*) WHERE NOT text IS NULL;
or (equivalent)
 SELECT count(*) WHERE text IS NOT NULL

does not use an index in the text column, while

 SELECT count(*) WHERE text IS NULL

does use the index.

I've shown a (trivial) way to achieve the same 'not null' count using 
the index.

Anyway, it seems the OP has a point in saying that it would be nice 
--and I would say 'natural'-- to have the optimizer enhanced to handle 
"NOT " as efficiently as it handles ", provided 
such enhancement can be done with only little changes.

The optimizer is smart enough to handle multiple conditions connected 
by AND and OR and use index for every condition (when they are 
available, of course), but it reverts to full scan for any NOT 
, whatever condition is (simple or complex).

I'm certainly not in a position to dictate how should the optimizer 
should evolve but, as a mere user, I feel that situation a little less 
than satisfactory. 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-03-01 Thread Pavel Ivanov
> "SELECT count(*) WHERE NOT text IS NULL"
>
> requires that the complete text column is loaded. With a stored LOB
> this results in crazy performance.

How did you find that? What do you mean by "requires loading of the
whole text column"? It pretty much can require even loading of text
columns that shouldn't be counted at all just because one database
page is the minimum storage entity loaded from disk.


Pavel

On Sat, Feb 27, 2010 at 6:25 PM, Lothar Scholz
 wrote:
> It is driving me crazy. I'm working on a web spider where a table holds the 
> downloaded
> webpage. It seems that a select
>
> "SELECT count(*) WHERE NOT text IS NULL"
>
> requires that the complete text column is loaded. With a stored LOB
> this results in crazy performance.
>
> Is this optimized in later versions of SQLite (i'm using a 3.4.x which is
> about 2 years old).
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-02-27 Thread Jay A. Kreibich
On Sun, Feb 28, 2010 at 01:26:20AM +0100, Kees Nuyt scratched on the wall:
> On Sun, 28 Feb 2010 00:44:00 +0100, "Artur Reilin"
>  wrote:
> 
> > "Select count(id) from table where text!='' or
> > text!=NULL" is the same, right?
> 
> text!=NULL is not a valid expression.

  Sure it is.  8-)
  
  It just isn't a very useful expression: it will *always* return NULL.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-02-27 Thread Jean-Christophe Deschamps

>It is driving me crazy. I'm working on a web spider where a table 
>holds the downloaded
>webpage. It seems that a select
>
>"SELECT count(*) WHERE NOT text IS NULL"
>
>requires that the complete text column is loaded. With a stored LOB
>this results in crazy performance.
>
>Is this optimized in later versions of SQLite (i'm using a 3.4.x which is
>about 2 years old).


select (select count(*) from mybigtable) - (select count(*) from 
mybigtable where mytextcolumn is null) as "Non-NULL record count";


This uses only indexes, but explain query plan gives no answer on the 
first select.  Anyway, workaround like this to check that the second 
part actually uses your index (Adapt to minimum rowid you're using if 
ever you force negative rowids in).

explain query plan select (select count(*) from mybigtable where 
rowid > 0) - (select count(*) from mybigtable where mytextcolumn is 
null) as "Non-NULL record count";
TABLE mybigtable USING PRIMARY KEY
TABLE mybigtable USING idxMyTextColumn


BTW, what is "irreal" in SQLite?  Its cost or its support?



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-02-27 Thread Kees Nuyt
On Sun, 28 Feb 2010 00:44:00 +0100, "Artur Reilin"
 wrote:

> "Select count(id) from table where text!='' or
> text!=NULL" is the same, right?

text!=NULL is not a valid expression.
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-02-27 Thread Lothar Scholz
Hello Artur,

Sunday, February 28, 2010, 12:44:00 AM, you wrote:

AR> "Select count(id) from table where text!='' or text!=NULL" is the same,
AR> right?

Using text!='' obviously needs to load the text column.
I just tried it nevertheless and aborted after 5 min (brute force
reread of the whole file would be by the way much much faster then
the 5min for a 10 GB file).

AR> As i see you want to count all rows, right? I don't know if it is faster
AR> to only count one column instead of *, but I using it the one-column-count
AR> way. Perhaps this helps too?

I never thought about this and always considered it syntax-sugar for the
count function.

And it seems i'm right, no change either.

So it's maybe time to say byebye to SQLite and move on to a real
database system.


-- 
Best regards,
 Lothar Scholzmailto:sch...@scriptolutions.com

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there an optimization for "IS NULL" operator in a SELECT query ?

2010-02-27 Thread Artur Reilin
"Select count(id) from table where text!='' or text!=NULL" is the same,  
right?

As i see you want to count all rows, right? I don't know if it is faster  
to only count one column instead of *, but I using it the one-column-count  
way. Perhaps this helps too?

Artur

---

Am 28.02.2010, 00:25 Uhr, schrieb Lothar Scholz  
:

> It is driving me crazy. I'm working on a web spider where a table holds  
> the downloaded
> webpage. It seems that a select
>
> "SELECT count(*) WHERE NOT text IS NULL"
>
> requires that the complete text column is loaded. With a stored LOB
> this results in crazy performance.
>
> Is this optimized in later versions of SQLite (i'm using a 3.4.x which is
> about 2 years old).
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users