Re: [sqlite] Searching with like for a specific start letter

2010-04-28 Thread Tim Romano
A practical situation? Lexicographical applications and full-text
applications against text corpora require indexed substring searches,
including ends-with searches. (The FTS extension is not always a good fit.)
 I am glad that only the LIKE operator has been overridden in Adobe's
version and in the version that ships with the System.Data.SQLite (.NET)
adapter;  I'd be up the creek if both LIKE and GLOB had been overridden. I
like your renaming suggestion but unfortunately that's not an option if the
implementors want to make their implementation widely available and support
standard syntax. Hence, Adobe and Google et al don't have a LIKEU().

Tim Romano


On Wed, Apr 28, 2010 at 10:09 AM, Jean-Christophe Deschamps
wrote:

> Tim,
>
>
> I agree it is possible to overload LIKE and GLOB independantly but I
> don't see a practical situation where overloading only one of them
> would be desirable.
>
> For instance, if some extension overloads LIKE to support ICU, it would
> be logical and consistent to overload GLOB with the same
> function.  Given that the two entries differ only by a parameter,
> enjoying Unicode support in LIKE and not in GLOB (or vice-versa) would
> be a bit strange.
>
> Should one have a need to keep the native functions untouched, there is
> the easy possibility to call the new versions with new names (e.g.
> LIKEU, GLOBU) even if that makes the SQL less standard.
>
> In short: possible yes, likely not much.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-28 Thread Jean-Christophe Deschamps
Tim,

>But did I say that  GLOB uses an index if it has been overloaded?  No.  I
>wrote that if LIKE has been overloaded, queries that contain LIKE 
>won't use
>the index.  Typically, GLOB won't have been overridden too just 
>because LIKE
>has been overridden: the rationale for overriding the LIKE operator 
>does not
>apply equally to GLOB, and it would make little sense to override GLOB 
>in a
>manner that vitiates its raison d'être. You are conflating these two
>functions ("... if LIKE/GLOB has been overridden... overloads LIKE/GLOB")
>but in important respects they are dissimilar.

I agree it is possible to overload LIKE and GLOB independantly but I 
don't see a practical situation where overloading only one of them 
would be desirable.

For instance, if some extension overloads LIKE to support ICU, it would 
be logical and consistent to overload GLOB with the same 
function.  Given that the two entries differ only by a parameter, 
enjoying Unicode support in LIKE and not in GLOB (or vice-versa) would 
be a bit strange.

Should one have a need to keep the native functions untouched, there is 
the easy possibility to call the new versions with new names (e.g. 
LIKEU, GLOBU) even if that makes the SQL less standard.

In short: possible yes, likely not much.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-28 Thread Tim Romano
Jean-Christophe,
But did I say that  GLOB uses an index if it has been overloaded?  No.  I
wrote that if LIKE has been overloaded, queries that contain LIKE won't use
the index.  Typically, GLOB won't have been overridden too just because LIKE
has been overridden: the rationale for overriding the LIKE operator does not
apply equally to GLOB, and it would make little sense to override GLOB in a
manner that vitiates its raison d'être. You are conflating these two
functions ("... if LIKE/GLOB has been overridden... overloads LIKE/GLOB")
but in important respects they are dissimilar.

Regards
Tim Romano

On Mon, Apr 26, 2010 at 8:27 PM, Jean-Christophe Deschamps 
wrote:

> Tim,
>
> >Queries using GLOB do use the index on the column in question (i.e.
> >optimization is attempted)
> >Queries using LIKE do not use that index if the LIKE operator has been
> >overridden.
>
> Sorry but GLOB doesn't use an index either if LIKE/GLOB has been
> overloaded.  This is consistent with the docs and the output of Explain
> query plan for both variants when an extension is active and overloads
> LIKE/GLOB.
>
> Things can be different with a custom built of SQLite, where native
> LIKE/GLOB itself has been modified.  With custom code, all bets are off.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Jean-Christophe Deschamps
Tim,

>Queries using GLOB do use the index on the column in question (i.e.
>optimization is attempted)
>Queries using LIKE do not use that index if the LIKE operator has been
>overridden.

Sorry but GLOB doesn't use an index either if LIKE/GLOB has been 
overloaded.  This is consistent with the docs and the output of Explain 
query plan for both variants when an extension is active and overloads 
LIKE/GLOB.

Things can be different with a custom built of SQLite, where native 
LIKE/GLOB itself has been modified.  With custom code, all bets are off.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Alberto Simões
Just to complete the thread, I decided for the following:

SELECT substr(normalized,1,1) AS letter, COUNT(*) from entry group by
letter order by letter;

Thank you ALL!

2010/4/26 Alberto Simões :
> Hello
>
> Thank you all for the answers.
>
> On Mon, Apr 26, 2010 at 12:59 PM, Black, Michael (IS)
>  wrote:
>> When you say "running on the fly" do you mean running from an sqlite3 
>> command prompt?
>
> I mean somebody will query it and will be waiting for the answer.
>
>> Or are you doing this in some other programming language?
>
> Perl
>
>> Why in the world would you use a database to do this?
>
> Probably I am not using a database for this, but for something else,
> and I want to add a feature to let users read some statistics.
>
>>
>>
>> Hello
>>
>> I am running on the fly a query to count the number of words starting
>> with one of the 26 letters.
>>
>> I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
>> LIKE "a%"  (for the 26 letters)
>>
>> normword is the term normalized without accents and the like
>>
>>
>> Is there any way to make this query faster? It is taking about 10
>> second for 140K entries.
>>
>> One idea is to add a column named 'letter' and SELECT COUNT(letter)
>> from dictionary WHERE letter = 'a'.
>> But are there other solutions?
>>
>> Thanks
>> --
>> Alberto Simões
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>>
>
>
>
> --
> Alberto Simões
>



-- 
Alberto Simões
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Alberto Simões
Hello

Thank you all for the answers.

On Mon, Apr 26, 2010 at 12:59 PM, Black, Michael (IS)
 wrote:
> When you say "running on the fly" do you mean running from an sqlite3 command 
> prompt?

I mean somebody will query it and will be waiting for the answer.

> Or are you doing this in some other programming language?

Perl

> Why in the world would you use a database to do this?

Probably I am not using a database for this, but for something else,
and I want to add a feature to let users read some statistics.

>
>
> Hello
>
> I am running on the fly a query to count the number of words starting
> with one of the 26 letters.
>
> I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
> LIKE "a%"  (for the 26 letters)
>
> normword is the term normalized without accents and the like
>
>
> Is there any way to make this query faster? It is taking about 10
> second for 140K entries.
>
> One idea is to add a column named 'letter' and SELECT COUNT(letter)
> from dictionary WHERE letter = 'a'.
> But are there other solutions?
>
> Thanks
> --
> Alberto Simões
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>



-- 
Alberto Simões
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Tim Romano
Edit: I meant to type "Firefox" not Firebird.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Tim Romano
I am not quite sure what it is, and why it is, that you are doubting,
Jean-Christophe.

Queries using GLOB do use the index on the column in question (i.e.
optimization is attempted)
Queries using LIKE do not use that index if the LIKE operator has been
overridden.

You could confirm this claim from the docs and/or by using EXPLAIN QUERY
PLAN, assuming you had access to a version of SQLite where the LIKE operator
has been overridden (as it has been in Adobe AIR, Firebird, and in
the version that ships with the  System.Data.SQLite .NET provider as well,
IIRC.


Regards
Tim Romano


On Mon, Apr 26, 2010 at 9:24 AM, Jean-Christophe Deschamps 
wrote:

> At 14:31 26/04/2010, you wrote:
>
> >If the implementation of SQLite you are using overrides the LIKE operator
> >(as more than a few do), then SQLite will not make use of an index on the
> >column in question. Use the GLOB operator instead.
>
> I doubt it.  GLOB is absolutely nothing more or less than an invokation
> of the same code for LIKE but with slightly different
> parameters.  Except if people have made a completely differing version,
> departing from the architecture of the standard SQLite code (and there
> is little reason to, if any) AND have made LIKE and GLOB two completely
> distinct functions, there shouldn't be any significant difference in
> running time (for equivalent queries, of course).
>
> Also if ever LIKE is overloaded, then GLOB gets excluded from standard
> optimization, except large changes in SQLite code.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Jean-Christophe Deschamps
At 14:31 26/04/2010, you wrote:

>If the implementation of SQLite you are using overrides the LIKE operator
>(as more than a few do), then SQLite will not make use of an index on the
>column in question. Use the GLOB operator instead.

I doubt it.  GLOB is absolutely nothing more or less than an invokation 
of the same code for LIKE but with slightly different 
parameters.  Except if people have made a completely differing version, 
departing from the architecture of the standard SQLite code (and there 
is little reason to, if any) AND have made LIKE and GLOB two completely 
distinct functions, there shouldn't be any significant difference in 
running time (for equivalent queries, of course).

Also if ever LIKE is overloaded, then GLOB gets excluded from standard 
optimization, except large changes in SQLite code.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Tim Romano
Yes. If the OP's [normword] column contains proper nouns, he must normalize
to lower case in order to get accurate results from GLOB.
Or, if his lexicon contains proper nouns in upper case and normal nouns in
lower case, then he could always leave the case intact and use GLOB to get a
count of proper nouns versus normal nouns ;-)

... GLOB 'A*'
... GLOB 'a*'

Regards
Tim Romano




On Mon, Apr 26, 2010 at 8:47 AM, Igor Tandetnik  wrote:

> Tim Romano wrote:
> > If the implementation of SQLite you are using overrides the LIKE operator
> > (as more than a few do), then SQLite will not make use of an index on the
> > column in question. Use the GLOB operator instead.
> >
> > For example, I have a lexicon containing 263,000 words:
> >
> > select count(*) from lexicon where spelling like 'a%'   // 552 ms on
> first
> > run and then 355ms on second and subsequent runs
> > select count(*) from lexicon where spelling glob 'a*'  // 110 ms on first
> > run and then ~10ms on second and subsequent runs
>
> Note that, by default, LIKE is case-insensitive while GLOB is
> case-sensitive. Thus, even if not overridden with a custom function, LIKE
> cannot be optimized unless the column is declared with NOCASE collation.
> Again, this article provides all the details:
> http://sqlite.org/optoverview.html#like_opt
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Igor Tandetnik
Tim Romano wrote:
> If the implementation of SQLite you are using overrides the LIKE operator
> (as more than a few do), then SQLite will not make use of an index on the
> column in question. Use the GLOB operator instead.
> 
> For example, I have a lexicon containing 263,000 words:
> 
> select count(*) from lexicon where spelling like 'a%'   // 552 ms on first
> run and then 355ms on second and subsequent runs
> select count(*) from lexicon where spelling glob 'a*'  // 110 ms on first
> run and then ~10ms on second and subsequent runs

Note that, by default, LIKE is case-insensitive while GLOB is case-sensitive. 
Thus, even if not overridden with a custom function, LIKE cannot be optimized 
unless the column is declared with NOCASE collation. Again, this article 
provides all the details: http://sqlite.org/optoverview.html#like_opt
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Tim Romano
If the implementation of SQLite you are using overrides the LIKE operator
(as more than a few do), then SQLite will not make use of an index on the
column in question. Use the GLOB operator instead.

For example, I have a lexicon containing 263,000 words:

select count(*) from lexicon where spelling like 'a%'   // 552 ms on first
run and then 355ms on second and subsequent runs
select count(*) from lexicon where spelling glob 'a*'  // 110 ms on first
run and then ~10ms on second and subsequent runs


Alternatively:

select substr(spelling,1,1), count (*) from  lexicon
group by substr(spelling,1,1)
order by  substr(spelling,1,1)

// ~3500 ms on first run and then ~2400 ms on second and subsequent runs


Of course, if your lexicon is static, you could create an ancillary table of
first letters and their corresponding counts.

Regards
Tim Romano

2010/4/25 Alberto Simões 

> Hello
>
> I am running on the fly a query to count the number of words starting
> with one of the 26 letters.
>
> I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
> LIKE "a%"  (for the 26 letters)
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-26 Thread Black, Michael (IS)
When you say "running on the fly" do you mean running from an sqlite3 command 
prompt?
 
Or are you doing this in some other programming language?
 
Why in the world would you use a database to do this?
 
Michael D. Black
Senior Scientist
Northrop Grumman Mission Systems
 



From: sqlite-users-boun...@sqlite.org on behalf of Alberto Simões
Sent: Sun 4/25/2010 3:39 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Searching with like for a specific start letter



Hello

I am running on the fly a query to count the number of words starting
with one of the 26 letters.

I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
LIKE "a%"  (for the 26 letters)

normword is the term normalized without accents and the like


Is there any way to make this query faster? It is taking about 10
second for 140K entries.

One idea is to add a column named 'letter' and SELECT COUNT(letter)
from dictionary WHERE letter = 'a'.
But are there other solutions?

Thanks
--
Alberto Simões
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-25 Thread Kees Nuyt
On Sun, 25 Apr 2010 21:39:43 +0100, Alberto Simões
 wrote:

>Hello
>
> I am running on the fly a query to count the number of 
> words starting with one of the 26 letters.
>
> I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
> LIKE "a%"  (for the 26 letters)
>
>normword is the term normalized without accents and the like

Would your application allow to return all 26 in one query?

SELECT COUNT(term) FROM dictionary 
WHERE normword >= 'a'
  AND normword <= 'zz'
GROUP BY substr(normword,1,1);

(untested, but certainly faster than 26 separate queries)

>Is there any way to make this query faster? It is taking about 10
>second for 140K entries.
>
>One idea is to add a column named 'letter' and SELECT COUNT(letter)
>from dictionary WHERE letter = 'a'.
>But are there other solutions?
>
>Thanks
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-25 Thread Simon Slavin

On 25 Apr 2010, at 9:39pm, Alberto Simões wrote:

> One idea is to add a column named 'letter' and SELECT COUNT(letter)
> from dictionary WHERE letter = 'a'.

That will be the simplest way to make a fast lookup, though it will slow down 
your INSERT function.  You could speed it up a tiny bit more by making the 
search field an INTEGER field, and storing, for example, 65 in there for 'a', 
66 for 'b', etc..

Two ways to do it: either put the right letter in the column when you create 
the record, or leave the column at a dummy default value when you create the 
record and have an 'update index' which updates all records which have the 
dummy default value.  Which one you choose depends on whether your INSERT 
function has to run faster than it does naturally.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Searching with like for a specific start letter

2010-04-25 Thread Igor Tandetnik
Alberto Simões wrote:
> I am running on the fly a query to count the number of words starting
> with one of the 26 letters.
> 
> I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
> LIKE "a%"  (for the 26 letters)
> 
> normword is the term normalized without accents and the like

See if this condition works better:

where normword >= 'a' and normword < 'b'

See also http://sqlite.org/optoverview.html#like_opt
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Searching with like for a specific start letter

2010-04-25 Thread Alberto Simões
Hello

I am running on the fly a query to count the number of words starting
with one of the 26 letters.

I am doing the usual SELECT COUNT(term) from dictionary WHERE normword
LIKE "a%"  (for the 26 letters)

normword is the term normalized without accents and the like


Is there any way to make this query faster? It is taking about 10
second for 140K entries.

One idea is to add a column named 'letter' and SELECT COUNT(letter)
from dictionary WHERE letter = 'a'.
But are there other solutions?

Thanks
-- 
Alberto Simões
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users