Re: [sqlite] LIKE operator and collations

2019-02-15 Thread Shawn Wagner
Look into using the ICU extension. If you're compiling sqlite yourself,
just define SQLITE_ENABLE_ICU to 1 (And link with the ICU libraries),
otherwise you'll have to grab the source and compile it as a loadable
module.

https://www3.sqlite.org/cgi/src/dir?ci=03c4f00317233a34&name=ext/icu for
details. (Is there a way to link to the latest version of a file in the
repository and not a particular commit?)

On Fri, Feb 15, 2019 at 6:06 AM Aydin Ozgur Yagmur 
wrote:

> Hello,
>
> I want to use custom collations for "like" and "not equals" queries.
> *select * from tbl_internal where col_internal like 'ç%' collate
> TURKISH_CI;*
>
> it is ok for "equals" operator.
> *select * from tbl_internal where col_internal = 'çç' collate TURKISH_CI;*
>
> but not ok, for "like" and "not equals" operators
>
> How can i make a search like these? Could you give me any clues?
>
> Thanks for advance,
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2019-02-15 Thread Simon Slavin
You can write your own LIKE function and use that:



If you have the source for "collate TURKISH_CI" then you might be able to use 
it in your own function.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE operator and collations

2019-02-15 Thread Aydin Ozgur Yagmur
Hello,

I want to use custom collations for "like" and "not equals" queries.
*select * from tbl_internal where col_internal like 'ç%' collate
TURKISH_CI;*

it is ok for "equals" operator.
*select * from tbl_internal where col_internal = 'çç' collate TURKISH_CI;*

but not ok, for "like" and "not equals" operators

How can i make a search like these? Could you give me any clues?

Thanks for advance,
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2016-12-05 Thread Igor Tandetnik

On 12/5/2016 3:43 PM, Don V Nielsen wrote:

Igor, I'm not sure if you gain anything from"length(lower(name))". Just
"length(name)" would suffice.


I'm guarding against various Unicode weirdnesses that could cause string 
length to change on case transformation. While SQLite only folds ASCII 
letters by default, it could be compiled with full ICU collation support.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2016-12-05 Thread Don V Nielsen
Igor, I'm not sure if you gain anything from"length(lower(name))". Just
"length(name)" would suffice.

On Mon, Dec 5, 2016 at 10:11 AM, Dominique Devienne 
wrote:

> On Mon, Dec 5, 2016 at 4:24 PM, Igor Tandetnik  wrote:
>
> > On 12/5/2016 10:19 AM, Igor Tandetnik wrote:
> >
> >> On 12/5/2016 7:30 AM, ravi.shan...@cellworksgroup.com wrote:
> >>
> >>> select name from employee table where name like '%Araya%' or name like
> >>> '%Amul%' or name like '%Aj%';
> >>>
> >>> Table - Employee
> >>>
> >>> Id | Name | age |
> >>> 1  | Arayan Kuma | 29  |
> >>> 2  | Amul Kanth  | 30  |
> >>> 3  | Ajay Kumar | 45  |
> >>>
> >>> I dont like to use may or conditions for pattern matching using like
> >>> operator.
> >>> Is there any other way I can workaround without using or condition in
> >>> like operator in sqlite.
> >>>
> >>
> >> WHERE length(replace(replace(replace(name, 'Araya', ''), 'Amul', ''),
> >> 'Aj', '')) != length(name)
> >>
> >
> > Actually, this is not quite the same: it's case-sensitive, whereas LIKE
> is
> > case-insensitive by default. To be equivalent, make it
> >
> > WHERE length(replace(replace(replace(lower(name), 'araya', ''), 'amul',
> > ''), 'aj', '')) != length(lower(name))
>
>
> Or use the pragma [1].
> Thanks, didn't realize/know LIKE was case-insensitive (for ASCII chars
> only) by default, in SQLite.
> Also made me double-check whether Oracle is case-sensitive or not (it is)
> [2]
>
> BTW, Igor: wow :)
> Not that I'd use that ever, but still, very clever! --DD
>
> [1] https://www.sqlite.org/pragma.html#pragma_case_sensitive_like
> [2]
> http://stackoverflow.com/questions/5391069/case-insensitive-searching-in-
> oracle
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2016-12-05 Thread Dominique Devienne
On Mon, Dec 5, 2016 at 4:24 PM, Igor Tandetnik  wrote:

> On 12/5/2016 10:19 AM, Igor Tandetnik wrote:
>
>> On 12/5/2016 7:30 AM, ravi.shan...@cellworksgroup.com wrote:
>>
>>> select name from employee table where name like '%Araya%' or name like
>>> '%Amul%' or name like '%Aj%';
>>>
>>> Table - Employee
>>>
>>> Id | Name | age |
>>> 1  | Arayan Kuma | 29  |
>>> 2  | Amul Kanth  | 30  |
>>> 3  | Ajay Kumar | 45  |
>>>
>>> I dont like to use may or conditions for pattern matching using like
>>> operator.
>>> Is there any other way I can workaround without using or condition in
>>> like operator in sqlite.
>>>
>>
>> WHERE length(replace(replace(replace(name, 'Araya', ''), 'Amul', ''),
>> 'Aj', '')) != length(name)
>>
>
> Actually, this is not quite the same: it's case-sensitive, whereas LIKE is
> case-insensitive by default. To be equivalent, make it
>
> WHERE length(replace(replace(replace(lower(name), 'araya', ''), 'amul',
> ''), 'aj', '')) != length(lower(name))


Or use the pragma [1].
Thanks, didn't realize/know LIKE was case-insensitive (for ASCII chars
only) by default, in SQLite.
Also made me double-check whether Oracle is case-sensitive or not (it is)
[2]

BTW, Igor: wow :)
Not that I'd use that ever, but still, very clever! --DD

[1] https://www.sqlite.org/pragma.html#pragma_case_sensitive_like
[2]
http://stackoverflow.com/questions/5391069/case-insensitive-searching-in-oracle
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2016-12-05 Thread Igor Tandetnik

On 12/5/2016 10:19 AM, Igor Tandetnik wrote:

On 12/5/2016 7:30 AM, ravi.shan...@cellworksgroup.com wrote:

select name from employee table where name like '%Araya%' or name like
'%Amul%' or name like '%Aj%';

Table - Employee

Id | Name | age |
1  | Arayan Kuma | 29  |
2  | Amul Kanth  | 30  |
3  | Ajay Kumar | 45  |

I dont like to use may or conditions for pattern matching using like
operator.
Is there any other way I can workaround without using or condition in
like operator in sqlite.


WHERE length(replace(replace(replace(name, 'Araya', ''), 'Amul', ''),
'Aj', '')) != length(name)


Actually, this is not quite the same: it's case-sensitive, whereas LIKE 
is case-insensitive by default. To be equivalent, make it


WHERE length(replace(replace(replace(lower(name), 'araya', ''), 'amul', 
''), 'aj', '')) != length(lower(name))


--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2016-12-05 Thread Igor Tandetnik
On 12/5/2016 7:30 AM, 
ravi.shan...@cellworksgroup.com wrote:

select name from employee table where name like '%Araya%' or name like
'%Amul%' or name like '%Aj%';

Table - Employee

Id | Name | age |
1  | Arayan Kuma | 29  |
2  | Amul Kanth  | 30  |
3  | Ajay Kumar | 45  |

I dont like to use may or conditions for pattern matching using like
operator.
Is there any other way I can workaround without using or condition in
like operator in sqlite.


WHERE length(replace(replace(replace(name, 'Araya', ''), 'Amul', ''), 
'Aj', '')) != length(name)


--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] like operator

2016-12-05 Thread ravi.shan...@cellworksgroup.com

Hi Guys,

select name from employee table where name like '%Araya%' or name like 
'%Amul%' or name like '%Aj%';


Table - Employee

Id | Name | age |
1  | Arayan Kuma | 29  |
2  | Amul Kanth  | 30  |
3  | Ajay Kumar | 45  |

I dont like to use may or conditions for pattern matching using like 
operator.
Is there any other way I can workaround without using or condition in 
like operator in sqlite.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Constantine Yannakopoulos
On Wed, May 14, 2014 at 1:35 PM, Jan Slodicka  wrote:

> Simon Slavin-3 wrote
> > On 13 May 2014, at 5:21pm, Constantine Yannakopoulos wrote:
> >
> >> ​This is very interesting Jan. The only way this could fail is if the
> >> collation implementation does something funny if it encounters this
> >> character​, e.g. choose to ignore it when comparing.
> >
> > That cuts out a very large number of collations.  The solution works fine
> > for any collation which orders strings according to Unicode order.  But
> > the point of creating a correlation is that you don't want that order.
> >
> > Simon.
>
> Simon, I think that the most frequent point of making a collation is to get
> the Unicode order. At the bare minimum adding LIKE optimization to the ICU
> Sqlite extension would make sense, the savings are really huge.
>

There could be a flag in sqlite3_create_collation_v2()'s TextRep argument,
much like the flag SQLITE_DETERMINISTIC of sqlite3_create_function() that
will flag the collation as a "unicode text" collation. If this flag is set,
the engine can perform the LIKE optimization for these collations using the
U+10FFFD idea to construct an upper limit for the range as it has been
described in previous posts. Since this flag is not present in existing
calls to sqlite3_create_collation_v2() the change will be
backward-compatible.

Either this or the already mentioned idea of giving the ability to manually
specify lower and upper bounds for the LIKE optimization, perhaps by means
of a callback in a hypothetical sqlite3_create_collation_v3() variant.

And by the way, "unicode text" collations include all "strange" collations
like the one of accent insensitivity and mixed codepage I described in my
original post. And I would expect these to be about 95% of all custom coded
collations.

--Constantine
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Jan Slodicka
Simon Slavin-3 wrote
> On 13 May 2014, at 5:21pm, Constantine Yannakopoulos wrote:
> 
>> ​This is very interesting Jan. The only way this could fail is if the
>> collation implementation does something funny if it encounters this
>> character​, e.g. choose to ignore it when comparing.
> 
> That cuts out a very large number of collations.  The solution works fine
> for any collation which orders strings according to Unicode order.  But
> the point of creating a correlation is that you don't want that order. 
> 
> Simon.

Simon, I think that the most frequent point of making a collation is to get
the Unicode order. At the bare minimum adding LIKE optimization to the ICU
Sqlite extension would make sense, the savings are really huge.





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75667.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Clemens Ladisch
Hick Gunter wrote:
> I was under the impression you wanted to achieve this:
>
> > select hex('abc' || X'10FFFD');
> 61626310FFFD
> > select length('abc' || X'10FFFD');
> 6

If you want to create characters through a blob, you have to use
the correct UTF-8 encoding:

 sqlite> select quote(cast(char(1114109) as blob));
 X'F48FBFBD'
 sqlite> select unicode(x'F48FBFBD');
 1114109
 sqlite> select hex('abc' || x'F48FBFBD');
 616263F48FBFBD
 sqlite> select length('abc' || x'F48FBFBD');
 4


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Jan Slodicka
Dominique Devienne wrote
> On Tue, May 13, 2014 at 5:50 PM, Jan Slodicka <

> jano@

> > wrote:
>> So one could replace "LIKE 'xxx%'" by "BETWEEN('xxx', 'xxx' +
>> '\uDBFF\uDFFD').
> 
> make that
> 
> BETWEEN('xxx', 'xxx' + char(1114109))
> 
> I don't think SQlite supports \u literals, nor does it support hex
> literals, so must use the decimal equivalent to U+10FFFD.

Dominique,

I just wanted to sketch the idea and did not care of the syntax. I am going
to apply this differently, anyway:-)




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75665.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Hick Gunter
I was under the impression you wanted to achieve this:

asql> select hex('abc' || X'10FFFD');
hex('abc' || X'10FFFD')
---
61626310FFFD
asql> select length('abc' || X'10FFFD');
length('abc' || X'10FFFD')
--
6
asql> select typeof('abc' || X'10FFFD');
typeof('abc' || X'10FFFD')
--
text


-Ursprüngliche Nachricht-
Von: Dominique Devienne [mailto:ddevie...@gmail.com]
Gesendet: Mittwoch, 14. Mai 2014 09:28
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] LIKE operator and collations

On Wed, May 14, 2014 at 8:30 AM, Hick Gunter  wrote:
> Actually SQLite does support X'...' literals for creating blobs.

Note sure how that's relevant Hick. We don't need a blob, but a integer for 
char(). I was obviously talking about *number* literals (prefixed with 0b, 0, 
0x for binary / octal / hexa), not blob literals.

I'd be +1 to having char() also accept blobs (1 to 3 bytes long) instead of 
integer, since x'10FFFD' is closer to U+10FFFD than 1114109 is.

But Dr. Hipp would need to agree with that. Until then, 1114109 it must be. --DD

C:\Users\DDevienne>sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12 Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select char(x'10FFFD');

sqlite> select typeof(char(x'10FFFD'));
text
sqlite> select length(char(x'10FFFD'));
0
sqlite>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


---
Gunter Hick
Software Engineer

Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna,
Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then
delete this message from your system. Please do not copy it or use it for any 
purposes, or disclose its contents to any person as to do so could be a breach 
of confidence. Thank you for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-14 Thread Dominique Devienne
On Wed, May 14, 2014 at 8:30 AM, Hick Gunter  wrote:
> Actually SQLite does support X'...' literals for creating blobs.

Note sure how that's relevant Hick. We don't need a blob, but a
integer for char(). I was obviously talking about *number* literals
(prefixed with 0b, 0, 0x for binary / octal / hexa), not blob
literals.

I'd be +1 to having char() also accept blobs (1 to 3 bytes long)
instead of integer, since x'10FFFD' is closer to U+10FFFD than 1114109
is.

But Dr. Hipp would need to agree with that. Until then, 1114109 it must be. --DD

C:\Users\DDevienne>sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select char(x'10FFFD');

sqlite> select typeof(char(x'10FFFD'));
text
sqlite> select length(char(x'10FFFD'));
0
sqlite>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Hick Gunter
Actually SQLite does support X'...' literals for creating blobs.

-Ursprüngliche Nachricht-
Von: Dominique Devienne [mailto:ddevie...@gmail.com]
Gesendet: Dienstag, 13. Mai 2014 18:19
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] LIKE operator and collations

On Tue, May 13, 2014 at 5:50 PM, Jan Slodicka  wrote:
> So one could replace "LIKE 'xxx%'" by "BETWEEN('xxx', 'xxx' + '\uDBFF\uDFFD').

make that

BETWEEN('xxx', 'xxx' + char(1114109))

I don't think SQlite supports \u literals, nor does it support hex 
literals, so must use the decimal equivalent to U+10FFFD.

 C:\Users\DDevienne>sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12 Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select '\uDBFF\uDFFD';
\uDBFF\uDFFD
sqlite> select char(1114109);
􏿽
sqlite> select char(0x10FFFD);
Error: unrecognized token: "0x10FFFD"
sqlite> select typeof(char(1114109));
text
sqlite> select length(char(1114109));
1
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


---
Gunter Hick
Software Engineer

Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna,
Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then
delete this message from your system. Please do not copy it or use it for any 
purposes, or disclose its contents to any person as to do so could be a breach 
of confidence. Thank you for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Simon Slavin

On 13 May 2014, at 5:21pm, Constantine Yannakopoulos wrote:

> ​This is very interesting Jan. The only way this could fail is if the
> collation implementation does something funny if it encounters this
> character​, e.g. choose to ignore it when comparing.

That cuts out a very large number of collations.  The solution works fine for 
any collation which orders strings according to Unicode order.  But the point 
of creating a correlation is that you don't want that order.  For instance, I 
have a correlation which is used for IP addresses and expects its input to look 
like

d.d.d.d

where each 'd' is one or more digits.  If you hand it a string which doesn't 
conform (octet missing, or octet > 255), that string evaluates the same as 
'0.0.0.0'.  If I was to use

... WHERE source LIKE '10.%'

and the char(1114109) solution was used I'd get incorrect results whereas

... WHERE source BETWEEN '10.0.0.0' AND '10.255.255.255'

works fine.

There isn't a solution to the problem unless SQLite has further information 
from whoever devised the collation.

Simon.

Refs: 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Dominique Devienne
On Tue, May 13, 2014 at 5:50 PM, Jan Slodicka  wrote:
> So one could replace "LIKE 'xxx%'" by "BETWEEN('xxx', 'xxx' + '\uDBFF\uDFFD').

make that

BETWEEN('xxx', 'xxx' + char(1114109))

I don't think SQlite supports \u literals, nor does it support hex
literals, so must use the decimal equivalent to U+10FFFD.

 C:\Users\DDevienne>sqlite3
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select '\uDBFF\uDFFD';
\uDBFF\uDFFD
sqlite> select char(1114109);
􏿽
sqlite> select char(0x10FFFD);
Error: unrecognized token: "0x10FFFD"
sqlite> select typeof(char(1114109));
text
sqlite> select length(char(1114109));
1
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Constantine Yannakopoulos
On Tue, May 13, 2014 at 6:50 PM, Jan Slodicka  wrote:

> Any comments are welcome.


​This is very interesting Jan. The only way this could fail is if the
collation implementation does something funny if it encounters this
character​, e.g. choose to ignore it when comparing. Since most collations
end up calling the OS unicode API, and this handles U+10FFFD correctly,
this should be a very rare case.

This may be a reason for Dr Hipp to reject adding this to the LIKE
optimization, but anyone could choose to make the optimization manually
using BETWEEN instead of LIKE if they are sure that any custom collations
they have coded and are using handle U+10FFFD correctly. This is obviously
true in your case.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Jan Slodicka
Constantine Yannakopoulos wrote
> On Mon, May 12, 2014 at 4:46 PM, Jan Slodicka <

> jano@

> > wrote:
> I understand that it is difficult to find the least greater character of a
> given character if you are unaware of the inner workings of a collation,
> but maybe finding a consistent upper limit for all characters in all
> possible collations is not impossible?
> 
> -- Constantine
> ___
> sqlite-users mailing list

> sqlite-users@

> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

I could be wrong when I refused this idea. U+10FFFD (represented as
surrogate pair "\uDBFF\uDFFD") may be that character. So one could replace
"LIKE 'xxx%'" by "BETWEEN('xxx', 'xxx' + '\uDBFF\uDFFD').

U+10FFFD is the highest codepoint from the Unicode Private Use Area (PUA).
Unicode standards do not specify anything for this area. So vendors and/or
apps (MS Word for example) can define here their own characters incl. their
properties (font, sorting etc.). I saw somewhere that somebody placed here
Japanese Kanji characters, for example.

msdn statement for CompareString: "All characters in the PUA are sorted
after all other Unicode characters. Within the area, characters are sorted
in numerical order." Hence Windows is OK.

I could not find any such notion for UCI library, but when I tested their
interactive demo, character U+10FFFD was really sorted at the end. (Even
with Japanese, Hindi etc.) That might mean that at least the standard setup
of the UCI library sorts U+10FFFD in the expected way. Of course, this is no
proof - UCI library can be customized.

After all, it seems that U+10FFFD is a good choice, at least for a vast
majority of cases.

BTW, my tests showed that the speed improvement of the LIKE optimization is
in the range 10x-100x.

Any comments are welcome.



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75643.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-13 Thread Jan Slodicka
> ​It can be implemented if the definition of a collation is extended to be 
able to provide this information as Simon suggested.

Sure, this could be done. But I am not sure whether it would be that
usefull. For example I would not use it for my current application (C#).



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75638.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Constantine Yannakopoulos
On Mon, May 12, 2014 at 6:45 PM, Simon Slavin  wrote:

> They're not the same, Jan.  This optimization is for one very specific use
> of LIKE: where the match string has no wildcards except for a percent sign
> at the end


In theory a LIKE can be optimized if it has a fixed part at the beginning​
and any combination of wildcards after that. The fixed part is used to
define a range in an index (if there is one) and then the wildcard part -if
not a plain '%' taht defines 'everything'- is applied as a filter to each
record in the range to narrow down the result. Now, whether this is to be
preferred instead of a full table scan should be a job for the query
planner.

--Constantine
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Simon Slavin

On 12 May 2014, at 4:05pm, Jan Slodicka wrote:

> And whether we'll call it "LIKE optimization" or "using BETWEEN", the
> problem remains basically the same.

They're not the same, Jan.  This optimization is for one very specific use of 
LIKE: where the match string has no wildcards except for a percent sign at the 
end.  I don't think it's out of order to say "Wildcard matching is wildcard 
matching, which is slow.  If you want fast, use BETWEEN.".

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Constantine Yannakopoulos
On Mon, May 12, 2014 at 5:41 PM, Jan Slodicka  wrote:

> I think that the answer is as before: The LIKE optimization cannot be
> performed by the DB engine, but only by the collation author.
>

​It can be implemented if the definition of a collation is extended to be
able to provide this information as Simon suggested. For example, by
defining a second callback that provides the lower and upper bounds that
enclose the results of a LIKE prefix string. If this callback is not
implemented or if it returns null strings then no optimization should be
attempted.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Constantine Yannakopoulos
On Mon, May 12, 2014 at 5:22 PM, Simon Slavin  wrote:

> agree: it's not possible to deduce an optimization without understanding
> the collation.  It might be possible to introduce it in a future version of
> SQLite by requiring anyone who writes an optimization to supply routines
> which work out the proper strings for comparison.
>

​Indeed. And if these routines are not implemented (e.g. for Japanese that
do not have ordering)​ then no LIKE optimization should be attempted by the
engine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Jan Slodicka
> However, I don't think it's necessary to solve this problem.  Just don't
try to optimize it.  Whoever is doing the programming knows that pattern
matching is slow.  If they want a fast solution they'll use BETWEEN instead. 
And that will make them have to provide their own comparison strings.
 
It looks like you are now talking to Dr. Hipp.

Because we - application programmers - will eventually have to do the
optimization; the users will tell us when they are frustrated from the
waiting.

And whether we'll call it "LIKE optimization" or "using BETWEEN", the
problem remains basically the same.




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75624.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Jan Slodicka
> ​I understand that it is difficult to find the least greater character of a 
given character

If you understand this, then you must admit that it cannot be done by the DB
engine.

Secondly:

I pointed out to the problems when deriving the string "abd" from "abc".
However, you suggest now a different way: extending "abc" to "abcX". Here
are first problems that came to my mind:

- Some languages do not have any official ordering at all - take Japanese,
for example.

- CZE/SVK: Adding 'Z' may lead to "DZ". (specific sorting applies)

- Look at Unicode code charts. What would you select? Many (if not most)
characters with high codes would be probably ignored by the collation. In
such a case your optimization would lead to no results at all.

I think that the answer is as before: The LIKE optimization cannot be
performed by the DB engine, but only by the collation author.



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75623.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Simon Slavin

On 12 May 2014, at 3:05pm, Constantine Yannakopoulos wrote:

> ​I understand that it is difficult to find the least greater character of a
> given character if you are unaware of the inner workings of a collation,
> but maybe finding a consistent upper limit for all characters in all
> possible collations is not impossible?

I agree: it's not possible to deduce an optimization without understanding the 
collation.  It might be possible to introduce it in a future version of SQLite 
by requiring anyone who writes an optimization to supply routines which work 
out the proper strings for comparison.

However, I don't think it's necessary to solve this problem.  Just don't try to 
optimize it.  Whoever is doing the programming knows that pattern matching is 
slow.  If they want a fast solution they'll use BETWEEN instead.  And that will 
make them have to provide their own comparison strings.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Stephan Beal
On Mon, May 12, 2014 at 4:05 PM, Constantine Yannakopoulos <
alfasud...@gmail.com> wrote:

> ​I understand that it is difficult to find the least greater character of a
> given character if you are unaware of the inner workings of a collation,
> but maybe finding a consistent upper limit for all characters in all
> possible collations is not impossible?
>


i don't know if this helps, but i recently ran across a query both in the
Fossil SCM and in SVN (via a post on this list) which does something
similar for paths:

char const * zCollation = fsl_cx_filename_collation(f);
rc = fsl_db_prepare(db, &st,
"SELECT id FROM vfile WHERE vid=%"FSL_ID_T_PFMT
" AND (pathname=%Q %s "
"OR (pathname>'%q/' %s AND pathname<'%q0' %s))",
(fsl_id_t)vid,
zName, zCollation, zName,
zCollation, zName, zCollation )

that zCollation part resolves to either an empty string or "COLLATE
nocase," depending on application-level settings. zPath is a filename or
dir name. the 'vid' part there is not relevant for you, but the pathname
conditions sound similar to what you are trying to achieve.


-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Jan Slodicka
Hi Clemens,

I know that link very well. I answered Constantine's statement who claimed
that Sqlite implementation could be less restrictive. I just tried to
explain why it is not possible.

Jan



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75621.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Constantine Yannakopoulos
On Mon, May 12, 2014 at 4:46 PM, Jan Slodicka  wrote:

> Sqlite LIKE optimization is described as follows:
> A like pattern of the form "x LIKE 'abc%'" is changed into constraints
> "x>='abc' AND x<'abd' AND x LIKE 'abc%'"
>

Actually, I would do something ​like:

"x>='abc' AND x<'ab
​c​
'
​ || ​
AND x LIKE 'abc%'"

​where  is a string with a single character that is
guaranteed to be greater than -and NOT equal to- any other character. For
instance, if the encoding was single-byte ANSI​ it would be something like:

"x>='abc' AND x<'ab
​c​
'
​ || Char(255)
AND x LIKE 'abc%'"

​I understand that it is difficult to find the least greater character of a
given character if you are unaware of the inner workings of a collation,
but maybe finding a consistent upper limit for all characters in all
possible collations is not impossible?

-- Constantine
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Clemens Ladisch
Jan Slodicka wrote:
> Sqlite LIKE optimization is described as follows:
> A like pattern of the form "x LIKE 'abc%'" is changed into constraints
> "x>='abc' AND x<'abd' AND x LIKE 'abc%'"
>
> If you look into sqlite code, then the string "abd" is generated from "abc"
> using a trivial algebra.
>
> However, this algebra won't work on a custom collation.

 says:
| Terms that are composed of the LIKE or GLOB operator can sometimes be
| used to constrain indices. There are many conditions on this use:
| [...]
| 6. For the LIKE operator, if case_sensitive_like mode is enabled then
|the column must indexed using BINARY collating sequence, or if
|case_sensitive_like mode is disabled then the column must indexed
|using built-in NOCASE collating sequence.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-05-12 Thread Jan Slodicka
Sqlite LIKE optimization is described as follows:
A like pattern of the form "x LIKE 'abc%'" is changed into constraints
"x>='abc' AND x<'abd' AND x LIKE 'abc%'"

If you look into sqlite code, then the string "abd" is generated from "abc"
using a trivial algebra.

However, this algebra won't work on a custom collation. Right now I am
investigating a similar issue (we use custom case-insensitive collations)
and come to the conclusion that the LIKE optimization cannot be done without
the specific knowledge of the collation used.

In other words, it cannot be done by the Sqlite engine.

Two extreme examples:

CZE/SVK: 
If you replace "x LIKE 'ch%'" by "ch < x AND x < 'ci'", you'll get no
results.

DAN/NOR:
How would you replace "aa%"? (Note this correct sorting: Zorro < Aaron)
Or what about this: "\u0061\u030A"? (Å written with combining diacritics.)

Look  here    for more
problems.





--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/LIKE-operator-and-collations-tp73789p75617.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-10 Thread Constantine Yannakopoulos
On Sun, Feb 9, 2014 at 8:27 PM, Simon Slavin  wrote:

> I know it's a hack.  But it's an elegant efficient hack that takes
> advantage of the things SQLite does well.  As long as that's the only way
> you were using LIKE.
>

Don't get me wrong, the solution is good. But apart from the specific
problem I also started the thread in order to prove that the implementation
of the LIKE optimization in SqLite is not all it could be. Apart from mixed
languages there are other, less extreme scenarios where a
collation-sensitive like optimization will come in handy. For instance, for
languages with accents it would be nice to be able to create a
case-insensitive accent-insensitive (CI_AI) collation and be able to use
LIKE on it, even if it doesn't use an index. And overloading the LIKE
operator globally is not a good idea because it will affect all LIKE
operations in a database, even in columns that are not CI_AI.

I was hoping to elicit a response from D. R. Hipp but he has chosen not to
respond.

--
Constantine
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Simon Slavin

On 9 Feb 2014, at 6:05pm, Constantine Yannakopoulos  
wrote:

> - You are assuming that 'z' is the higher order character that can appear
> in a value. This is not the case; for instance greek characters have higher
> order than 'z'. This can be fixed (only for latin/greek) by using the
> highest order greek character 'ώ' (accented omega) instead of 'z'; but I
> would prefer a very high-order non-printable one instead.

By all means replace the '' I typed with a string of 200 accented 
lower-case omegas.  Or with (char)0x7F or some other equivalent.  Since (if I 
remember my Greek) it's impossible for a word to contain three omegas in a row, 
I doubt you'll come across any occurrences of it from anyone who is using your 
program realistically.

Hmm.  I assume that some internal part of SQLite would actually be putting this 
string through your collation function.  It would presumably happen 
automatically as part of what SQLite does.

I know it's a hack.  But it's an elegant efficient hack that takes advantage of 
the things SQLite does well.  As long as that's the only way you were using 
LIKE.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Constantine Yannakopoulos
Thank you all for your
replies,

@RSmith:

My question is about the _specific_ case:

SELECT * FROM ATable WHERE AColumn LIKE
'FixedSearchPart' || 'SomeWildCard'

where the right side of the LIKE operator is a constant that has a fixed
part at the beginning and a wildcard after that. The optimization is that
the index is used to produce a rowset in which AColumn starts with
'FixedSearchPart' and then rows are filtered according to 'SomeWildCard'.
For instance, in

SELECT * FROM ATable WHERE AColumn LIKE
'Constantine%Yann%'

As I understand it, the index scan will use the string 'Constantine' as
argument and then the full string 'Constantine%Yann%' will be used to
further filter the rowset. Of course any other case that has no fixed part
at the start of the right-side string will have to fall back to a full scan
and filter.

Maybe the parameter notation was a little confusing but from the
description you can deduce that it will not contain a wildcard.
Essentially, the user will be asking for the rows where AColumn BEGINS WITH
a string.

@Jean-Christophe:

Thank you for the extension. I will certainly have a look at it, but I
already have the comparison algorithm (specific to latin/greek) from
another case. So the only thing I have to do is to tailor it inside a
custom collation.

@Yuriy:

Yes, the current interface for custom functions does not provide any
information on what the arguments are, so it is impossible to deduce the
collation of arguments inside the function body. That's why this has to be
implemented by the SqLite query optimizer itself. I was thinking about
something like:

- The query optimizer understands that the left-side of the LIKE argument
is a column with an index and a collation, and the right side is a literal
with a fixed left part (after parameter substitution).
- It takes the fixed part, appends a very high-order (preferably
non-printable) character to it and use the index to materialize this clause:

  AColumn BETWEEN 'FixedPart' AND 'FixedPart' || .

If we were using ANSI I would suggest something like:

  AColumn BETWEEN 'FixedPart' AND 'FixedPart' || (char)0xFF

but for UTF-8 I am not sure what to suggest. It would have to be a UTF8
character value that is guaranteed to be greater than any other usable
character in any code page. I am not sure whether there actually is one.

-Then it applies the whole right-side argument to this rowset using LIKE
pattern matching to produce the correct result.

@Simon:

Your proposal is very clever but it has two weaknesses:
- You are assuming that 'z' is the higher order character that can appear
in a value. This is not the case; for instance greek characters have higher
order than 'z'. This can be fixed (only for latin/greek) by using the
highest order greek character 'ώ' (accented omega) instead of 'z'; but I
would prefer a very high-order non-printable one instead.
- It assumes that the column has a maximum width. Since SQLite does not
have maximum widths for columns of TEXT affinity (although a max width can
be implied and imposed by the app itself), improbable as it may be, there
could be a row with the value :SearchString||''||'a' which will
erroneously not make it into the result set. Again, the very high-order
non-printable character would solve this.
So it can be a solution for a specific case with a given max value and a
known set of code pages but it cannot be a general solution.

--Constantine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Simon Slavin

On 9 Feb 2014, at 4:21pm, Yuriy Kaminskiy  wrote:

> Unfortunately, builtin LIKE ignores collation, and if you override LIKE (as in
> ICU extension), sqlite won't use index for optimization.

However, this particular use of LIKE ...

> SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'

is really just a call to BETWEEN:

SELECT * FROM ATable WHERE AColumn BETWEEN :SearchString AND 
:SearchString||''

And BETWEEN translates into >= and <=.  And those /will/ use the index.

So implementing a 'greek soundex' function as a collation would be useful, if 
this is the only use you need to make of LIKE.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Yuriy Kaminskiy
Constantine Yannakopoulos wrote:
> I have a case where the user needs to perform a search in a text column of
> a table with many rows. Typically the user enters the first n matching
> characters as a search string and the application issues a SELECT statement
> that uses the LIKE operator with the search string:
> 
> SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'
> 
> According to the LIKE optimization this statement will use an index so it
> will be fast.

FWIW, sqlite only use index if pattern is string constant or placeholder, it
won't try to use index optimization with more complex constant expression.
I.e.
   SELECT * FROM ATable WHERE AColumn LIKE :SearchString
and append % to your search string outside of sql.
(Well, it does not matter, as I think your following question does not have
acceptable solution).

> The application is used by Greek users. The greek alphabet has some letters
> that are visually identical to corresponding latin letters when in
> capitals, (e.g. A, E, I, O, B, H, K, etc), some that are different but
> match deterministically to a single latin letter (e.g. Σ = S, Φ = F, Ω = O
> etc.) and some that don't have a deterministic match (e.g. Γ = Gh or Yi).

You may want to look at ICU extension, but then sqlite won't use index.

Another option may be fts extension (likely, with custom tokenizer function).

> The table contains strings that consist of words that can be written in
> either latin or greek characters; sometimes even mixed (the user changed
> input locale midword before typing the first non-common letter). I have a
> request that the search should match strings that are written with either
> latin or greek or mixed letters, e.g. "MIS" should match "MISKO" (all
> latin), "ΜΙΣΚΟ" (all greek) or "MIΣΚΟ" (first two letters latin, rest
> greek). I thought of using a custom collation that does this type of
> comparison, have the column use that collation and create an index on that
> column to speed up the search but I discovered that the LIKE operator
> either will not use collations other than BINARY and NOCASE (pragma
> case_sensitive_like) or (if overloaded to perform custom matching) will not

I think "will not use index" is a bug that was fixed in latest sqlite version;
however, "ignoring collation" is intended behavior, and not easy to change :-(.

> use an index, and, worse yet, its behaviour will be the same to all string
> comparisons regardless of collation. So, a full table scan seems inevitable.
> I was wondering whether it is realistic to ask for the LIKE operator to use
> by default the assigned collation of a column. I assume that an index on

From first look, it won't be easy. Probably impossible without changing current
sqlite interface for user-defined collation/function/etc. Besides, it will break
compatibility with existing code, so would require some new PRAGMA to enable.

> that column is using by default the specified collation of the column for
> comparisons, so a LIKE clause like the aforementioned can use the index and
> perform a fast search while using the "mixed" comparison I need. This would
> transparently solve my problem and make the case_sensitive_like pragma
> redundant, but for backward compatibility this behaviour could be activated
> by a new pragma.
> 
> Are there any details I am missing that prevent this from being implemented?
> 
> Thanks in advance.
> 
> --Constantine.


RSmith wrote:
[...]
> It will be a matter of finding the most acceptable deficit... Whether it
> be size, time waste, upgrade cost etc.  By the way, I don't think
> upgrading the table schemata need to be a real hard thing... some
> scripts can take care of that in minimum amount of time. (Test them
> thoroughly though). Also, another poster here had developed a full set
> of international collations and comparison mechanisms as a loadable
> extension to SQLite - Nunicode by Aleksey Tulinov I think... link here:
>
> https://bitbucket.org/alekseyt/nunicode

Unfortunately, builtin LIKE ignores collation, and if you override LIKE (as in
ICU extension), sqlite won't use index for optimization.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread RSmith


On 2014/02/09 13:18, Constantine Yannakopoulos wrote:

Hello all,

I have a case where the user needs to perform a search in a text column of
a table with many rows. Typically the user enters the first n matching
characters as a search string and the application issues a SELECT statement
that uses the LIKE operator with the search string:

SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'

According to the LIKE optimization this statement will use an index so it
will be fast.


This can't be true, there is no way a LIKE or GLOB operator can always use an Index (I mean, it can "use" the Indexed column, but it 
cannot always reduce the number of iterations via Indexing).


To explain more what I mean, consider a simple binary search, let's say we have an ordered list of names that we use the alphabet as 
an indexer:


0:Abraham
1:Ben
2:Constantine
3:Igor
4:James
5:John
6:Ryan
7:Simon

A binary search would start by hitting the middle item typically ( i = Length div 2 ) which is 4 in this case, then comparing it to 
the searched item, let's say it is "JOH" in this case.  It sees that Idx 4 is James, which is smaller than "JOH" (no-case collation 
enabled), then looks to divide the remainder of items larger than James (Idx 5, 6 and 7) in two (3 div 2 = 1 ) and adds it to the 
bottom of them (5) so it checks Index 6, which is now Ryan and is higher than "JOH", it then divides into below Ryan and above James 
and obviously gets "John" which is a match.


A binary tree works similar with the difference it does not have to divide anything, the tree node children are already divisive so 
it just follows down the node closest to the target match until a definite match or matches is/are found (depending on search criteria).


The list above does however demonstrate why a LIKE operator cannot always use an Index, let's say I'm using a search for LIKE '%n', 
how on Earth would you be able to look for that by binary jumping through the list? ANY Indexed item might end on an n, indeed 4 of 
those above do, there is no way to tell and a full-table scan is inevitable.


Of course some clever DB systems, of which SQLite is one, can detect when you use LIKE "Jam%" and knows this is index-searchable and 
still use the Index, but it all depends on what you type and where those % signs are - something which is again negated if the 
search collation does not match the column collation, but is rather easy when standard text or binary collations are used.




store two text columns in the table.  The first is the text as entered.
  The second is your text reduced to its simplified searchable form,
probably all LATIN characters, perhaps using some sort of soundex.  Search
on the second column but return the text in the first.

This allows you to write your conversion routine in the language you're
using to do the rest of your programming, instead of having to express it
as a SQLite function.


Thanks for your reply,


Yes. I thought of that as well. I even have the greek soundex() function
from a previous implementation. Problem is it will bloat the database
considerably, keeping in mind that the users will typically demand that 
ALL
searchable text columns in the application work that way, and who can 
blame
them? And the project manager will not be very keen on accepting both 
this
database size increase and the time needed to calculate the extra 
soundex
column for every database row. It will be much easier to convince this
person to accept the time-costly database upgrade needed in both cases
(tables need to be recreated to change collation) but not both upgrade 
and
bloat.


And I am not happy to accept the fact that I cannot fly, but the laws of the 
Universe demands I adhere to the deficit, and when I simply have to fly, employ 
the help of a very large costly winged tube with jet engines attached to it!

It will be a matter of finding the most acceptable deficit... Whether it be 
size, time waste, upgrade cost etc.  By the way, I don't think upgrading the 
table schemata need to be a real hard thing... some scripts can take care of 
that in minimum amount of time. (Test them thoroughly though). Also, another 
poster here had developed a full set of international collations and comparison 
mechanisms as a loadable extension to SQLite - Nunicode by Aleksey Tulinov I 
think... link here:

https://bitbucket.org/alekseyt/nunicode





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Constantine Yannakopoulos
On Sun, Feb 9, 2014 at 1:25 PM, Simon Slavin  wrote:

> store two text columns in the table.  The first is the text as entered.
>  The second is your text reduced to its simplified searchable form,
> probably all LATIN characters, perhaps using some sort of soundex.  Search
> on the second column but return the text in the first.
>
> This allows you to write your conversion routine in the language you're
> using to do the rest of your programming, instead of having to express it
> as a SQLite function.
>

Thanks for your reply,


Yes. I thought of that as well. I even have the greek soundex() function
from a previous implementation. Problem is it will bloat the database
considerably, keeping in mind that the users will typically demand that ALL
searchable text columns in the application work that way, and who can blame
them? And the project manager will not be very keen on accepting both this
database size increase and the time needed to calculate the extra soundex
column for every database row. It will be much easier to convince this
person to accept the time-costly database upgrade needed in both cases
(tables need to be recreated to change collation) but not both upgrade and
bloat.

--
Constantine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and collations

2014-02-09 Thread Simon Slavin

On 9 Feb 2014, at 11:18am, Constantine Yannakopoulos  
wrote:

> So, a full table scan seems inevitable.

I can't answer the specific question you asked, but I have a suggestion for 
your program: store two text columns in the table.  The first is the text as 
entered.  The second is your text reduced to its simplified searchable form, 
probably all LATIN characters, perhaps using some sort of soundex.  Search on 
the second column but return the text in the first.

This allows you to write your conversion routine in the language you're using 
to do the rest of your programming, instead of having to express it as a SQLite 
function.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE operator and collations

2014-02-09 Thread Constantine Yannakopoulos
Hello all,

I have a case where the user needs to perform a search in a text column of
a table with many rows. Typically the user enters the first n matching
characters as a search string and the application issues a SELECT statement
that uses the LIKE operator with the search string:

SELECT * FROM ATable WHERE AColumn LIKE :SearchString || '%'

According to the LIKE optimization this statement will use an index so it
will be fast.

The application is used by Greek users. The greek alphabet has some letters
that are visually identical to corresponding latin letters when in
capitals, (e.g. A, E, I, O, B, H, K, etc), some that are different but
match deterministically to a single latin letter (e.g. Σ = S, Φ = F, Ω = O
etc.) and some that don't have a deterministic match (e.g. Γ = Gh or Yi).

The table contains strings that consist of words that can be written in
either latin or greek characters; sometimes even mixed (the user changed
input locale midword before typing the first non-common letter). I have a
request that the search should match strings that are written with either
latin or greek or mixed letters, e.g. "MIS" should match "MISKO" (all
latin), "ΜΙΣΚΟ" (all greek) or "MIΣΚΟ" (first two letters latin, rest
greek). I thought of using a custom collation that does this type of
comparison, have the column use that collation and create an index on that
column to speed up the search but I discovered that the LIKE operator
either will not use collations other than BINARY and NOCASE (pragma
case_sensitive_like) or (if overloaded to perform custom matching) will not
use an index, and, worse yet, its behaviour will be the same to all string
comparisons regardless of collation. So, a full table scan seems inevitable.

I was wondering whether it is realistic to ask for the LIKE operator to use
by default the assigned collation of a column. I assume that an index on
that column is using by default the specified collation of the column for
comparisons, so a LIKE clause like the aforementioned can use the index and
perform a fast search while using the "mixed" comparison I need. This would
transparently solve my problem and make the case_sensitive_like pragma
redundant, but for backward compatibility this behaviour could be activated
by a new pragma.

Are there any details I am missing that prevent this from being implemented?

Thanks in advance.

--Constantine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE operator on virtual table assumes ASCII collation

2011-09-09 Thread Ben Harper
>From xBestIndex and xFilter, there is no way to determine the intended 
>collation.
The default collation for a field is binary.
However, a LIKE operator needs an ASCII collation, and in the case of a virtual 
table, the
constraints passed in to xBestIndex and xFilter assume an ASCII collation. 
However, it is
impossible for xBestIndex or xFilter to know this.

For example (pseudo code):

CREATE VIRTUAL TABLE tab( txt TEXT );
INSERT INTO tab VALUES( 'AA');
INSERT INTO tab VALUES( 'AB');

SELECT * FROM tab WHERE txt LIKE 'AA%';
AA  -- This is correct

However, xBestIndex receives the equivalent of the following constraints:

SELECT * FROM tab WHERE txt >= 'AA' AND txt < 'Ab';
AA
AB -- Wrong

Now, these constraints are correct if we're working with an ASCII collation, 
but xBestIndex has no way of knowing that.

Am I missing some piece of information here, or is this a genuine shortcoming 
of the virtual table mechanism?

Regards,
Ben

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] like operator

2010-12-10 Thread Kees Nuyt
On Thu, 9 Dec 2010 23:23:37 -0800 (PST), salmaan
 wrote:

>Hi
>
>i need your help i want a query...
>
>i have a question for you suppose i have table name person in that column
>name description which contains the below words
>
>The govt has asked airlines to issue an all-inclusive and reasonable fares
>in a consumer-friendly manner, something that only Air India (domestic) has
>done
>
>i want a like(%) query which will give output 20 character from the
>selection suppose if i write a query
>
>SELECT * FROM Person
>WHERE description LIKE '%gov%'
>
>it will give the output
>
>The govt has asked airlines to issue an all-inclusive and reasonable fares
>in a consumer-friendly manner, something that only Air India (domestic) has
>done
>
>what i want  only 20 or 30 chracter from the like(%govt%) it can be any like
>character (%has%) or(%to%) it may vary but the o/p must be 20 or 30
>character from the like(%) it should not contains all the charter
>
>the output must in this way 
>govt has asked airlines to issue an all-inclusive and reasonable fares in a
>consumer-friendly manner

This example does not match your requirement. It's over 100
characters, not 20 to 30. According your requirement, it would be
'govt has asked airlines to is...'

This is best done in the host language, you can get the offset of
the LIKE term with the charindex() or pos() function and cut off the
string with the substr() function. The scope of sqlite is data and
set operations, the scope of the host language is processing and
presentation.

SQLite does offer substr(), but not charindex().
There are sqlite extension libraries which do support extra
functions, for example:

http://www.mail-archive.com/sqlite-users@sqlite.org/msg21791.html

(download the zip file and have a look at source file func_exp.c)
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] like operator

2010-12-09 Thread salmaan

Hi

i need your help i want a query...

i have a question for you suppose i have table name person in that column
name description which contains the below words

The govt has asked airlines to issue an all-inclusive and reasonable fares
in a consumer-friendly manner, something that only Air India (domestic) has
done

i want a like(%) query which will give output 20 character from the
selection suppose if i write a query

SELECT * FROM Person
WHERE description LIKE '%gov%'

it will give the out put

The govt has asked airlines to issue an all-inclusive and reasonable fares
in a consumer-friendly manner, something that only Air India (domestic) has
done

what i want  only 20 or 30 chracter from the like(%govt%) it can be any like
character (%has%) or(%to%) it may vary but the o/p must be 20 or 30
character from the like(%) it should not contains all the charter

the output must in this way 
govt has asked airlines to issue an all-inclusive and reasonable fares in a
consumer-friendly manner
-- 
View this message in context: 
http://old.nabble.com/like-operator-tp30423979p30423979.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator to support GREEK

2008-11-06 Thread indiababu

Hi

Here is the example I worked..



> Creating the table
SqlLite.exe test.db "create table t3 (t3key INTEGER PRIMARY KEY,path
TEXT,num double,timeEnter DATE , validate BOOL);"

> Inserting into the table

SqlLite.exe test.db  "insert into t3 (path,num) values
('D:/CCVweiew/sampleexample/data/path',3);"

SqlLite.exe test.db  "insert into t3 (path,num) values
('D:/CCdfView/sample/dadfta2/abcd',3);"

SqlLite.exe test.db  "insert into t3 (path,num) values
('D:/CCVdfiew/sample/data/efgh',3);"

>
SqlLite.exe test.db  "select * from t3 ";

t3key = 1
path = D:/CCVweiew/sampleexample/data/path
num = 3.0
timeEnter = NULL
validate = NULL

t3key = 2
path = D:/CCdfView/sample/dadfta2/abcd
num = 3.0
timeEnter = NULL
validate = NULL

t3key = 3
path = D:/CCVdfiew/sample/data/efgh
num = 3.0
timeEnter = NULL
validate = NULL

> updating when path matches '/sample/'

SqlLite.exe  test.db "update t3 set validate=500 where path like
'%/sample/%'" 

>SqlLite.exe test.db  "select * from t3 ";


t3key = 1
path = D:/CCVweiew/sampleexample/data/path
num = 3.0
timeEnter = NULL
validate = NULL

t3key = 2
path = D:/CCdfView/sample/dadfta2/abcd
num = 3.0
timeEnter = NULL
validate = 500

t3key = 3
path = D:/CCVdfiew/sample/data/efgh
num = 3.0
timeEnter = NULL
validate = 500

Hope it solves ur probs..

--
ABC


mikewhit wrote:
> 
> Gerasimos Xydas <[EMAIL PROTECTED]> writes:
> 
>> 
>> Hello,
>> 
>> I have built an SQLite 3 database from C code.
>> 
>> CASE 1
>> 
> ...
>> CASE 2
>> 
> ...
>> 
>> Best regards,
>> Gerasimos
>> 
> 
> Search the newsgroup ... start here
> http://thread.gmane.org/gmane.comp.db.sqlite.general/42112
> 
> Regards,
> MikeW
> 
> 
> 
> 
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 

-- 
View this message in context: 
http://www.nabble.com/LIKE-operator-to-support-GREEK-tp20341577p20374899.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator to support GREEK

2008-11-05 Thread MikeW
Gerasimos Xydas <[EMAIL PROTECTED]> writes:

> 
> Hello,
> 
> I have built an SQLite 3 database from C code.
> 
> CASE 1
> 
...
> CASE 2
> 
...
> 
> Best regards,
> Gerasimos
> 

Search the newsgroup ... start here
http://thread.gmane.org/gmane.comp.db.sqlite.general/42112

Regards,
MikeW




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE operator to support GREEK

2008-11-05 Thread Gerasimos Xydas
Hello,

I have built an SQLite 3 database from C code.

CASE 1

I tried some sql inserts from my application, in ASCII (Greek codepage).

1. I can "select * from tbl where name='GREEK_NAME';"
2. I can NOT "select * from tbl where name like '%GREEK_CHARACTERS%';" - 
I am getting no results.

CASE 2

I tried some sql inserts from my application, in UTF-8.

1. I can "select * from tbl where name='GREEK_NAME_IN_UTF8';"
2. I can NOT "select * from tbl where name like 
'%GREEK_CHARACTERS_IN_UTF8%';" -  I am getting no results.

- Is this has to do with using the ICU code?
- Is there any example how to do that?
- Any other ideas?

Best regards,
Gerasimos

-- 
Gerasimos Xydas
PhD in Informatics and Telecommunications

University of Athens
Division of Signal Processing and Telecommunications
tel: +30 210 7275320
http://www.di.uoa.gr/~gxydas
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-18 Thread [EMAIL PROTECTED]
> Change cache sizes using separate cache_size  
> pragmas for each attached database.
Thank you! (It would be nice if there is a hint for this behaviour in 
http://www.sqlite.org/lang_attach.html.)

Is this correct? (At least it does not return an error)
PRAGMA job01.cache_size=200
PRAGMA job02.cache_size=200
PRAGMA job03.cache_size=200

...or should I better use PRAGMA default_cache_size when the database I created?

So if the cache_size increases per attached database and my emebedded system 
has a limited amount of memory, the best 
solution to control the total amount of memory SQLite uses is to count / limit 
attached databases?



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-18 Thread D. Richard Hipp

On Aug 18, 2008, at 8:34 AM, [EMAIL PROTECTED] wrote:

> The example with just one main database does not consume more than  
> ~300KB
> which seems to be pretty close to the specified 'cache_size'. The  
> second example with three attached database consumes
> around 500KB per query and it looks like 'cache_size' setting is not  
> relevant.

The cache_size is specified separately for each ATTACHed database.   
The default is 2000.  So if you attach three databases, your total  
cache size will be 8000 (2000 for the original plus 2000 for each  
ATTACHed database).  Change cache sizes using separate cache_size  
pragmas for each attached database.

Please also note that in the next release (SQLite version 3.6.2) the  
cache_size pragma will probably become a no-op.  We are working on  
alternative mechanisms to control the amount of cache memory used.


D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-18 Thread [EMAIL PROTECTED]
> Depending on what you are storing in fs_textid and what your LIKE  
> pattern is, you might get much better performance (and lower memory  
> usage) if you use GLOB instead of LIKE and if you explicitly code the  
> pattern rather than using the wildcard "?", and if you create a new  
> index:
> CREATE INDEX newidx1 ON fs_main(fs_itemtype, fs_textid);
Thank you for the optimization hints – I changed the application that it uses 
all three suggestions and many queries 
are now executed faster. Although, not if the search string begins with a 
wildcard char-acter as described in http:
//www.sqlite.org/optoverview.html#like_opt under point 2. I assume this causes 
again most of the  database to be loaded 
into cache as this was the case with the LIKE opera-tor.

> Are you sure that the memory is not freed?  Calling free() does not  
> normally return memory to the operating system so just because the  
> process memory usage went up does *not* mean that the memory is still  
> in use. It might just mean that the memory is being held by the  
> malloc()/free() for possible reuse later.  What does the  
> sqlite3_memory_used() interface tell you?  What about  
> sqlite3_memory_highwater()?
I wrote two small example programs to illustrate the memory usage difference 
between working on the main database and 
working with multiple ATTACHed database. The example with just one main 
database does not consume more than ~300KB 
which seems to be pretty close to the specified 'cache_size'. The second 
example with three attached database consumes 
around 500KB per query and it looks like 'cache_size' setting is not relevant. 
All memory seems to be freed correctly.

--- Example A ---
--- One database only, cache size limited to 250 pages ---

long rc = SQLITE_OK;

sqlite3_int64 mem01;
sqlite3_int64 mem02;

// Open database connection
sqlite3* m_pDB;
sqlite3_open("job01.db", &m_pDB);   // application memory usage: 1.5 MB

// PRAGMA
sqlite3_stmt* pStmtPragma01(NULL);
rc = sqlite3_prepare_v2(m_pDB, "PRAGMA cache_size=250", -1, &pStmtPragma01, 
NULL);
rc = sqlite3_step(pStmtPragma01);   
  
rc = sqlite3_finalize(pStmtPragma01);

// Run queries
sqlite3_stmt* pStmtLike01(NULL);
rc = sqlite3_prepare_v2(m_pDB, "SELECT COUNT(fs_recid) FROM main.fs_main 
WHERE fs_itemtype=10 AND fs_textid GLOB 
'*1';", -1, &pStmtLike01, NULL);
mem01 = sqlite3_memory_used();  // 17937   ...application memory 
usage: 1.8MB
rc = sqlite3_step(pStmtLike01);  
mem01 = sqlite3_memory_used();  // 305725  ...application memory 
usage: 2.0MB
rc = sqlite3_finalize(pStmtLike01);
mem01 = sqlite3_memory_used();  // 302855  ...application memory 
usage: 2.0MB
mem02 = sqlite3_memory_highwater(1);// 305757

// Close database connection
sqlite3_close(m_pDB);   // application memory usage: 1.5 MB <- all memory 
is freed correctly




--- Example B ---
--- :memory: database and three attached databases, cache size limited to 250 
pages ---

long rc = SQLITE_OK;

sqlite3_int64 mem01;
sqlite3_int64 mem02;

// Open database connection
sqlite3* m_pDB;
sqlite3_open(":memory:", &m_pDB);   // application memory usage: 1.5 MB

// PRAGMA
sqlite3_stmt* pStmtPragma01(NULL);
rc = sqlite3_prepare_v2(m_pDB, "PRAGMA cache_size=250", -1, &pStmtPragma01, 
NULL);
rc = sqlite3_step(pStmtPragma01);   
  
rc = sqlite3_finalize(pStmtPragma01);

// ATTACH job databases
sqlite3_stmt* pStmtAttach01(NULL);
rc = sqlite3_prepare_v2(m_pDB, "ATTACH 'job01.db' AS _job01;", -1, 
&pStmtAttach01, NULL); 
rc = sqlite3_step(pStmtAttach01);   
  
rc = sqlite3_finalize(pStmtAttach01);   
  
sqlite3_stmt* pStmtAttach02(NULL);
rc = sqlite3_prepare_v2(m_pDB, "ATTACH 'job02.db' AS _job02;", -1, 
&pStmtAttach02, NULL); 
rc = sqlite3_step(pStmtAttach02);   
   
rc = sqlite3_finalize(pStmtAttach02);   
  
sqlite3_stmt* pStmtAttach03(NULL);
rc = sqlite3_prepare_v2(m_pDB, "ATTACH 'job03.db' AS _job03;", -1, 
&pStmtAttach03, NULL); 
rc = sqlite3_step(pStmtAttach03);   
   
rc = sqlite3_finalize(pStmtAttach03);   
  

// Run queries
sqlite3_stmt* pStmtLike01(NULL);
rc = sqlite3_prepare_v2(m_pDB, "SELECT COUNT(fs_recid) FROM _job01.fs_main 
WHERE fs_itemtype=10 AND fs_textid 
GLOB '*1';", -1, &pStmtLike01, NULL);
mem01 = sqlite3_memory_used();  // 42568   ...application memory 
usage: 1.8MB
rc = sqlite3_step(pStmtLike01);  
mem01 = sqlite3_memory_used();  // 5

Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread [EMAIL PROTECTED]
Our e-mails crossed on the way to the list... sorry for the confusion.

Thank you for your advices - i'll follow up them to make the mentioned wildcard 
search working.

Basically, I'm looking for a way to have a fast wildcard search on a dataset 
that is distributed over multiple 
databases whose are attached to a main database. I hope I can do it with GLOB.

At the moment, the question is still unanswered, why LIKE consumes multiple 
times 2.5MB when applied to attached 
dabases. 

Daniel


Ursprüngliche Nachricht
Von: [EMAIL PROTECTED]
Datum: 15.08.2008 16:30
An: "General Discussion of SQLite Database"
Betreff: Re: [sqlite] LIKE operator and ATTACH databases memory usage


On Aug 15, 2008, at 8:06 AM, [EMAIL PROTECTED] wrote:

> Hello
>
> Why does SQLite consume 2.5MB memory every time when running a  
> statement on a attached database with LIKE operator?
>
> Example 1:
> SELECT fs_rec FROM fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes <50kB RAM
>
> Example 2:
> SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM

(Note to mailing list readers:  Daniel sent me a sample database by  
private email)

The database is about 3MB in size and the example 2 query is probably  
doing something close to a full table scan.  This causes most of the  
database to be loaded into cache.  That will use about 2.5MB of RAM.   
The cache will flush itself automatically when you close the database  
connection or when the cache becomes stale.

Depending on what you are storing in fs_textid and what your LIKE  
pattern is, you might get much better performance (and lower memory  
usage) if you use GLOB instead of LIKE and if you explicitly code the  
pattern rather than using the wildcard "?", and if you create a new  
index:

 CREATE INDEX newidx1 ON fs_main(fs_itemtype, fs_textid);

See http://www.sqlite.org/optoverview.html#like_opt

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread [EMAIL PROTECTED]
I'm going to check sqlite3_memory_used()/sqlite3_memory_highwater() next week 
as soon as I can.

At the moment the following is clear:
The application needs to run the same statement with LIKE operator for multiple 
attached databases. On the embedded
side, the device crashes after a few statements because there is no more RAM. 
On the desktop pc side, all statements
are successful, but process viewer shows an increased amount of memory used by 
the application.

I would be happy if the problem is in our application, but I wasn't able to 
find anything yet.

Why does the statement with LIKE (Example 2) consume the memory and the 
statement without LIKE (Example3) does not
consume the memory on a ATTACH'ed database?

Database schema:
CREATE TABLE
fs_main (
'fs_recid' INTEGER PRIMARY KEY NOT NULL,
'fs_itemtype' INTEGER,
'fs_textid' TEXT,
'fs_flag1' INTEGER,
'fs_object' BLOB );

The size of a single record is typically 100 bytes.

Please let me know if you didnt get the example database I sent directly to 
[EMAIL PROTECTED]

Regards
Daniel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread [EMAIL PROTECTED]
> This causes most of the database to be loaded into cache.
Is there one cache per database connection or one cache per ATTACH'ed database?

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread D. Richard Hipp

On Aug 15, 2008, at 8:06 AM, [EMAIL PROTECTED] wrote:

> Hello
>
> Why does SQLite consume 2.5MB memory every time when running a  
> statement on a attached database with LIKE operator?
>
> Example 1:
> SELECT fs_rec FROM fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes <50kB RAM
>
> Example 2:
> SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM

(Note to mailing list readers:  Daniel sent me a sample database by  
private email)

The database is about 3MB in size and the example 2 query is probably  
doing something close to a full table scan.  This causes most of the  
database to be loaded into cache.  That will use about 2.5MB of RAM.   
The cache will flush itself automatically when you close the database  
connection or when the cache becomes stale.

Depending on what you are storing in fs_textid and what your LIKE  
pattern is, you might get much better performance (and lower memory  
usage) if you use GLOB instead of LIKE and if you explicitly code the  
pattern rather than using the wildcard "?", and if you create a new  
index:

 CREATE INDEX newidx1 ON fs_main(fs_itemtype, fs_textid);

See http://www.sqlite.org/optoverview.html#like_opt

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread D. Richard Hipp

On Aug 15, 2008, at 9:56 AM, [EMAIL PROTECTED] wrote:

>
> I don’t dare to use the term “leak” here. It is hard so say at the  
> moment where the memory is going to. At least, the
> memory is not freed when sqlite3_finalize() is called on the  
> statement. Might it be possible, that this memory is
> allocated once per attached database and used for caching reasons?
>

Memory leaks in SQLite are uncommon.  Especially 2.5MB memory leaks. See

http://www.sqlite.org/malloc.html#testing

Are you sure that the memory is not freed?  Calling free() does not  
normally return memory to the operating system so just because the  
process memory usage went up does *not* mean that the memory is still  
in use. It might just mean that the memory is being held by the  
malloc()/free() for possible reuse later.  What does the  
sqlite3_memory_used() interface tell you?  What about  
sqlite3_memory_highwater()?

Why does the first query require 2.5MB of memory?  I don't know.  That  
would depend on your schema and the content of the database.  Perhaps  
it is going to cache.  Perhaps something else.  If one of the rows in  
your table contains a 2.5MB blob, that would explain where the memory  
is going, would it not?  We will need quite a bit more information  
from you if we are to help you debug your problem.

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread [EMAIL PROTECTED]

I don’t dare to use the term “leak” here. It is hard so say at the moment where 
the memory is going to. At least, the 
memory is not freed when sqlite3_finalize() is called on the statement. Might 
it be possible, that this memory is 
allocated once per attached database and used for caching reasons?

Attached… oh sorry, I meant: “a few databases are attached with the ATTACH 
command to the main in-memory database”. 
Please let me know if you really need the database file.

I can see how the memory usage of my application increases by 2.5MB as soon as 
I call sqlite3_step().

Database and parameters:
Records: Typically 20’000 records
Values for ?: fs_itemtype is an integer between 0 and 100
Values for ?: fs_textid is a string with length between 0 and 16 characters
Indexes: On fs_textid
Indexes: There is another index on fs_itemtype but is not used here

Environment:
SQLite version: 3.5.9 (we will release our product very soon)
Operation system: Windows XP and WinCE (problem exists on both platforms)
Compiled: Ourselves with Visual Studio 2005 (no changes to source code)

Regards
Daniel




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread D. Richard Hipp

On Aug 15, 2008, at 8:06 AM, [EMAIL PROTECTED] wrote:

> Hello
>
> Why does SQLite consume 2.5MB memory every time when running a  
> statement on a attached database with LIKE operator?

This mailing list strips off attachments.   Please send the database  
by some other means.

How do you know that SQLite is "consuming" 2.5MB?  What version of  
SQLite are you running?  What operating system? Did you compile it  
yourself or are using using a pre-build binary? What values are us  
using for the "?" parameters in your queries?


>
>
> Example 1:
> SELECT fs_rec FROM fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes <50kB RAM
>
> Example 2:
> SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM
> SELECT fs_rec FROM _job02.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM
> SELECT fs_rec FROM _job03.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM
> SELECT fs_rec FROM _job04.fs_main WHERE fs_itemtype=? AND fs_textid  
> LIKE ?;   // consumes 2.5MB RAM
>
> Example 3:
> SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND  
> fs_textid=?;  // consumes <1KB RAM
>
> Memory is consumed immediately after the first sqlite3_step() is  
> called.
>
> The PRAGMA cache_size is set to 2000 on the in-memory main database.
>
> Regards
> Daniel
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] LIKE operator and ATTACH databases memory usage

2008-08-15 Thread [EMAIL PROTECTED]
Hello

Why does SQLite consume 2.5MB memory every time when running a statement on a 
attached database with LIKE operator?

Example 1:
SELECT fs_rec FROM fs_main WHERE fs_itemtype=? AND fs_textid LIKE ?;
// consumes <50kB RAM

Example 2:
SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND fs_textid LIKE ?; 
// consumes 2.5MB RAM
SELECT fs_rec FROM _job02.fs_main WHERE fs_itemtype=? AND fs_textid LIKE ?; 
// consumes 2.5MB RAM
SELECT fs_rec FROM _job03.fs_main WHERE fs_itemtype=? AND fs_textid LIKE ?; 
// consumes 2.5MB RAM
SELECT fs_rec FROM _job04.fs_main WHERE fs_itemtype=? AND fs_textid LIKE ?; 
// consumes 2.5MB RAM

Example 3:
SELECT fs_rec FROM _job01.fs_main WHERE fs_itemtype=? AND fs_textid=?;  
// consumes <1KB RAM

Memory is consumed immediately after the first sqlite3_step() is called. 

The PRAGMA cache_size is set to 2000 on the in-memory main database.

Regards
Daniel


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Like operator

2008-02-27 Thread Igor Tandetnik
"Mahalakshmi.m"
<[EMAIL PROTECTED]> wrote
in message
news:[EMAIL PROTECTED]
> Rowid Id Album
> 1 4 
> 2 3 
> 3 2 
> 4 1 
>
> SELECT rowid,Album FROM Temp WHERE Album like 'c%';
> Output: rowed -> 2 and Album -> 
>
> My doubt is for this statement
> "SELECT rowid , Album FROM Temp WHERE Album like 'b%'; " I am not
> having any Album with match pattern starting with b so I want the
> rowed and Album for the string next to the provided ie, rowed ->3
> and Album -> .

select rowid, album from temp
where album >= 'b' collate nocase
order by album limit 1;

That should work both when there is and isn't a match.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Like operator

2008-02-27 Thread Mahalakshmi.m

Hi,
I am working in 3.3.6
My table is as follows:

"create table MUSIC(id integer primary key,Album text,Artist text);"
IdAlbum Artist
1   z
2   w
3   s
4   t

I want to sort the Album First and then I have to tell the rowid of the
particular string pattern.If that pattern is not present it has to provide
the next string match.

I tried as follows:
"CREATE TABLE IF NOT EXISTS Temp as SELECT Album from MUSIC ORDER BY Album;

Rowid   Id  Album   
1   4   
2   3   
3   2   
4   1   

SELECT rowid,Album FROM Temp WHERE Album like 'c%'; 
Output: rowed -> 2 and Album -> 

My doubt is for this statement
"SELECT rowid , Album FROM Temp WHERE Album like 'b%'; " I am not having any
Album with match pattern starting with b so I want the rowed and Album for
the string next to the provided ie, rowed ->3  and Album -> .

Can any one please help to solve my problem.

Thanks & Regards,
Mahalakshmi.M




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-30 Thread Spiros Ioannou

Thank you all for your input.
-Spiros

--
Image Video & Multimedia Systems Lab.
Department of Electrical & Computer Eng.
National Technical University of Athens
http://www.image.ece.ntua.gr

[EMAIL PROTECTED] wrote:

Spiros Ioannou <[EMAIL PROTECTED]> wrote:

John Stanton wrote:

Sqlite does cater for text searching.  Look at FTS.
This is not text searching. No stemming, etc etc is required. Column has 
exactly 1 word, and the 'LIKE' substring matching is performed at the 
words' first characters (not end-characters).

Thanks,


Index the column and use GLOB with a trailing '*' instead
of LIKE with a trailing '%'.  Make the search string a
single string literal token in the SQL:

 SELECT * FROM table WHERE xyz GLOB 'abc*';

Do not use an expression for 'abc*'.  Do not use bound
parameters for 'abc*'.  Make it a literal string within
the SQL.  If you do these things, then SQLite will use
an index to do the search and it will be very fast.  Way
faster than fgrep.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread John Stanton
If performance is very important to you and you have a need for SQL 
elsewhere consider using the Sqlite virtual table interface.  Your 
virtual table can be a flat file which is memory mapped and you use 
either a regex or fast string search on it.  You will have a result in 
milliseconds.  You could return a rowid for further navigation through 
the database.


String searching on a memory resident list will be faster than using an 
index up to quite a large size table.  If you have a multi-processor 
machine you could partition the search into parallel tasks and get an 
advantage.  It appears that you are doing a simple string search and 
don't need regular expression overhead.


If you have a very large table you can build a B-tree index on it in 
your virtual table and access through that.  The index can also be 
memory mapped and will deliver millisecond access times from tables with 
many millions of entries.


We have achieved speed improvements on the scale of orders of magnitude 
by using such techniques on text and taking advantage of a high 
performance search algorithm.


Spiros Ioannou wrote:

John Stanton wrote:

Sqlite does cater for text searching.  Look at FTS.


This is not text searching. No stemming, etc etc is required. Column has 
exactly 1 word, and the 'LIKE' substring matching is performed at the 
words' first characters (not end-characters).

Thanks,
-Spiros

- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread Trevor Talbot
On 11/28/07, Spiros Ioannou <[EMAIL PROTECTED]> wrote:

> > egrep gets to work with a flat text file, which it can easily read
> > sequentially and get optimal performance from the OS's file
> > buffering/cache management.  It only needs to read a piece of the file
> > and scan for patterns, repeating until done.  The only structure it
> > needs to be aware of is line breaks, but that is so simple it can be
> > folded into the pattern scan itself.
>
> You are mostly describing fgrep. Egrep does a regular expression search
> thus honouring column layout in the example I gave earlier. (FYI fgrep
> is 50% faster than egrep than egrep in this query)

I was describing grep and all related tools. The only relevant
difference between egrep and fgrep is what kind of pattern it looks
for; the searching method is the same. Egrep is not aware of column
structure, only lines.

> The points are two:
> 1) Could sqlite structure allow for faster full column scan queries?
> More than 1000% slower performance seems way too much difference.

The short answer is no.

The longer answer is maybe in theory (see research into
column-oriented or "vertical" datastores, such as the much-publicized
Vertica), but SQLite's usage scenarios don't really make it
appropriate. It would probably mean abandoning the single-file
database, for instance, and would make many other common queries much
less efficient. That's ignoring all sorts of other practical issues,
like drh's careful avoidance of patent-encumbered things.

It's possible there is room for optimizing this case a bit more the
way it is, but it will always be much slower than grep.

You may not realize what I mean about structured data. A plaintext
file looks like this:

LineLineLineLineLineLineLine...

An sqlite database, on the other hand, has lots of structure. It's
broken into fixed-sized pages, each of which has a small header at the
front. Each table needs a way to find the rows in that belong to it
(index on the rowid field). For each row, it needs to be able to
locate each field of data belonging to that row. Each field has,
besides the data itself, a type code and data length. All of this is
encoded in various space-efficient forms in the same file.

So, for instance, consider that a field of data is stored like:

   TypeLengthData...

and you can see where this is going.

When SQLite searches it, it needs to follow the structure to know
where it is and what it's looking at. It can't simply read the next
chunk of data and do pattern matching on it immediately, like grep
can.


> 2) an index could (and should) be used when using LIKE 'asdf'
>
> Well I actually solved this problem by using
> PRAGMA case_sensitive_like = 1;
> Now the index is being used (for all non-english characters).

Which is what you wanted, right?

> But the online manual states:
>
> "
> if case_sensitive_like mode is enabled then the column must use the
> default BINARY collating sequence,
>   if case_sensitive_like mode is disabled then the column must use the b
> uilt-in NOCASE collating sequence.
>
>   
>   NOCASE - The same as binary, except the 26 upper case characters used
>   by the English language are folded to their lower case equivalents
> before the comparison is performed.
> "
>
> so as I understand it there's no reason that the index was not used in
> the NOCASE scenario. It should do case-insensitive matches for the 26
> latin characters and case-sensitive (binary) for all the others.

It might be a little hard to process that part the first few times.
You can declare a collation when creating the index itself, either
BINARY or NOCASE. The default is BINARY, which is case sensitive (does
not assume US ASCII or do anything with case).

By default, LIKE is case insensitive, which means it does care about
case so it can treat 'A' as 'a'. Therefore it can only use a NOCASE
index for optimizing prefix matches, because the only way for an index
to work is if it treats the characters the same way. NOCASE is not the
index default though, and it isn't what you wanted because you have
non-english data anyway.

By turning case_sensitive_like on, you made LIKE look for an index
with BINARY collation instead. That matches the index you created, so
it can optimize that case now.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread drh
Spiros Ioannou <[EMAIL PROTECTED]> wrote:
> John Stanton wrote:
> > Sqlite does cater for text searching.  Look at FTS.
> 
> This is not text searching. No stemming, etc etc is required. Column has 
> exactly 1 word, and the 'LIKE' substring matching is performed at the 
> words' first characters (not end-characters).
> Thanks,

Index the column and use GLOB with a trailing '*' instead
of LIKE with a trailing '%'.  Make the search string a
single string literal token in the SQL:

 SELECT * FROM table WHERE xyz GLOB 'abc*';

Do not use an expression for 'abc*'.  Do not use bound
parameters for 'abc*'.  Make it a literal string within
the SQL.  If you do these things, then SQLite will use
an index to do the search and it will be very fast.  Way
faster than fgrep.

--
D. Richard Hipp <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread Spiros Ioannou

John Stanton wrote:

Sqlite does cater for text searching.  Look at FTS.


This is not text searching. No stemming, etc etc is required. Column has 
exactly 1 word, and the 'LIKE' substring matching is performed at the 
words' first characters (not end-characters).

Thanks,
-Spiros

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread John Stanton

Trevor Talbot wrote:

On 11/27/07, Spiros Ioannou <[EMAIL PROTECTED]> wrote:


I had a 135MB, 1256132 lines,  '@' separated text file containing
various words and text fields (like a dictionary).
Example record:
[EMAIL PROTECTED]@[EMAIL PROTECTED],[EMAIL PROTECTED] altana : μικρός ανθόκηπος 
- εξώστης,
ταράτσα@@@



I imported the data in sqlite3.3.6 but when querying with the 'like'
operator, the performance way too slow (about 1.5-2 seconds/query):

 >time sqlite3 dicts.db "select * from table1 where word like 'asdf%';"
1.156u 0.491s 0:01.64 100.0%0+0k 0+0io 0pf+0w



FYI using egrep takes only 0.14s to get results in the worse case scenario:
 >time egrep -i "[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@asdf" 
meta.txt
0.077u 0.069s 0:00.14 92.8% 0+0k 0+0io 0pf+0w

1) I know egrep is not a DB but does sqlite use such an inefficient
search algorithm for content that cannot be indexed? Why not reverting
to simple 'grep-like' methods? Or am I missing something trivial here?


As a database, the file contains a LOT of structure.  SQLite must
follow the structure to locate the table, each record in the table,
and expand the text field from its stored format.  (The text itself is
not a big deal, but the row/column that stores it must be found and
extracted.)  The data is not necessarily stored end-to-end
sequentially in the file, as it's impossible to do that and still
maintain all the necessary properties of a structured database.

egrep gets to work with a flat text file, which it can easily read
sequentially and get optimal performance from the OS's file
buffering/cache management.  It only needs to read a piece of the file
and scan for patterns, repeating until done.  The only structure it
needs to be aware of is line breaks, but that is so simple it can be
folded into the pattern scan itself.

While someone would need to do profiling to examine exactly where the
time goes, it would not suirprise me to find that SQLite's LIKE
pattern matcher is more efficient than egrep, but that the overhead
from dealing with structured data is responsible for the time
difference.  I don't find the time itself surprising at all.


2) Why doesn't an index raise performance at all in this case? Is it
because non-latin chars are used?


Careful use of an index should help for the specific query you posted
(see http://sqlite.org/optoverview.html#like_opt), but it's not
possible for an index to speed up arbitrary patterns.

If you need to perform arbitrary pattern searches on a flat text file,
SQLite (and most other structured storage for that matter) is simply
the wrong tool for the job.  grep and friends are highly optimized for
just that purpose.


Sqlite does cater for text searching.  Look at FTS.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-28 Thread Spiros Ioannou

Firstly, thank you all for your answers,


egrep gets to work with a flat text file, which it can easily read
sequentially and get optimal performance from the OS's file
buffering/cache management.  It only needs to read a piece of the file
and scan for patterns, repeating until done.  The only structure it
needs to be aware of is line breaks, but that is so simple it can be
folded into the pattern scan itself.


You are mostly describing fgrep. Egrep does a regular expression search 
thus honouring column layout in the example I gave earlier. (FYI fgrep 
is 50% faster than egrep than egrep in this query)


The points are two:
1) Could sqlite structure allow for faster full column scan queries?
   More than 1000% slower performance seems way too much difference.
2) an index could (and should) be used when using LIKE 'asdf'

Well I actually solved this problem by using
PRAGMA case_sensitive_like = 1;
Now the index is being used (for all non-english characters).

But the online manual states:

"
if case_sensitive_like mode is enabled then the column must use the 
default BINARY collating sequence,
 if case_sensitive_like mode is disabled then the column must use the b 
uilt-in NOCASE collating sequence.


 
 NOCASE - The same as binary, except the 26 upper case characters used 
 by the English language are folded to their lower case equivalents 
before the comparison is performed.

"

so as I understand it there's no reason that the index was not used in 
the NOCASE scenario. It should do case-insensitive matches for the 26 
latin characters and case-sensitive (binary) for all the others.


Or I'm still missing some essential sql parameter. Or perhaps my version 
is too old.


Thank you all for your time,
All the best,
-Spiros

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-27 Thread John Stanton
You will always get much faster searching using a flat file and a 
grep-like search.  For the search you quote you can do better than egrep 
with an integrated search algorithm like Boyer-Moore.


This is no reflection on Sqlite, it is not intended to be a replacement 
for grep.


Spiros Ioannou wrote:

Hello,

I had a 135MB, 1256132 lines,  '@' separated text file containing 
various words and text fields (like a dictionary).

Example record:
[EMAIL PROTECTED]@[EMAIL PROTECTED],[EMAIL PROTECTED] altana : μικρός ανθόκηπος - εξώστης, 
ταράτσα@@@



I imported the data in sqlite3.3.6 but when querying with the 'like' 
operator, the performance way too slow (about 1.5-2 seconds/query):


 >time sqlite3 dicts.db "select * from table1 where word like 'asdf%';"
1.156u 0.491s 0:01.64 100.0%0+0k 0+0io 0pf+0w

Creating an index did not help at all. Using the  COLLATE NOCASE is not 
of help either since text encoding is not iso-8859-1 (but still is 8-bit).


FYI using egrep takes only 0.14s to get results in the worse case scenario:
 >time egrep -i "[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@asdf" 
meta.txt
0.077u 0.069s 0:00.14 92.8% 0+0k 0+0io 0pf+0w

1) I know egrep is not a DB but does sqlite use such an inefficient 
search algorithm for content that cannot be indexed? Why not reverting 
to simple 'grep-like' methods? Or am I missing something trivial here?
2) Why doesn't an index raise performance at all in this case? Is it 
because non-latin chars are used?



Please clarify, I'm really confused about this.
Thank you.

Spiros Ioannou

p.s.
if answers to (1) and (2) don't help, I'm going to write a php-egrep 
plugin to avoid popen-ing :-)



--
Image Video & Multimedia Systems Lab.
Department of Electrical & Computer Eng.
National Technical University of Athens
http://www.image.ece.ntua.gr

- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-27 Thread Trevor Talbot
On 11/27/07, Spiros Ioannou <[EMAIL PROTECTED]> wrote:

> I had a 135MB, 1256132 lines,  '@' separated text file containing
> various words and text fields (like a dictionary).
> Example record:
> [EMAIL PROTECTED]@[EMAIL PROTECTED],[EMAIL PROTECTED] altana : μικρός 
> ανθόκηπος - εξώστης,
> ταράτσα@@@

> I imported the data in sqlite3.3.6 but when querying with the 'like'
> operator, the performance way too slow (about 1.5-2 seconds/query):
>
>  >time sqlite3 dicts.db "select * from table1 where word like 'asdf%';"
> 1.156u 0.491s 0:01.64 100.0%0+0k 0+0io 0pf+0w

> FYI using egrep takes only 0.14s to get results in the worse case scenario:
>  >time egrep -i "[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@asdf" 
> meta.txt
> 0.077u 0.069s 0:00.14 92.8% 0+0k 0+0io 0pf+0w
>
> 1) I know egrep is not a DB but does sqlite use such an inefficient
> search algorithm for content that cannot be indexed? Why not reverting
> to simple 'grep-like' methods? Or am I missing something trivial here?

As a database, the file contains a LOT of structure.  SQLite must
follow the structure to locate the table, each record in the table,
and expand the text field from its stored format.  (The text itself is
not a big deal, but the row/column that stores it must be found and
extracted.)  The data is not necessarily stored end-to-end
sequentially in the file, as it's impossible to do that and still
maintain all the necessary properties of a structured database.

egrep gets to work with a flat text file, which it can easily read
sequentially and get optimal performance from the OS's file
buffering/cache management.  It only needs to read a piece of the file
and scan for patterns, repeating until done.  The only structure it
needs to be aware of is line breaks, but that is so simple it can be
folded into the pattern scan itself.

While someone would need to do profiling to examine exactly where the
time goes, it would not suirprise me to find that SQLite's LIKE
pattern matcher is more efficient than egrep, but that the overhead
from dealing with structured data is responsible for the time
difference.  I don't find the time itself surprising at all.

> 2) Why doesn't an index raise performance at all in this case? Is it
> because non-latin chars are used?

Careful use of an index should help for the specific query you posted
(see http://sqlite.org/optoverview.html#like_opt), but it's not
possible for an index to speed up arbitrary patterns.

If you need to perform arbitrary pattern searches on a flat text file,
SQLite (and most other structured storage for that matter) is simply
the wrong tool for the job.  grep and friends are highly optimized for
just that purpose.


[sqlite] LIKE operator extremely slow? more than 10 times slower than egrep?

2007-11-27 Thread Spiros Ioannou

Hello,

I had a 135MB, 1256132 lines,  '@' separated text file containing 
various words and text fields (like a dictionary).

Example record:
[EMAIL PROTECTED]@[EMAIL PROTECTED],[EMAIL PROTECTED] altana : μικρός ανθόκηπος - εξώστης, 
ταράτσα@@@



I imported the data in sqlite3.3.6 but when querying with the 'like' 
operator, the performance way too slow (about 1.5-2 seconds/query):


>time sqlite3 dicts.db "select * from table1 where word like 'asdf%';"
1.156u 0.491s 0:01.64 100.0%0+0k 0+0io 0pf+0w

Creating an index did not help at all. Using the  COLLATE NOCASE is not 
of help either since text encoding is not iso-8859-1 (but still is 8-bit).


FYI using egrep takes only 0.14s to get results in the worse case scenario:
>time egrep -i "[EMAIL PROTECTED]@[EMAIL PROTECTED]@[EMAIL PROTECTED]@asdf" 
meta.txt
0.077u 0.069s 0:00.14 92.8% 0+0k 0+0io 0pf+0w

1) I know egrep is not a DB but does sqlite use such an inefficient 
search algorithm for content that cannot be indexed? Why not reverting 
to simple 'grep-like' methods? Or am I missing something trivial here?
2) Why doesn't an index raise performance at all in this case? Is it 
because non-latin chars are used?



Please clarify, I'm really confused about this.
Thank you.

Spiros Ioannou

p.s.
if answers to (1) and (2) don't help, I'm going to write a php-egrep 
plugin to avoid popen-ing :-)



--
Image Video & Multimedia Systems Lab.
Department of Electrical & Computer Eng.
National Technical University of Athens
http://www.image.ece.ntua.gr

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator syntax for white space

2007-10-24 Thread Yuriy Martsynovskyy
These only strip lines with Space characters. I need also to filter
out also tabs, carriage returns and all series characters that
constitute a white space.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator syntax for white space

2007-10-23 Thread Trey Mack

How do I filter out records that contain WHITE SPACE in a field or
have this field empty?

For example:
select * from mytable
where fld <> "" and fld LIKE .


Do you mean "contain ONLY white space"? Are you after this?

select * from mytable
where trim(fld) <> ""



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] LIKE operator syntax for white space

2007-10-23 Thread Yuriy Martsynovskyy
How do I filter out records that contain WHITE SPACE in a field or
have this field empty?

For example:
select * from mytable
where fld <> "" and fld LIKE .

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] like operator

2007-08-17 Thread John Stanton
The % is an escape character in the WWW form protocol.  You need to 
interpret it and regard the next two bytes as a hex char.


RaghavendraK 70574 wrote:

Hi,

we have given a web interface which receive delete request.
Now in the req we get "%" and in the delete impl we do this
delete from table where itemName like xxx.%;

since the key is % the above statement becomes,
"delete from table where itemName like %.%";And result in fatal problem of 
erasing all records.

Is there any api to deal with like operator for these conditions, pls help. Hopefully fix will not 
degrade performance.


regrds
ragha


**
 This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
 
*

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] like operator

2007-08-17 Thread Tom Briggs

   I'm not sure I correctly understand your question, but: escaping the
% in your query may be what you're looking for, i.e.

   delete from table where itemName like '\%.%' escape '\'

   Maybe. :)

   -Tom 

> -Original Message-
> From: RaghavendraK 70574 [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, August 16, 2007 9:03 PM
> To: SQLite
> Subject: [sqlite] like operator
> 
> Hi,
> 
> we have given a web interface which receive delete request.
> Now in the req we get "%" and in the delete impl we do this
> delete from table where itemName like xxx.%;
> 
> since the key is % the above statement becomes,
> "delete from table where itemName like %.%";And result in 
> fatal problem of erasing all records.
> 
> Is there any api to deal with like operator for these 
> conditions, pls help. Hopefully fix will not 
> degrade performance.
> 
> regrds
> ragha
> 
> 
> **
> 
>  This email and its attachments contain confidential 
> information from HUAWEI, which is intended only for the 
> person or entity whose address is listed above. Any use of 
> the information contained herein in any way (including, but 
> not limited to, total or partial disclosure, reproduction, or 
> dissemination) by persons other than the intended 
> recipient(s) is prohibited. If you receive this e-mail in 
> error, please notify the sender by phone or email immediately 
> and delete it!
>  
> **
> ***
> 
> --
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> --
> ---
> 
> 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] like operator

2007-08-16 Thread Clark Christensen
You have to test your incoming values, and reject requests that have "%" (and 
other illegal) chars.

I never allow real deletes from a web form, and especially not from trusted 
users.  Consider adding a "deleted" column, and update the affected rows to 
indicate they've been deleted.  It's a little more work, but it sounds like 
it'll save you some pain in the long run.

 -Clark

- Original Message 
From: RaghavendraK 70574 <[EMAIL PROTECTED]>
To: SQLite 
Sent: Thursday, August 16, 2007 6:02:32 PM
Subject: [sqlite] like operator

Hi,

we have given a web interface which receive delete request.
Now in the req we get "%" and in the delete impl we do this
delete from table where itemName like xxx.%;

since the key is % the above statement becomes,
"delete from table where itemName like %.%";And result in fatal problem of 
erasing all records.

Is there any api to deal with like operator for these conditions, pls help. 
Hopefully fix will not 
degrade performance.

regrds
ragha


**
 This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
 
*

-
To unsubscribe, send email to [EMAIL PROTECTED]
-





-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] like operator

2007-08-16 Thread RaghavendraK 70574
Hi,

we have given a web interface which receive delete request.
Now in the req we get "%" and in the delete impl we do this
delete from table where itemName like xxx.%;

since the key is % the above statement becomes,
"delete from table where itemName like %.%";And result in fatal problem of 
erasing all records.

Is there any api to deal with like operator for these conditions, pls help. 
Hopefully fix will not 
degrade performance.

regrds
ragha


**
 This email and its attachments contain confidential information from HUAWEI, 
which is intended only for the person or entity whose address is listed above. 
Any use of the information contained herein in any way (including, but not 
limited to, total or partial disclosure, reproduction, or dissemination) by 
persons other than the intended recipient(s) is prohibited. If you receive this 
e-mail in error, please notify the sender by phone or email immediately and 
delete it!
 
*

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator case-senisitive for international characters

2007-07-10 Thread Dan Kennedy
On Tue, 2007-07-10 at 22:23 -0700, Sweden wrote:
> I have searched many forums without success for the following question:
> 
> Is there any possibility to use LIKE operator  in a SELECT statment -
> without being case sensitive AND with support for international characters?
> I am using "PRAGMA case_sensitive_like=OFF;". The LIKE statement is in the
> format '%what%' (*.*). This works fine for English characters but not for
> Swedish.
> 
> E.g. LIKE "%åker%'  
> 
> returns "fred åkerholm" 
> 
> but NOT "Fred Åkerholm"
> 
> Doc says "International character sets are case sensitive in SQLite unless a
> user-supplied collating sequence is used. But if you employ a user-supplied
> collating sequence, the LIKE optimization describe here will never be
> taken". 
> 
> Regards, 
> 
> Michael from Sweden

There is an extension bundled with SQLite sources that uses the ICU
library to do this. See here:

http://www.sqlite.org/cvstrac/fileview?f=sqlite/ext/icu/README.txt&v=1.2

Dan.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] LIKE operator case-senisitive for international characters

2007-07-10 Thread Sweden

I have searched many forums without success for the following question:

Is there any possibility to use LIKE operator  in a SELECT statment -
without being case sensitive AND with support for international characters?
I am using "PRAGMA case_sensitive_like=OFF;". The LIKE statement is in the
format '%what%' (*.*). This works fine for English characters but not for
Swedish.

E.g. LIKE "%åker%'  

returns "fred åkerholm" 

but NOT "Fred Åkerholm"

Doc says "International character sets are case sensitive in SQLite unless a
user-supplied collating sequence is used. But if you employ a user-supplied
collating sequence, the LIKE optimization describe here will never be
taken". 

Regards, 

Michael from Sweden





E.g. 
-- 
View this message in context: 
http://www.nabble.com/LIKE-operator-case-senisitive-for-international-characters-tf4059943.html#a11534367
Sent from the SQLite mailing list archive at Nabble.com.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] LIKE operator with prepared statements

2006-04-07 Thread drh
Eric Bohlman <[EMAIL PROTECTED]> wrote:
> Dennis Cote wrote:
> > You could also do this:
> > SELECT x from y WHERE y.x LIKE '%' || ? || '%' ;
> > 
> > The || operator concatenates the % characters with your string.  Now you 
> > don't need to massage the string in the calling code.  Six of one, half 
> > dozen of the other.
> 
> Note, though, that as currently implemented (DRH has said it might 
> change in the future) the concatenation will be performed for each row 
> evaluated (and that particular query will guarantee a full-table scan) 
> so doing it in the calling code would be a better idea if large tables 
> are involved.
> 

The full table scan will happen regardless of what you do
if you are LIKE-ing with anything that starts with '%'.  But
you can force the common subexpression elimination by putting
the subexpression in a subquery:

  SELECT x FROM y WHERE y.x LIKE (SELECT '%' || ? || '%')

The subquery will do the concatenation exactly once and reuse
the result for every row of the Y table that it scans.
--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] LIKE operator with prepared statements

2006-04-07 Thread Eric Bohlman

Dennis Cote wrote:

You could also do this:
SELECT x from y WHERE y.x LIKE '%' || ? || '%' ;

The || operator concatenates the % characters with your string.  Now you 
don't need to massage the string in the calling code.  Six of one, half 
dozen of the other.


Note, though, that as currently implemented (DRH has said it might 
change in the future) the concatenation will be performed for each row 
evaluated (and that particular query will guarantee a full-table scan) 
so doing it in the calling code would be a better idea if large tables 
are involved.




Re: [sqlite] LIKE operator with prepared statements

2006-04-07 Thread Dennis Cote

Marian Olteanu wrote:


But why don't you use

SELECT x from y WHERE y.x LIKE ? ;

and bind the first parameter to "%SomeText%" instead of "SomeText" like
before?



Chad,

You could also do this: 


SELECT x from y WHERE y.x LIKE '%' || ? || '%' ;

The || operator concatenates the % characters with your string.  Now you 
don't need to massage the string in the calling code.  Six of one, half 
dozen of the other.


HTH
Dennis Cote


RE: [sqlite] LIKE operator with prepared statements

2006-04-06 Thread Marian Olteanu
But why don't you use

SELECT x from y WHERE y.x LIKE ? ;

and bind the first parameter to "%SomeText%" instead of "SomeText" like
before?

-Original Message-
From: Slater, Chad [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 06, 2006 6:40 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] LIKE operator with prepared statements

Is it possible to use the LIKE operator with a prepared statement? 

I'm trying to build a query that uses binding for the text in the LIKE
operation as follows:

SELECT x from y WHERE y.x LIKE %?% ;

...And binding text to the positional parameter in hopes to get:

SELECT x from y WHERE y.x LIKE %SomeText% ;

But it results in a sql parse error. Is %Q and sqlite3_mprintf my only
option here?



Chad



[sqlite] LIKE operator with prepared statements

2006-04-06 Thread Slater, Chad
Is it possible to use the LIKE operator with a prepared statement? 

I'm trying to build a query that uses binding for the text in the LIKE
operation as follows:

SELECT x from y WHERE y.x LIKE %?% ;

...And binding text to the positional parameter in hopes to get:

SELECT x from y WHERE y.x LIKE %SomeText% ;

But it results in a sql parse error. Is %Q and sqlite3_mprintf my only
option here?



Chad