Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Josh Wilson
Each major release (and occasionally minor) we review third party libraries and 
what improvements have been released since we last looked. 

FTS and having international support for a fast search within our app has been 
a big push.  We are essentially a glorified database front end with a cushy 
form UI. We are trying to use FTS to help with autocompleting fields. 

Once we are happy the latest version of the third party library is stable we 
build it and manually copy it into the repo. So third party libraries aren't 
part of the build process except for linking statically. This guarantees 
stability as it is a frozen version of the third party library. Although human 
error creeps in when manually copying apparently. 

> On 16 Jun 2014, at 1:52 pm, "Stadin, Benjamin [via SQLite]" 
>  wrote:
> 
> You could create a fake framework, because it takes a while every time to 
> compile. 
> 
> Just wondering: What's your rationale to use Unicode61 in an iOS project? 
> Being able to sort based on the locale is a feature all our foreign customers 
> demand (and here in German as well). Of course nobody will complain until 
> they realize. 
>  
> Von: [hidden email] [[hidden email]] im Auftrag von Josh Wilson [[hidden 
> email]] 
> Gesendet: Montag, 16. Juni 2014 04:38 
> An: [hidden email] 
> Betreff: Re: [sqlite] Unicode61 Tokenizer 
> 
> Righteo thanks for the sanity check that it must be me at fault and that this 
> is indeed possible without ICU. 
> 
> I have a separate XCode project for rolling the latest SQLite amalgamation 
> and copy that built library out of the Derived Data folder into our main App 
> project. 
> 
> It would appear I kept copying an old file for v3.8.4.3 and not the actual 
> v3.8.5 I was modifying so no wonder there was no change. So after a `Clean` 
> and `Delete Derived Data` then build the resulting build worked. Rookie 
> mistake, sorry guys. 
> 
> I simply followed the ottersoftware approach to adding the defines at the 
> top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get 
> defined. 
> 
> #define SQLITE_ENABLE_FTS4 
> #define SQLITE_ENABLE_FTS3_PARENTHESIS 
> #define SQLITE_ENABLE_FTS4_UNICODE61 
> 
> This worked for me building against the iOS7.1 SDK (including 64bit 
> architecture build) for anyone else's future reference. 
> 
> 
> 
> 
> -- 
> View this message in context: 
> http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html
> Sent from the SQLite mailing list archive at Nabble.com. 
> ___ 
> sqlite-users mailing list 
> [hidden email] 
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___ 
> sqlite-users mailing list 
> [hidden email] 
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 
> 
> If you reply to this email, your message will be added to the discussion 
> below:
> http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76119.html
> To unsubscribe from Unicode61 Tokenizer, click here.
> NAML




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76121.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Simon Slavin

On 16 Jun 2014, at 3:38am, Josh Wilson  wrote:

> It would appear I kept copying an old file for v3.8.4.3 and not the actual
> v3.8.5 I was modifying so no wonder there was no change. So after a `Clean`
> and `Delete Derived Data` then build the resulting build worked. Rookie
> mistake, sorry guys.

That has bitten me too.  I've definitely solved mystery compilation bugs by 
doing a 'Clean' and rebuilding from source.  I'm sure there's some bug in 
Xcode's notation about when to update one of a project's build stages.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Stadin, Benjamin
You could create a fake framework, because it takes a while every time to 
compile.

Just wondering: What's your rationale to use Unicode61 in an iOS project? Being 
able to sort based on the locale is a feature all our foreign customers demand 
(and here in German as well). Of course nobody will complain until they 
realize. 

Von: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] im 
Auftrag von Josh Wilson [neozenith@gmail.com]
Gesendet: Montag, 16. Juni 2014 04:38
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] Unicode61 Tokenizer

Righteo thanks for the sanity check that it must be me at fault and that this
is indeed possible without ICU.

I have a separate XCode project for rolling the latest SQLite amalgamation
and copy that built library out of the Derived Data folder into our main App
project.

It would appear I kept copying an old file for v3.8.4.3 and not the actual
v3.8.5 I was modifying so no wonder there was no change. So after a `Clean`
and `Delete Derived Data` then build the resulting build worked. Rookie
mistake, sorry guys.

I simply followed the ottersoftware approach to adding the defines at the
top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get
defined.

#define SQLITE_ENABLE_FTS4
#define SQLITE_ENABLE_FTS3_PARENTHESIS
#define SQLITE_ENABLE_FTS4_UNICODE61

This worked for me building against the iOS7.1 SDK (including 64bit
architecture build) for anyone else's future reference.




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Josh Wilson
Righteo thanks for the sanity check that it must be me at fault and that this
is indeed possible without ICU.

I have a separate XCode project for rolling the latest SQLite amalgamation
and copy that built library out of the Derived Data folder into our main App
project.

It would appear I kept copying an old file for v3.8.4.3 and not the actual
v3.8.5 I was modifying so no wonder there was no change. So after a `Clean`
and `Delete Derived Data` then build the resulting build worked. Rookie
mistake, sorry guys.

I simply followed the ottersoftware approach to adding the defines at the
top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get
defined.

#define SQLITE_ENABLE_FTS4
#define SQLITE_ENABLE_FTS3_PARENTHESIS
#define SQLITE_ENABLE_FTS4_UNICODE61

This worked for me building against the iOS7.1 SDK (including 64bit
architecture build) for anyone else's future reference.




--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Richard Hipp
On Sun, Jun 15, 2014 at 10:02 PM, Josh Wilson 
wrote:

> https://bitbucket.org/ottersoftware/fts-diacritic-marks
>
> Ok so the above project successfully includes a build of sqlite v3.7.15.2
> without ICU but the `unicode61` tokenizer works.
>
> So I tried the same #defines they used with v3.8.5 and still get 'unknown
> tokenizer: unicode61'
>
> Has something happened between versions?
>

Works for me.

SQLite version 3.8.5 2014-06-13 13:43:25
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create virtual table t1 using fts4(a,b,tokenize=unicode61);
sqlite> insert into t1(a,b) values('this is a test case','abcd efgh jklm');
sqlite>


-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Josh Wilson
https://bitbucket.org/ottersoftware/fts-diacritic-marks

Ok so the above project successfully includes a build of sqlite v3.7.15.2
without ICU but the `unicode61` tokenizer works.

So I tried the same #defines they used with v3.8.5 and still get 'unknown
tokenizer: unicode61'

Has something happened between versions?



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76116.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Josh Wilson
Thanks Ben.

Ok that makes sense. Normally if you roll the source code yourself there are
no issues since you are taking responsibility for the code being delivered
to the AppStore.

I know that sometimes Apple insta-rejects based upon the symbols in your
binary. Collisions with private APIs makes sense too.

>From what I gather though is that the ICU library adds quite a footprint
~25Mb which is kinda prohibitive relative to our app size.

http://sqlite.1065341.n5.nabble.com/unicode61-FTS-tokenizer-td62499.html

Dr Richard Hipp states in the above linked post that `unicode61` is
independent of ICU yet I keep getting the following error:

'unknown tokenizer: unicode61'

Which makes me thinks the rules for v6.1 of unicode are hard baked into the
code somewhere.

Worst case scenario you are saying that rolling a build of sqlite3+ICU won't
get rejected from Apple AND the `unicode61` tokenizer works? This is good
news.

I'd still like confirmation of how to get the ICU-less version working if it
is at all possible.



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76115.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode61 Tokenizer

2014-06-15 Thread Stadin, Benjamin
I use SQLite with some custom extensions + ICU (+ some small additions to
enable loading minimized ICU dat files) since almost 3 years in all our
companies' iOS projects. I came across a posting once on StackOverflow
with somebody saying he got a rejection - but this was never the case for
us and we deliver like 3 apps to the store per month.

So, just don¹t include and link against the .dylib. And then there should
to my knowledge nothing stand in the way to link and roll your own ICU,
neither from legal nor Apple perspective.

To be on the save side, change the ICU version number for your own ICU
build. ICU functions get a postfix (which is resolved when you include the
headers of a particular build). So for example u_something() is resolved
to u_something_53(). In practice it is however quite unlikely that these
will collide, since Apple (like also other distributions) is rather
conservative and doesn¹t use the very latest version. But be on the save
side here and just change it.

I¹ve wrapped my SQLite build into an iOS fake framework. I need to say
however that embedding ICU into SQLite was quite tedious. I ended up
stripping ICU to the minimum required by SQLite, needed to fiddle with
defines and also had to alter a few lines of code (it becomes obvious if
you try to build what needs to be changed - like a Point type that
conflicts with some included header file on iOS).

I unfortunately can¹t give you this project, as it belongs to the company
I work for. But I can help you to create your own one.

Ben

Am 16.06.14 01:08 schrieb "Josh Wilson" unter :

>This might seem like a dumb question but I am getting conflicting
>information from web sources.
>
>https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1218086.
>html
>
>I'm trying to enable unicode61 tokenizer for a build on iOS. Apple frowns
>upon including your own libicu since they include their own
>libicucore.dylib and restrict access to how it is used purely through
>their
>NSString APIs.
>
>The above link suggests I can simply compile sqlite with
>-DSQLITE_ENABLE_FTS4_UNICODE61
>but not needing -DSQLITE_ENABLE_ICU.
>
>I have tried this with v3.8.5 and get errors that the unicode61 tokenizer
>is not recognised. Which makes sense to me, since presumably sqlite relies
>on ICU for it's definition of 'what is unicode' (as it is a moving target
>it would seem).
>
>So is it possible to have an ICU-less build that allows the unicode61
>tokenizer?
>
>Or should I look at writing a custom tokenizer that hooks into the allowed
>NSString API?
>
>Kind Regards
>-- 
>
>Josh Wilson
>___
>sqlite-users mailing list
>sqlite-users@sqlite.org
>http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Unicode61 Tokenizer

2014-06-15 Thread Josh Wilson
This might seem like a dumb question but I am getting conflicting
information from web sources.

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1218086.html

I'm trying to enable unicode61 tokenizer for a build on iOS. Apple frowns
upon including your own libicu since they include their own
libicucore.dylib and restrict access to how it is used purely through their
NSString APIs.

The above link suggests I can simply compile sqlite with
-DSQLITE_ENABLE_FTS4_UNICODE61
but not needing -DSQLITE_ENABLE_ICU.

I have tried this with v3.8.5 and get errors that the unicode61 tokenizer
is not recognised. Which makes sense to me, since presumably sqlite relies
on ICU for it's definition of 'what is unicode' (as it is a moving target
it would seem).

So is it possible to have an ICU-less build that allows the unicode61
tokenizer?

Or should I look at writing a custom tokenizer that hooks into the allowed
NSString API?

Kind Regards
-- 

Josh Wilson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slowish R*Tree performance

2014-06-15 Thread Simon Slavin

> On 15 Jun 2014, at 5:21pm, Eric Rubin-Smith  wrote:
> 
> still not good enough for my use case
> (unfortunately).  Any further optimization tips are highly welcome.

Strongly suspect that although R*Trees produce an elegant solution to your 
problem, the fact that they're a general case tool will make them too slow to 
use for something like this.

I propose an alternative solution, though I have not tried it and do not have 
time to try it (sorry).

1) A function which turns a numeric IP address or a block into some standard 
easy-to-process representation in string form.  Possibly a pair of strings with 
the first string being an address the second being something indicating the 
extent of the block, perhaps something like 
'2001:0db8:8500:::::v::ff00:::::'.
  You could make it shorter by leaving out the colons but my experience is that 
although this leads to less data stored on disk it doesn't speed up processing 
by much.  But if you have a great deal of data you might want to do it anyway.

2) A comparator function (maybe a SQLite function, maybe not) which takes two 
such addresses or blocks and returns a value indicating whether they're 
identical, whether block A contains block or address B, or neither.

The closest I got to the above was when I needed a program which intensively 
searched and sorted individual IPv4 addresses.  I got best results by defining 
a SQLite function which converted IP addresses of all formats into 'standard 
format' where each byte was two hex digits.  All values stored in the database 
were stored in my 'standard' format.  This allowed easy collation using 
standard text sorting.  Everything else turned out faster to implement in my 
own programming language than it was to build as SQLite functions.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slowish R*Tree performance

2014-06-15 Thread Eric Rubin-Smith
On Sun, Jun 15, 2014 at 9:47 AM, Eric Rubin-Smith  wrote:

> Richard Hipp wrote:
>
> > What does this query return?
> >
> >  SELECT count(*) FROM ipIndex
> >   WHERE minD1 <= 1220818432 and 1220818432 <= maxD1
> > AND minD2 <= 2120561472 and 2120561472 <= maxD2
> > AND minD3 <= 1685398080 and 1685398080 <= maxD3
> > AND minD4 <= 1685755328 and 1685755328 <= maxD4
> > AND minD5 <= 538331072 and 538331072 <= maxD5;
>
> Hm, it returns 1645.  This indicates a bug (the max expected value is
> 128).  I'm now highly suspicious of my mathematical reasoning or my
> code.  I'll take a look.  Thanks, Richard!
>


Follow-up for those who are curious.  My program for randomly populating
the database was creating a bunch of identical bounding boxes for
short-length prefixes (i.e. prefixes corresponding to large bounding
boxes).  This made the R*Tree do a bunch of redundant work.  Eliminating
this issue led to a ~30x throughput improvement to ~6k searches per second
on a database with 100k prefixes in it.

After populating the database with 5.7 million such prefixes, we are at a
throughput of about 2.2kTPS.  Not horrible, and not sure I can expect much
more out of SQLite -- but still not good enough for my use case
(unfortunately).  Any further optimization tips are highly welcome.  In the
mean time, I'm going to keep digging.

Thanks again to Richard for pointing me in the right direction.

Eric
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slowish R*Tree performance

2014-06-15 Thread Eric Rubin-Smith
Richard Hipp wrote:

> What does this query return?
> 
>  SELECT count(*) FROM ipIndex
>   WHERE minD1 <= 1220818432 and 1220818432 <= maxD1
> AND minD2 <= 2120561472 and 2120561472 <= maxD2
> AND minD3 <= 1685398080 and 1685398080 <= maxD3
> AND minD4 <= 1685755328 and 1685755328 <= maxD4
> AND minD5 <= 538331072 and 538331072 <= maxD5;

Hm, it returns 1645.  This indicates a bug (the max expected value is
128).  I'm now highly suspicious of my mathematical reasoning or my 
code.  I'll take a look.  Thanks, Richard!

Eric

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] importing CSV with a random empty line at the end

2014-06-15 Thread gwenn
Hello,
I've taken the time to investigate the problem: only the last
parameter is not correctly bound to NULL.

May I suggest a patch:

--- shell_.c 2014-06-15 14:22:39.0 +0200
+++ shell.c 2014-06-15 14:23:11.0 +0200
@@ -2553,7 +2553,7 @@
   "filling the rest with NULL\n",
   sCsv.zFile, startLine, nCol, i+1);
   i++;
-  while( i wrote:
> gwenn, thanks for this. I did not understand what you saw, and then I
> realized my shell exe was probably too old.
> I downloaded the new exe and this solves the problem just fine!
> thanks for your help.
>
> gert
>
>
> 2014-06-02 19:03 GMT+02:00 gwenn :
>
>> Hello,
>> I doesn't fail for me (it may depend on the constraints on the target
>> table) but the behaviour is unexpected:
>>
>> $ echo "1|test
>> > " > empty.csv
>> $ sqlite3
>> SQLite version 3.8.4.3 2014-04-03 16:53:12
>> sqlite> create table test(opt text, data text not null);
>> sqlite> .import empty.csv test
>> empty.csv:2: expected 2 columns but found 1 - filling the rest with NULL
>> sqlite> select * from test;
>> 1|test
>> |test
>>
>> As the bindings are not cleared, it is not a null value but the
>> previous bound value which is inserted.
>> Regards.
>>
>> On Mon, Jun 2, 2014 at 5:41 PM, Gert Van Assche  wrote:
>> > All,
>> >
>> > I received 100.000 UTF-8 files (average size 50kb)  "ready for import" in
>> > an SQLite db.
>> > 90% of them go fine, but some files have an empty line at the very end of
>> > the fine (so an extra EOL before the EOF).
>> >
>> > Of course, the import fails... Is there an easy way to get rid of that
>> > extra empty line before I import the file, or is there a way to ignore an
>> > empty line?
>> >
>> > thanks
>> >
>> > Gert
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] slowish R*Tree performance

2014-06-15 Thread Richard Hipp
On Sun, Jun 15, 2014 at 12:25 AM, Eric Rubin-Smith 
wrote:

>
> sqlite> explain query plan SELECT prefix, target FROM routeTarget WHERE id
> = (
>...>SELECT id FROM ipIndex
>...> WHERE minD1 <= 1220818432 and 1220818432 <= maxD1
>...>   AND minD2 <= 2120561472 and 2120561472 <= maxD2
>...>   AND minD3 <= 1685398080 and 1685398080 <= maxD3
>...>   AND minD4 <= 1685755328 and 1685755328 <= maxD4
>...>   AND minD5 <= 538331072 and 538331072 <= maxD5
>...> ORDER BY ((maxD5-minD5)*(maxD4-minD4)*(maxD3-minD3)*
>...>   (maxD2-minD2)*(maxD1-minD1)) ASC
>...>LIMIT 1);
> 0|0|0|SEARCH TABLE routeTarget USING INTEGER PRIMARY KEY (rowid=?)
> 0|0|0|EXECUTE SCALAR SUBQUERY 1
> 1|0|0|SCAN TABLE ipIndex VIRTUAL TABLE INDEX 2:B0D1B2D3B4D5B6D7B8D9
> 1|0|0|USE TEMP B-TREE FOR ORDER BY
>

What does this query return?

 SELECT count(*) FROM ipIndex
  WHERE minD1 <= 1220818432 and 1220818432 <= maxD1
AND minD2 <= 2120561472 and 2120561472 <= maxD2
AND minD3 <= 1685398080 and 1685398080 <= maxD3
AND minD4 <= 1685755328 and 1685755328 <= maxD4
AND minD5 <= 538331072 and 538331072 <= maxD5;

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users