Re: [sqlite] Unicode61 Tokenizer
Each major release (and occasionally minor) we review third party libraries and what improvements have been released since we last looked. FTS and having international support for a fast search within our app has been a big push. We are essentially a glorified database front end with a cushy form UI. We are trying to use FTS to help with autocompleting fields. Once we are happy the latest version of the third party library is stable we build it and manually copy it into the repo. So third party libraries aren't part of the build process except for linking statically. This guarantees stability as it is a frozen version of the third party library. Although human error creeps in when manually copying apparently. > On 16 Jun 2014, at 1:52 pm, "Stadin, Benjamin [via SQLite]" >wrote: > > You could create a fake framework, because it takes a while every time to > compile. > > Just wondering: What's your rationale to use Unicode61 in an iOS project? > Being able to sort based on the locale is a feature all our foreign customers > demand (and here in German as well). Of course nobody will complain until > they realize. > > Von: [hidden email] [[hidden email]] im Auftrag von Josh Wilson [[hidden > email]] > Gesendet: Montag, 16. Juni 2014 04:38 > An: [hidden email] > Betreff: Re: [sqlite] Unicode61 Tokenizer > > Righteo thanks for the sanity check that it must be me at fault and that this > is indeed possible without ICU. > > I have a separate XCode project for rolling the latest SQLite amalgamation > and copy that built library out of the Derived Data folder into our main App > project. > > It would appear I kept copying an old file for v3.8.4.3 and not the actual > v3.8.5 I was modifying so no wonder there was no change. So after a `Clean` > and `Delete Derived Data` then build the resulting build worked. Rookie > mistake, sorry guys. > > I simply followed the ottersoftware approach to adding the defines at the > top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get > defined. > > #define SQLITE_ENABLE_FTS4 > #define SQLITE_ENABLE_FTS3_PARENTHESIS > #define SQLITE_ENABLE_FTS4_UNICODE61 > > This worked for me building against the iOS7.1 SDK (including 64bit > architecture build) for anyone else's future reference. > > > > > -- > View this message in context: > http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html > Sent from the SQLite mailing list archive at Nabble.com. > ___ > sqlite-users mailing list > [hidden email] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > [hidden email] > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > > > If you reply to this email, your message will be added to the discussion > below: > http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76119.html > To unsubscribe from Unicode61 Tokenizer, click here. > NAML -- View this message in context: http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76121.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
On 16 Jun 2014, at 3:38am, Josh Wilsonwrote: > It would appear I kept copying an old file for v3.8.4.3 and not the actual > v3.8.5 I was modifying so no wonder there was no change. So after a `Clean` > and `Delete Derived Data` then build the resulting build worked. Rookie > mistake, sorry guys. That has bitten me too. I've definitely solved mystery compilation bugs by doing a 'Clean' and rebuilding from source. I'm sure there's some bug in Xcode's notation about when to update one of a project's build stages. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
You could create a fake framework, because it takes a while every time to compile. Just wondering: What's your rationale to use Unicode61 in an iOS project? Being able to sort based on the locale is a feature all our foreign customers demand (and here in German as well). Of course nobody will complain until they realize. Von: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] im Auftrag von Josh Wilson [neozenith@gmail.com] Gesendet: Montag, 16. Juni 2014 04:38 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] Unicode61 Tokenizer Righteo thanks for the sanity check that it must be me at fault and that this is indeed possible without ICU. I have a separate XCode project for rolling the latest SQLite amalgamation and copy that built library out of the Derived Data folder into our main App project. It would appear I kept copying an old file for v3.8.4.3 and not the actual v3.8.5 I was modifying so no wonder there was no change. So after a `Clean` and `Delete Derived Data` then build the resulting build worked. Rookie mistake, sorry guys. I simply followed the ottersoftware approach to adding the defines at the top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get defined. #define SQLITE_ENABLE_FTS4 #define SQLITE_ENABLE_FTS3_PARENTHESIS #define SQLITE_ENABLE_FTS4_UNICODE61 This worked for me building against the iOS7.1 SDK (including 64bit architecture build) for anyone else's future reference. -- View this message in context: http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
Righteo thanks for the sanity check that it must be me at fault and that this is indeed possible without ICU. I have a separate XCode project for rolling the latest SQLite amalgamation and copy that built library out of the Derived Data folder into our main App project. It would appear I kept copying an old file for v3.8.4.3 and not the actual v3.8.5 I was modifying so no wonder there was no change. So after a `Clean` and `Delete Derived Data` then build the resulting build worked. Rookie mistake, sorry guys. I simply followed the ottersoftware approach to adding the defines at the top of the sqlite3.c file after SQLITE_CORE and SQLITE_AMALGAMATION get defined. #define SQLITE_ENABLE_FTS4 #define SQLITE_ENABLE_FTS3_PARENTHESIS #define SQLITE_ENABLE_FTS4_UNICODE61 This worked for me building against the iOS7.1 SDK (including 64bit architecture build) for anyone else's future reference. -- View this message in context: http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76118.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
On Sun, Jun 15, 2014 at 10:02 PM, Josh Wilsonwrote: > https://bitbucket.org/ottersoftware/fts-diacritic-marks > > Ok so the above project successfully includes a build of sqlite v3.7.15.2 > without ICU but the `unicode61` tokenizer works. > > So I tried the same #defines they used with v3.8.5 and still get 'unknown > tokenizer: unicode61' > > Has something happened between versions? > Works for me. SQLite version 3.8.5 2014-06-13 13:43:25 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create virtual table t1 using fts4(a,b,tokenize=unicode61); sqlite> insert into t1(a,b) values('this is a test case','abcd efgh jklm'); sqlite> -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
https://bitbucket.org/ottersoftware/fts-diacritic-marks Ok so the above project successfully includes a build of sqlite v3.7.15.2 without ICU but the `unicode61` tokenizer works. So I tried the same #defines they used with v3.8.5 and still get 'unknown tokenizer: unicode61' Has something happened between versions? -- View this message in context: http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76116.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
Thanks Ben. Ok that makes sense. Normally if you roll the source code yourself there are no issues since you are taking responsibility for the code being delivered to the AppStore. I know that sometimes Apple insta-rejects based upon the symbols in your binary. Collisions with private APIs makes sense too. >From what I gather though is that the ICU library adds quite a footprint ~25Mb which is kinda prohibitive relative to our app size. http://sqlite.1065341.n5.nabble.com/unicode61-FTS-tokenizer-td62499.html Dr Richard Hipp states in the above linked post that `unicode61` is independent of ICU yet I keep getting the following error: 'unknown tokenizer: unicode61' Which makes me thinks the rules for v6.1 of unicode are hard baked into the code somewhere. Worst case scenario you are saying that rolling a build of sqlite3+ICU won't get rejected from Apple AND the `unicode61` tokenizer works? This is good news. I'd still like confirmation of how to get the ICU-less version working if it is at all possible. -- View this message in context: http://sqlite.1065341.n5.nabble.com/Unicode61-Tokenizer-tp76113p76115.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Unicode61 Tokenizer
I use SQLite with some custom extensions + ICU (+ some small additions to enable loading minimized ICU dat files) since almost 3 years in all our companies' iOS projects. I came across a posting once on StackOverflow with somebody saying he got a rejection - but this was never the case for us and we deliver like 3 apps to the store per month. So, just don¹t include and link against the .dylib. And then there should to my knowledge nothing stand in the way to link and roll your own ICU, neither from legal nor Apple perspective. To be on the save side, change the ICU version number for your own ICU build. ICU functions get a postfix (which is resolved when you include the headers of a particular build). So for example u_something() is resolved to u_something_53(). In practice it is however quite unlikely that these will collide, since Apple (like also other distributions) is rather conservative and doesn¹t use the very latest version. But be on the save side here and just change it. I¹ve wrapped my SQLite build into an iOS fake framework. I need to say however that embedding ICU into SQLite was quite tedious. I ended up stripping ICU to the minimum required by SQLite, needed to fiddle with defines and also had to alter a few lines of code (it becomes obvious if you try to build what needs to be changed - like a Point type that conflicts with some included header file on iOS). I unfortunately can¹t give you this project, as it belongs to the company I work for. But I can help you to create your own one. Ben Am 16.06.14 01:08 schrieb "Josh Wilson" unter: >This might seem like a dumb question but I am getting conflicting >information from web sources. > >https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1218086. >html > >I'm trying to enable unicode61 tokenizer for a build on iOS. Apple frowns >upon including your own libicu since they include their own >libicucore.dylib and restrict access to how it is used purely through >their >NSString APIs. > >The above link suggests I can simply compile sqlite with >-DSQLITE_ENABLE_FTS4_UNICODE61 >but not needing -DSQLITE_ENABLE_ICU. > >I have tried this with v3.8.5 and get errors that the unicode61 tokenizer >is not recognised. Which makes sense to me, since presumably sqlite relies >on ICU for it's definition of 'what is unicode' (as it is a moving target >it would seem). > >So is it possible to have an ICU-less build that allows the unicode61 >tokenizer? > >Or should I look at writing a custom tokenizer that hooks into the allowed >NSString API? > >Kind Regards >-- > >Josh Wilson >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Unicode61 Tokenizer
This might seem like a dumb question but I am getting conflicting information from web sources. https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1218086.html I'm trying to enable unicode61 tokenizer for a build on iOS. Apple frowns upon including your own libicu since they include their own libicucore.dylib and restrict access to how it is used purely through their NSString APIs. The above link suggests I can simply compile sqlite with -DSQLITE_ENABLE_FTS4_UNICODE61 but not needing -DSQLITE_ENABLE_ICU. I have tried this with v3.8.5 and get errors that the unicode61 tokenizer is not recognised. Which makes sense to me, since presumably sqlite relies on ICU for it's definition of 'what is unicode' (as it is a moving target it would seem). So is it possible to have an ICU-less build that allows the unicode61 tokenizer? Or should I look at writing a custom tokenizer that hooks into the allowed NSString API? Kind Regards -- Josh Wilson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slowish R*Tree performance
> On 15 Jun 2014, at 5:21pm, Eric Rubin-Smithwrote: > > still not good enough for my use case > (unfortunately). Any further optimization tips are highly welcome. Strongly suspect that although R*Trees produce an elegant solution to your problem, the fact that they're a general case tool will make them too slow to use for something like this. I propose an alternative solution, though I have not tried it and do not have time to try it (sorry). 1) A function which turns a numeric IP address or a block into some standard easy-to-process representation in string form. Possibly a pair of strings with the first string being an address the second being something indicating the extent of the block, perhaps something like '2001:0db8:8500:::::v::ff00:::::'. You could make it shorter by leaving out the colons but my experience is that although this leads to less data stored on disk it doesn't speed up processing by much. But if you have a great deal of data you might want to do it anyway. 2) A comparator function (maybe a SQLite function, maybe not) which takes two such addresses or blocks and returns a value indicating whether they're identical, whether block A contains block or address B, or neither. The closest I got to the above was when I needed a program which intensively searched and sorted individual IPv4 addresses. I got best results by defining a SQLite function which converted IP addresses of all formats into 'standard format' where each byte was two hex digits. All values stored in the database were stored in my 'standard' format. This allowed easy collation using standard text sorting. Everything else turned out faster to implement in my own programming language than it was to build as SQLite functions. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slowish R*Tree performance
On Sun, Jun 15, 2014 at 9:47 AM, Eric Rubin-Smithwrote: > Richard Hipp wrote: > > > What does this query return? > > > > SELECT count(*) FROM ipIndex > > WHERE minD1 <= 1220818432 and 1220818432 <= maxD1 > > AND minD2 <= 2120561472 and 2120561472 <= maxD2 > > AND minD3 <= 1685398080 and 1685398080 <= maxD3 > > AND minD4 <= 1685755328 and 1685755328 <= maxD4 > > AND minD5 <= 538331072 and 538331072 <= maxD5; > > Hm, it returns 1645. This indicates a bug (the max expected value is > 128). I'm now highly suspicious of my mathematical reasoning or my > code. I'll take a look. Thanks, Richard! > Follow-up for those who are curious. My program for randomly populating the database was creating a bunch of identical bounding boxes for short-length prefixes (i.e. prefixes corresponding to large bounding boxes). This made the R*Tree do a bunch of redundant work. Eliminating this issue led to a ~30x throughput improvement to ~6k searches per second on a database with 100k prefixes in it. After populating the database with 5.7 million such prefixes, we are at a throughput of about 2.2kTPS. Not horrible, and not sure I can expect much more out of SQLite -- but still not good enough for my use case (unfortunately). Any further optimization tips are highly welcome. In the mean time, I'm going to keep digging. Thanks again to Richard for pointing me in the right direction. Eric ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slowish R*Tree performance
Richard Hipp wrote: > What does this query return? > > SELECT count(*) FROM ipIndex > WHERE minD1 <= 1220818432 and 1220818432 <= maxD1 > AND minD2 <= 2120561472 and 2120561472 <= maxD2 > AND minD3 <= 1685398080 and 1685398080 <= maxD3 > AND minD4 <= 1685755328 and 1685755328 <= maxD4 > AND minD5 <= 538331072 and 538331072 <= maxD5; Hm, it returns 1645. This indicates a bug (the max expected value is 128). I'm now highly suspicious of my mathematical reasoning or my code. I'll take a look. Thanks, Richard! Eric ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] importing CSV with a random empty line at the end
Hello, I've taken the time to investigate the problem: only the last parameter is not correctly bound to NULL. May I suggest a patch: --- shell_.c 2014-06-15 14:22:39.0 +0200 +++ shell.c 2014-06-15 14:23:11.0 +0200 @@ -2553,7 +2553,7 @@ "filling the rest with NULL\n", sCsv.zFile, startLine, nCol, i+1); i++; - while( i wrote: > gwenn, thanks for this. I did not understand what you saw, and then I > realized my shell exe was probably too old. > I downloaded the new exe and this solves the problem just fine! > thanks for your help. > > gert > > > 2014-06-02 19:03 GMT+02:00 gwenn: > >> Hello, >> I doesn't fail for me (it may depend on the constraints on the target >> table) but the behaviour is unexpected: >> >> $ echo "1|test >> > " > empty.csv >> $ sqlite3 >> SQLite version 3.8.4.3 2014-04-03 16:53:12 >> sqlite> create table test(opt text, data text not null); >> sqlite> .import empty.csv test >> empty.csv:2: expected 2 columns but found 1 - filling the rest with NULL >> sqlite> select * from test; >> 1|test >> |test >> >> As the bindings are not cleared, it is not a null value but the >> previous bound value which is inserted. >> Regards. >> >> On Mon, Jun 2, 2014 at 5:41 PM, Gert Van Assche wrote: >> > All, >> > >> > I received 100.000 UTF-8 files (average size 50kb) "ready for import" in >> > an SQLite db. >> > 90% of them go fine, but some files have an empty line at the very end of >> > the fine (so an extra EOL before the EOF). >> > >> > Of course, the import fails... Is there an easy way to get rid of that >> > extra empty line before I import the file, or is there a way to ignore an >> > empty line? >> > >> > thanks >> > >> > Gert >> > ___ >> > sqlite-users mailing list >> > sqlite-users@sqlite.org >> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] slowish R*Tree performance
On Sun, Jun 15, 2014 at 12:25 AM, Eric Rubin-Smithwrote: > > sqlite> explain query plan SELECT prefix, target FROM routeTarget WHERE id > = ( >...>SELECT id FROM ipIndex >...> WHERE minD1 <= 1220818432 and 1220818432 <= maxD1 >...> AND minD2 <= 2120561472 and 2120561472 <= maxD2 >...> AND minD3 <= 1685398080 and 1685398080 <= maxD3 >...> AND minD4 <= 1685755328 and 1685755328 <= maxD4 >...> AND minD5 <= 538331072 and 538331072 <= maxD5 >...> ORDER BY ((maxD5-minD5)*(maxD4-minD4)*(maxD3-minD3)* >...> (maxD2-minD2)*(maxD1-minD1)) ASC >...>LIMIT 1); > 0|0|0|SEARCH TABLE routeTarget USING INTEGER PRIMARY KEY (rowid=?) > 0|0|0|EXECUTE SCALAR SUBQUERY 1 > 1|0|0|SCAN TABLE ipIndex VIRTUAL TABLE INDEX 2:B0D1B2D3B4D5B6D7B8D9 > 1|0|0|USE TEMP B-TREE FOR ORDER BY > What does this query return? SELECT count(*) FROM ipIndex WHERE minD1 <= 1220818432 and 1220818432 <= maxD1 AND minD2 <= 2120561472 and 2120561472 <= maxD2 AND minD3 <= 1685398080 and 1685398080 <= maxD3 AND minD4 <= 1685755328 and 1685755328 <= maxD4 AND minD5 <= 538331072 and 538331072 <= maxD5; -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users