Re: [sqlite] Sqlite insertion performance
On 17/09/2007, John Machin <[EMAIL PROTECTED]> wrote:
> On 17/09/2007 1:07 PM, Joe Wilson wrote:
> > --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> >
> > I have been struggling with the performance of insertion in sqlite.
> >
> > Here we have a very simple case:
> >
> > A table with an integer autoincrement primary key and a text field that is unique.
> >
> > CREATE TABLE my (id PRIMARY KEY, url);
> > CREATE UNIQUE INDEX myurl ON my(url);
> >
> > My application requires inserting up to 10 million records in batches of 20 thousand records.
> >
> >> For each group of 20,000 records, first insert them into a TEMP table. Call the temp table t1. Then transfer the records to the main table as follows:
> >>
> >> INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
> >
> > I had no performance improvement with that temp store staging table technique in my testing - actually it was slower.
> >
> > http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
> >
> > Mind you, the table I was testing against had 4 indexes, whereas the above table has 2. I also wasn't using "OR IGNORE". There might be a difference.
> >
> > Just setting pragma cache_size to a huge value and inserting into the table normally in large batches resulted in better performance in my case. It may have already been mentioned, but having a big database page_size value helps minimize the disk writes as well.
>
> A couple of thoughts:
>
> OTTOMH, time to search the index is approx O(D * log(K)), where D = depth of tree and K = number of keys per block, and K ** D is O(N).
>
> So:
>
> (1) A big block means big K and thus small D.
>
> (2) Long keys (like URLs!!) mean small K and thus big D.
>
> Further on point (2), the OP seems unsure about whether his URLs are unique or not.
> Perhaps storing another column containing a 64-bit hash, with an index on that column, might be the way to go -- the shorter key might well outweigh the extra cost of checking for duplicates.

A thought along the same lines: can sqlite create a unique index that is hash-based? This would provide the UNIQUE support, but it wouldn't provide a sorted index. That should resolve the massive-insert-too-slow problem, and afterwards he can create a sorted index on the column if he needs ordered lookups.

Alternatively, he can go without the UNIQUE index on the initial inserts, and delete the duplicates later. Off the top of my head, it might go something like: create a (non-unique) index on url, then

SELECT t2.id FROM mytable AS t1, mytable AS t2
 WHERE t1.url = t2.url AND t1.id < t2.id;

That would give you a list of the row ids that repeat an earlier row's url. Then just build an id list (comma separated) and

DELETE FROM mytable WHERE id IN (id list);

NB: you can do the select and delete in one step in some DBs, not sure about sqlite. And THEN drop the above index, and create the final unique index.

see ya

- To unsubscribe, send email to [EMAIL PROTECTED] -
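John's hash-column idea can be tried directly from any binding; below is a minimal sketch using Python's built-in sqlite3 module. The table and column names, and the choice of truncated SHA-1 as the 64-bit hash, are illustrative assumptions, not anything specified in the thread:

```python
import hashlib
import sqlite3

def url_hash(url: str) -> int:
    # First 8 bytes of SHA-1, interpreted as a signed 64-bit integer key.
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:8],
                          "big", signed=True)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT, urlhash INTEGER)")
con.execute("CREATE INDEX myhash ON my(urlhash)")  # short integer keys, not long URLs

def insert_unique(con, url):
    h = url_hash(url)
    # Two distinct URLs can share a hash, so confirm with a full comparison.
    if con.execute("SELECT 1 FROM my WHERE urlhash = ? AND url = ?",
                   (h, url)).fetchone() is None:
        con.execute("INSERT INTO my(url, urlhash) VALUES (?, ?)", (url, h))

for u in ("http://a.example/", "http://b.example/", "http://a.example/"):
    insert_unique(con, u)

print(con.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # → 2
```

Because collisions are possible, uniqueness is enforced by the explicit lookup rather than a UNIQUE constraint; the b-tree only ever carries 8-byte integer keys, which is exactly the "big K, small D" effect John describes.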
Re: [sqlite] An example for "progress" method?
I have used the prepare, step, finalize methods to implement the progress callback, and it works fine. However, I would like to know a couple of things:

a. Is the sqlite3_exec function better in terms of performance for receiving the callback?
b. Will the sqlite3_interrupt function stop query execution immediately upon request, or does it finish the current instruction and then stop, or anything as such?

Please advise. Thanks in advance,
Bharath

On 9/18/07 5:13 AM, "John Stanton" <[EMAIL PROTECTED]> wrote:
> Zbigniew Baniewski wrote:
>> An interesting method is "progress":
>>
>> "The progress callback can be used to display the status of a lengthy query or to process GUI events during a lengthy query."
>>
>> But I'm not quite sure presently how it could look in practice? To make a "progress bar" I've got to know the maximal number of records (table rows) involved in a query BEFORE it'll be caused to run.
>>
>> Perhaps again I've missed some simple thing(?) - but currently I don't know how it can be done in a simple way. The methods "changes" and "total_changes" are giving the number of rows involved AFTER the query is done. So, how can I count it all before, to make a nice looking progress bar? Just by making an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE "?
>>
>> Perhaps someone could show me some example?
>
> You cannot know the result of a query until it is finished, and when it is finished there is no need for a progress bar.
>
> The obvious solution is to not use a bar but to have another form of progress indicator.

--- Robosoft Technologies - Come home to Technology

Disclaimer: This email may contain confidential material. If you were not an intended recipient, please notify the sender and delete all copies. Emails to and from our network may be logged and monitored. This email and its attachments are scanned for virus by our scanners and are believed to be safe. However, no warranty is given that this email is free of malicious content or virus.
Re: [sqlite] An example for "progress" method?
On 9/18/07 5:00 AM, "Zbigniew Baniewski" <[EMAIL PROTECTED]> wrote:
> An interesting method is "progress":
>
> "The progress callback can be used to display the status of a lengthy query or to process GUI events during a lengthy query."
>
> But I'm not quite sure presently how it could look in practice? To make a "progress bar" I've got to know the maximal number of records (table rows) involved in a query BEFORE it'll be caused to run.
>
> Perhaps again I've missed some simple thing(?) - but currently I don't know how it can be done in a simple way. The methods "changes" and "total_changes" are giving the number of rows involved AFTER the query is done. So, how can I count it all before, to make a nice looking progress bar? Just by making an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE "?

Yes, this could be one way to find the max value, but it is feasible only as long as you can afford the time taken to execute the extra query. Another approach I would suggest: you could estimate the average number of records your query would retrieve in typical circumstances. Just a suggestion :-). It works well for me.
Re: [sqlite] An example for "progress" method?
Zbigniew Baniewski wrote:
> An interesting method is "progress":
>
> "The progress callback can be used to display the status of a lengthy query or to process GUI events during a lengthy query."
>
> But I'm not quite sure presently how it could look in practice? To make a "progress bar" I've got to know the maximal number of records (table rows) involved in a query BEFORE it'll be caused to run.
>
> Perhaps again I've missed some simple thing(?) - but currently I don't know how it can be done in a simple way. The methods "changes" and "total_changes" are giving the number of rows involved AFTER the query is done. So, how can I count it all before, to make a nice looking progress bar? Just by making an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE "?
>
> Perhaps someone could show me some example?

You cannot know the result of a query until it is finished, and when it is finished there is no need for a progress bar.

The obvious solution is to not use a bar but to have another form of progress indicator.
Re: [sqlite] An example for "progress" method?
Zbigniew Baniewski <[EMAIL PROTECTED]> wrote:
> An interesting method is "progress":
>
> "The progress callback can be used to display the status of a lengthy query or to process GUI events during a lengthy query."
>
> But I'm not quite sure presently how it could look in practice? To make a "progress bar" I've got to know the maximal number of records (table rows) involved in a query BEFORE it'll be caused to run.

You are correct - you cannot do a progress bar that shows percent complete. But you can do a graphic that shows the operation is in process (perhaps spinning gears and a caption that says "Working...") and provide the user with a "Cancel" button.

--
D. Richard Hipp <[EMAIL PROTECTED]>
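Richard's suggestion (an indeterminate "Working..." indicator plus a Cancel button) maps directly onto the progress callback. Here is a small sketch with Python's sqlite3 binding, where the handler stands in for a GUI repaint and a nonzero return value plays the role of the Cancel button; the table and row counts are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(50000)))

ticks = 0
def on_progress():
    # A GUI would repaint its "Working..." widget here, and return 1
    # if the user pressed Cancel; returning 0 lets the query continue.
    global ticks
    ticks += 1
    return 0

con.set_progress_handler(on_progress, 1000)  # invoked every ~1000 VM opcodes
con.execute("SELECT count(*) FROM t WHERE x % 7 = 0").fetchone()
print(ticks > 0)  # → True (the handler really runs mid-query)

# Simulate the user hitting Cancel: a nonzero return aborts the query.
con.set_progress_handler(lambda: 1, 1000)
cancelled = False
try:
    con.execute("SELECT count(*) FROM t WHERE x % 7 = 0").fetchone()
except sqlite3.OperationalError:  # surfaces as "interrupted"
    cancelled = True
print(cancelled)  # → True
```

Note this cancels via the progress handler's return value rather than sqlite3_interrupt, but the observable effect for the user is the same: the long query stops partway through.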
[sqlite] An example for "progress" method?
An interesting method is "progress":

"The progress callback can be used to display the status of a lengthy query or to process GUI events during a lengthy query."

But I'm not quite sure presently how it could look in practice? To make a "progress bar" I've got to know the maximal number of records (table rows) involved in a query BEFORE it'll be caused to run.

Perhaps again I've missed some simple thing(?) - but currently I don't know how it can be done in a simple way. The methods "changes" and "total_changes" are giving the number of rows involved AFTER the query is done. So, how can I count it all before, to make a nice looking progress bar? Just by making an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE "?

Perhaps someone could show me some example?

--
pozdrawiam / regards
Zbigniew Baniewski
[sqlite] Opinions about per-row tokenizers for fts?
As part of doing internationalization work on Gears, it has been determined that it is unlikely that you can just define a global tokenizer that will work for everything. Instead, in some cases you may need to use a specific tokenizer, based on the content being tokenized, or the source of the content. This can be emulated by using multiple tables and complicating your joins, but it would be nicer if fts could just accommodate this use case.

In the interests of not committing something that people won't like, my current proposal would be to add an implicit TOKENIZER column, which will override the table's default tokenizer for that row. So, you could do something like:

CREATE VIRTUAL TABLE t USING fts3(TOKENIZER icu(en), content);
INSERT INTO t VALUES ('testing testing');  -- Uses icu(en).
INSERT INTO t (tokenizer, content) VALUES ('icu(kr)', '나의');  -- Uses icu(kr).
SELECT rowid FROM t WHERE t MATCH 'TOKENIZER:icu(kr) 의';

[Forgive me if you can read Korean and I just did something offensive. I'm doing copy/paste, here!]

fts allows for anything starting with 'tokenize' in that location in the CREATE statement, so in the above all uses must match. If you used "TOKENIZE" in the create, you use "TOKENIZE" everywhere else. In MATCH, it must be the uppercase term used in the create (the other places are case-insensitive), followed by : followed by a valid tokenizer name followed by an optional parameter list.

A variant which I think is somewhat interesting would be:

CREATE VIRTUAL TABLE t USING fts3(tokenizer TOKENIZER DEFAULT icu(en), content);

This makes the "tokenizer" column a bit more explicit, and the 'DEFAULT ...' syntax makes it clear what's going on, but I couldn't really think of any other sensible name for the column, so it also feels redundant. Since 'tokenize' is already a reserved prefix for fts, I'm inclined towards the first variant.

Opinions?

-scott
Re: [sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu
Is fts2 a library? If so, you probably need to put it in a directory that is in the list of library directories. You may also need to run ldconfig to update the dynamic linker as to the whereabouts of your new lib. Use "man ldconfig" for the how-to.

Scott

Uma Krishnan wrote:
> Hello,
>
> I'm having trouble loading fts2. I modified the makefile to create the fts2 library on Linux/Ubuntu. When I attempt to load fts2 using the command: select load_extension('fts2'), I get the error "shared library not found". (I noticed that it had not created the .so file, only the .la file.) I do have the LD_LIBRARY set up correctly. What am I doing wrong?
>
> Thanks in advance
>
> Uma

Igor Tandetnik wrote:
> Kefah T. Issa wrote:
>> I tried the ordered-urls-insert; the results were better, but it is still taking progressively longer as the number of records increases.
>>
>> A fundamental question to be asked here: shouldn't the time complexity (Big-O) of the insert operation be constant?
>
> Of course not. It takes O(log N) to find an appropriate place in the index for every new record (where N is the number of records already inserted). Also, it generates a lot of disk activity once the index grows too large to fit in the memory cache.
>
>> I even did a third test where the integer primary key is not auto increment; the same problem is observed.
>
> The id is not a problem: O(log N) is caused by the index on url.
>
> Igor Tandetnik

--
Rightful liberty is unobstructed action according to our will within limits drawn around us by the equal rights of others. I do not add "within the limits of the law," because law is often but the tyrant's will, and always so when it violates the rights of the individual.

Thomas Jefferson
Re: [sqlite] extension-functions.tgz for sqlite3 3.4.1 ?
Thank you Nuno and Joe for your help. I have posted a new version, now called extension-functions.c, which works against external interfaces only and therefore does not require the sqlite3 source code. I have made everything a single C file with instructions as a comment at the top, hence no need for a tarball. I made some revisions so that it will compile without warnings under Mac OS X, which is fussier about unsigned vs. signed chars.

Liam

On 9/15/07, Nuno Lucas <[EMAIL PROTECTED]> wrote:
> On 9/14/07, Liam Healy <[EMAIL PROTECTED]> wrote:
>> I tried eliminating sqliteInt.h and replacing it with explicit declarations of i64 and u8. That part worked fine, but func_ext.c also uses sqliteMalloc
>
> void *sqlite3_malloc(int);
> void *sqlite3_realloc(void*, int);
> void sqlite3_free(void*);
>
> are the right functions to use (they are in sqlite3.h).
>
>> which is also defined in sqliteInt.h, which led me down a rabbit hole of pulling more and more from sqliteInt.h, and I still can't eliminate the errors and warnings. As a reminder, I didn't write the original source code, and I have only the vaguest sense of the meaning and need for these functions. So, if anyone has any insight on how to accomplish the same goal without using internal definitions, I'd appreciate hearing about it.
>
> I attached a patch with the required changes just to compile using only <sqlite3.h>. Used the current source code of the extensions on the contrib page.
>
> This is not enough to create a sqlite module, but at least it compiles without using the private sqlite headers.
>
> Regards,
> ~Nuno Lucas
>
>> Liam
[sqlite] About cloning a table in a new dBase
Hi all:

Suppose a dBase named dBase1 and a table declared in it:

CREATE TABLE 'tb1' (Id INTEGER PRIMARY KEY, Nm INTEGER, Ci INTEGER);

Then, after some inserts, I open a new dBase, say dBase2, and duplicate dBase1.tb1 with the C API functions, executing the queries:

"ATTACH dBase1 AS dbOriginal"
"CREATE TABLE tb1 AS SELECT * FROM dbOriginal.tb1"

The new table is created and filled with proper values, but the new table schema isn't the same:

sqlite>.schema tb1
CREATE TABLE 'tb1' (Id INTEGER, Nm INTEGER, Ci INTEGER);

The PRIMARY KEY and any other constraints are lost. Is this the usual behavior? Must I use again the complete create sentence to get the desired design in the new table?

Thanks in advance

A.J.Millán
ZATOR Systems
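For what it's worth, the behavior is easy to reproduce from a binding. A small sketch with Python's built-in sqlite3 module (the tb2/tb3 names are just for illustration) showing that CREATE TABLE ... AS SELECT copies the data but drops the declared constraints, while re-running the full CREATE and copying the rows preserves them:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tb1 (Id INTEGER PRIMARY KEY, Nm INTEGER, Ci INTEGER)")
con.execute("INSERT INTO tb1 VALUES (1, 10, 100)")

# Clone via CREATE TABLE ... AS SELECT: the data copies, constraints do not.
con.execute("CREATE TABLE tb2 AS SELECT * FROM tb1")
sql = con.execute("SELECT sql FROM sqlite_master WHERE name = 'tb2'").fetchone()[0]
print("PRIMARY KEY" in sql)  # → False

# To keep the declared design, re-run the full CREATE, then copy the rows.
con.execute("CREATE TABLE tb3 (Id INTEGER PRIMARY KEY, Nm INTEGER, Ci INTEGER)")
con.execute("INSERT INTO tb3 SELECT * FROM tb1")
print(con.execute("SELECT COUNT(*) FROM tb3").fetchone()[0])  # → 1
```

The same two-step pattern (full CREATE, then INSERT ... SELECT from the attached database) is what drh's "yes" in the follow-up amounts to.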
Re: [sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu
To help isolate whether it's actually a library path problem, you might try using the full path. So, instead of:

select load_extension('fts2');

do:

select load_extension('/path/to/libfts2.so');

Modified for Windows as appropriate (sorry, I mostly use Linux and static linking). If that works, then it's likely that you have a path problem. If that doesn't work, it's something else.

-scott

On 9/17/07, Scott Derrick <[EMAIL PROTECTED]> wrote:
> Is fts2 a library? If so, you probably need to put it in a directory that is in the list of library directories. You may also need to run ldconfig to update the dynamic linker as to the whereabouts of your new lib.
>
> Use "man ldconfig" for the how-to.
>
> Scott
>
> Uma Krishnan wrote:
> > Hello,
> >
> > I'm having trouble loading fts2. I modified the makefile to create the fts2 library on Linux/Ubuntu. When I attempt to load fts2 using the command: select load_extension('fts2'), I get the error "shared library not found". (I noticed that it had not created the .so file, only the .la file.) I do have the LD_LIBRARY set up correctly. What am I doing wrong?
> >
> > Thanks in advance
> >
> > Uma
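If it helps, Scott's full-path check can also be scripted. A hedged sketch with Python's sqlite3 module: the /path/to/libfts2.so path is a placeholder for wherever your build actually put the library, extension loading is disabled by default, and not every Python build even compiles in the feature:

```python
import os
import sqlite3

con = sqlite3.connect(":memory:")

# Placeholder path -- substitute the absolute path your fts2 build produced.
ext = "/path/to/libfts2.so"

loaded = False
# Not every Python build compiles in extension support, so probe first.
if hasattr(con, "enable_load_extension"):
    con.enable_load_extension(True)   # loading is off by default for safety
    if os.path.exists(ext):           # only attempt it if the file exists
        con.load_extension(ext)       # full path sidesteps search-path issues
        loaded = True
    con.enable_load_extension(False)  # turn it back off once done
```

If load_extension succeeds with the absolute path but fails with the bare name, that confirms Scott's diagnosis: a search-path (LD_LIBRARY_PATH / ldconfig) problem rather than a broken build.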
Re: [sqlite] About cloning a table in a new dBase
"A.J.Millan" <[EMAIL PROTECTED]> wrote: > > Must I use again the complete create sentence to get the desired design in > the new table? > yes. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu
Hello,

I'm having trouble loading fts2. I modified the makefile to create the fts2 library on Linux/Ubuntu. When I attempt to load fts2 using the command: select load_extension('fts2'), I get the error "shared library not found". (I noticed that it had not created the .so file, only the .la file.) I do have the LD_LIBRARY set up correctly. What am I doing wrong?

Thanks in advance

Uma
Re: [sqlite] 3.4.2 (or 3.5.0) on Solaris 10?
On 14 Sep 2007, at 16:20, Tim Bradshaw wrote:
> Has anyone successfully got either of these to build on Solaris 10, using the gcc that ships with it? I've tried on 10u4 on x86 and (after fixing the known problem with B_FALSE/B_TRUE for 3.4.2) they both failed sometime while linking. I just did a ./configure --prefix=/what/ever with no special options.

To answer my own question, the answer is whether or not to build static libraries. I'm not sure you can at all on Solaris 10, but even if you can, whatever configure generates can't. So this works:

../sqlite-3.4.2/configure --prefix=/what/ever --enable-static=no

--tim
Re: [sqlite] 3.4.2 (or 3.5.0) on Solaris 10?
On 17 Sep 2007, at 05:58, Halton Huo wrote:
> I did not build sqlite on Solaris 10, but I do build it on Solaris Express Community Edition successfully.

Are you using Sun's compiler or gcc? I guess I could install some version of Studio on my host if the former...

--tim
Re: [sqlite] Multi-User confusion
On Mon, Sep 17, 2007 at 02:29:58PM +, [EMAIL PROTECTED] wrote:
> http://www.sqlite.org/pragma.html#pragma_locking_mode

Oh, boy... missed the entire set of "pragma" commands. Thanks.

--
pozdrawiam / regards
Zbigniew Baniewski
Re: [sqlite] Multi-User confusion
Zbigniew Baniewski <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 16, 2007 at 08:42:44PM -0700, Joe Wilson wrote:
>> Host a shared database file on computer A, say shared.db.
>> From computer B, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
>> From computer C, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
>> If computer C gets the error "SQL error: database is locked", then its locking probably works.
>
> BTW: I'm wondering if there's a possibility to set, in a similar manner, exclusive rights to access the database file for the duration of the entire "database session", not just a transaction. I mean: when I'm using a program which is accessing the database, nobody else has access.

http://www.sqlite.org/pragma.html#pragma_locking_mode

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Multi-User confusion
On Sun, Sep 16, 2007 at 08:42:44PM -0700, Joe Wilson wrote:
> Host a shared database file on computer A, say shared.db.
> From computer B, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> From computer C, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> If computer C gets the error "SQL error: database is locked", then its locking probably works.

BTW: I'm wondering if there's a possibility to set, in a similar manner, exclusive rights to access the database file for the duration of the entire "database session", not just a transaction. I mean: when I'm using a program which is accessing the database, nobody else has access.

--
pozdrawiam / regards
Zbigniew Baniewski
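SQLite's locking_mode pragma (the answer drh gives in this thread) does exactly this session-long exclusivity. A sketch with Python's sqlite3 module, where the shared.db file name is illustrative, showing that once an EXCLUSIVE-mode connection performs a write, a second connection is locked out until the first one closes:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

con = sqlite3.connect(path, isolation_level=None)  # autocommit mode
mode = con.execute("PRAGMA locking_mode = EXCLUSIVE").fetchone()[0]
print(mode)  # → exclusive

# The exclusive lock is only taken on the first actual write, and in
# EXCLUSIVE mode it is then held until the connection is closed.
con.execute("CREATE TABLE t (x)")
con.execute("INSERT INTO t VALUES (1)")

# A second "user" is now locked out for the whole session, not just
# the duration of a transaction.
other = sqlite3.connect(path, timeout=0.1)
try:
    other.execute("SELECT * FROM t").fetchall()
    locked = False
except sqlite3.OperationalError:  # "database is locked"
    locked = True
print(locked)  # → True

con.close()  # releases the exclusive lock; others can access again
```

This answers the "entire database session" wish above: the lock lifetime becomes the connection lifetime rather than the transaction lifetime.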
Re: [sqlite] Sqlite insertion performance
On 17/09/2007 1:07 PM, Joe Wilson wrote:
> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
>
> I have been struggling with the performance of insertion in sqlite.
>
> Here we have a very simple case:
>
> A table with an integer autoincrement primary key and a text field that is unique.
>
> CREATE TABLE my (id PRIMARY KEY, url);
> CREATE UNIQUE INDEX myurl ON my(url);
>
> My application requires inserting up to 10 million records in batches of 20 thousand records.
>
>> For each group of 20,000 records, first insert them into a TEMP table. Call the temp table t1. Then transfer the records to the main table as follows:
>>
>> INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
>
> I had no performance improvement with that temp store staging table technique in my testing - actually it was slower.
>
> http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
>
> Mind you, the table I was testing against had 4 indexes, whereas the above table has 2. I also wasn't using "OR IGNORE". There might be a difference.
>
> Just setting pragma cache_size to a huge value and inserting into the table normally in large batches resulted in better performance in my case. It may have already been mentioned, but having a big database page_size value helps minimize the disk writes as well.

A couple of thoughts:

OTTOMH, time to search the index is approx O(D * log(K)), where D = depth of tree and K = number of keys per block, and K ** D is O(N).

So:

(1) A big block means big K and thus small D.

(2) Long keys (like URLs!!) mean small K and thus big D.

Further on point (2), the OP seems unsure about whether his URLs are unique or not. Perhaps storing another column containing a 64-bit hash, with an index on that column, might be the way to go -- the shorter key might well outweigh the extra cost of checking for duplicates.
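Joe's advice in this message (a large cache_size and plain batched inserts, plus the sorted-order idea from the temp-table suggestion) can be sketched with Python's built-in sqlite3 module. The 20,000-row batch matches the OP's numbers; the URL pattern is made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA cache_size = 100000")  # large page cache, fewer spills to disk
con.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT)")
con.execute("CREATE UNIQUE INDEX myurl ON my(url)")

urls = ["http://example.com/page%06d" % i for i in range(20000)]

# One transaction per 20,000-row batch; inserting in sorted order feeds
# the unique index mostly-sequential keys, touching fewer b-tree pages
# than random insertion order would.
with con:  # commits the whole batch at once
    con.executemany("INSERT OR IGNORE INTO my(url) VALUES (?)",
                    ((u,) for u in sorted(urls)))

print(con.execute("SELECT COUNT(*) FROM my").fetchone()[0])  # → 20000
```

The O(log N) index-maintenance cost Igor describes elsewhere in the thread can't be eliminated, but batching, a big cache, and sorted input each shrink the constant factor that dominates in practice.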