Re: [sqlite] Sqlite insertion performance

2007-09-17 Thread Paul Harris
On 17/09/2007, John Machin <[EMAIL PROTECTED]> wrote:
> On 17/09/2007 1:07 PM, Joe Wilson wrote:
> > --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> > I have been struggling with the performance of insertion in sqlite.
> >
> > Here we have a very simple case :
> >
> > A table with an integer autoincrement primary key and a text
> > field that is
> > unique.
> >
> > CREATE TABLE my (id PRIMARY KEY, url);
> >
> > CREATE UNIQUE INDEX myurl ON my(url);
> >
> >
> > My application requires inserting up to 10 million records in
> > batches of
> > 20 thousand records.
> >> For each group of 20,000 records, first insert them into a TEMP table.
> >> Call the temp table t1.  Then transfer the records to the main table
> >> as follows:
> >>
> >>   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
> >
> > I had no performance improvement with that temp store staging table
> > technique in my testing - actually it was slower.
> >
> >   http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
> >
> > Mind you, the table I was testing against had 4 indexes, whereas the above
> > table has 2. I also wasn't using "OR IGNORE". There might be a difference.
> >
> > Just setting pragma cache_size to a huge value and inserting into
> > the table normally in large batches resulted in better performance in
> > my case. It may have already been mentioned, but having a big
> > database page_size value helps minimize the disk writes as well.
>
> A couple of thoughts:
>
> OTTOMH, time to search index is approx O(D * log(K)) where D = depth of
> tree and K = number of keys per block, and K ** D is O(N)
>
> So:
>
> (1) Big block means big K and thus small D
>
> (2) Long keys (like URLs!!) means small K and thus big D
>
> Further on point (2), the OP seems unsure about whether his URLs are
> unique or not. Perhaps storing another column containing a 64-bit hash
> with an index on that column might be the way to go -- the shorter key
> might well outweigh the extra cost of checking for duplicates.
>


A thought along the same lines: can sqlite create a unique index that
is hash-based?  That would provide the UNIQUE support, but it wouldn't
provide a sorted index.

That should resolve the massive-insert-too-slow problem, and
afterwards he can create a sorted index on the column if he needs
ordered lookups.



Alternatively, he can go without the UNIQUE index on the initial
inserts and delete the duplicates later.  Off the top of my head, it
might go something like:

CREATE INDEX myurl ON my(url);  -- note: not unique
SELECT t2.id FROM my AS t1, my AS t2
 WHERE t1.url = t2.url AND t1.id < t2.id;

That would give you a list of the row ids that have repeated an
earlier row's url.  Then just build an id list (comma separated) and:

DELETE FROM my WHERE id IN (id list);

NB: some DBs let you combine the select and delete into one statement;
not sure about sqlite.

And THEN drop the above index and create the final unique index.
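The whole sequence can be sketched with Python's built-in sqlite3 module (table and column names follow the CREATE TABLE my example earlier in the thread; the sample urls are made up). SQLite can in fact do the select and the delete in one statement via a subquery:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT)")

# Bulk-insert with no UNIQUE index in the way; duplicates are allowed for now.
urls = ["http://a", "http://b", "http://a", "http://c", "http://b"]
conn.executemany("INSERT INTO my(url) VALUES (?)", [(u,) for u in urls])

# Non-unique index so the duplicate-finding self-join is cheap.
conn.execute("CREATE INDEX myurl ON my(url)")

# Drop every row that repeats an earlier (lower-id) row's url,
# doing the select and the delete in a single statement.
conn.execute(
    "DELETE FROM my WHERE id IN "
    "(SELECT t2.id FROM my t1, my t2 WHERE t1.url = t2.url AND t1.id < t2.id)")

# Data is now duplicate-free; swap the index for the final unique one.
conn.execute("DROP INDEX myurl")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")

remaining = [r[0] for r in conn.execute("SELECT url FROM my ORDER BY id")]
print(remaining)  # ['http://a', 'http://b', 'http://c']
```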

see ya

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An example for "progress" method?

2007-09-17 Thread Bharath Booshan L
I have used the prepare, step, finalize methods in order to implement the
progress callback, and it works fine.

However, I would like to know a couple of things:

a. Is the sqlite3_exec function better in terms of performance for
receiving the callback?

b. Will the sqlite3_interrupt function stop the query execution immediately
upon request, or does it finish the current instruction and then stop, or
something else?

Please advise,

Thanks in advance,

Bharath
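On (b): my understanding is that sqlite3_interrupt() sets a flag that the virtual machine checks between opcodes, so the statement stops at the next opcode boundary rather than truly mid-instruction. The same abort path can be observed from Python's sqlite3 wrapper, where returning non-zero from a progress handler interrupts the running statement (a sketch, not a benchmark):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

calls = 0

def abort_after_a_few():
    # Returning non-zero interrupts the statement, much like sqlite3_interrupt().
    global calls
    calls += 1
    return 1 if calls > 3 else 0

conn.set_progress_handler(abort_after_a_few, 10)  # every 10 VM instructions

interrupted = False
try:
    # A cross join gives the VM plenty of work, so the handler fires many times.
    conn.execute("SELECT count(*) FROM t AS a, t AS b").fetchone()
except sqlite3.OperationalError:  # "interrupted"
    interrupted = True

print(interrupted)  # True
```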


On 9/18/07 5:13 AM, "John Stanton" <[EMAIL PROTECTED]> wrote:

> Zbigniew Baniewski wrote:
>> An interesting method is "progress":
>> 
>>   "The progress callback can be used to display the status of a lengthy query
>>or to process GUI events during a lengthy query."
>> 
>> But I'm not quite sure at present what it could look like in practice. To
>> make a "progress bar" I've got to know the maximal number of records (table
>> rows) involved in a query BEFORE it is run.
>> 
>> Perhaps again I've missed some simple thing(?) - but currently I don't know
>> how it can be done in a simple way. The methods "changes" and "total_changes"
>> give the number of rows involved AFTER the query is done. So, how can
>> I count it all beforehand, to make a nice looking progress bar? Just by making
>> an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE ..."?
>> 
>> Perhaps someone could show me some example?
> You cannot know the result of a query until it is finished, and when it
> is finished there is no need for a progress bar.
> 
> The obvious solution is to not use a bar but to have another form of
> progress indicator.
> 
> 
> 



---
Robosoft Technologies - Come home to Technology

Disclaimer: This email may contain confidential material. If you were not an 
intended recipient, please notify the sender and delete all copies. Emails to 
and from our network may be logged and monitored. This email and its 
attachments are scanned for virus by our scanners and are believed to be safe. 
However, no warranty is given that this email is free of malicious content or 
virus.






Re: [sqlite] An example for "progress" method?

2007-09-17 Thread Bharath Booshan L



On 9/18/07 5:00 AM, "Zbigniew Baniewski" <[EMAIL PROTECTED]> wrote:

> An interesting method is "progress":
> 
>   "The progress callback can be used to display the status of a lengthy query
>or to process GUI events during a lengthy query."
> 
> But I'm not quite sure at present what it could look like in practice. To make
> a "progress bar" I've got to know the maximal number of records (table
> rows) involved in a query BEFORE it is run.
> 
> Perhaps again I've missed some simple thing(?) - but currently I don't know
> how it can be done in a simple way. The methods "changes" and "total_changes"
> give the number of rows involved AFTER the query is done. So, how can
> I count it all beforehand, to make a nice looking progress bar? Just by making
> an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE ..."?


Yes, that could be one way to find the max value, but it is feasible only as
long as we can afford the time taken to execute the extra query.

Another approach I would suggest: you could estimate the average number of
records your query would retrieve in typical circumstances. Just a suggestion
:-). It works well for me.










Re: [sqlite] An example for "progress" method?

2007-09-17 Thread John Stanton

Zbigniew Baniewski wrote:

> An interesting method is "progress":
> 
>   "The progress callback can be used to display the status of a lengthy query
>    or to process GUI events during a lengthy query."
> 
> But I'm not quite sure at present what it could look like in practice. To make
> a "progress bar" I've got to know the maximal number of records (table
> rows) involved in a query BEFORE it is run.
> 
> Perhaps again I've missed some simple thing(?) - but currently I don't know
> how it can be done in a simple way. The methods "changes" and "total_changes"
> give the number of rows involved AFTER the query is done. So, how can
> I count it all beforehand, to make a nice looking progress bar? Just by making
> an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE ..."?
> 
> Perhaps someone could show me some example?

You cannot know the result of a query until it is finished, and when it
is finished there is no need for a progress bar.


The obvious solution is to not use a bar but to have another form of 
progress indicator.





Re: [sqlite] An example for "progress" method?

2007-09-17 Thread drh
Zbigniew Baniewski <[EMAIL PROTECTED]> wrote:
> An interesting method is "progress":
> 
>   "The progress callback can be used to display the status of a lengthy query
>or to process GUI events during a lengthy query."
> 
> But I'm not quite sure at present what it could look like in practice. To make
> a "progress bar" I've got to know the maximal number of records (table
> rows) involved in a query BEFORE it is run.
> 

You are correct - you cannot do a progress bar that shows percent
complete.  But you can do a graphic that shows the operation is
in process (perhaps spinning gears and a caption that says 
"Working...") and provide the user with a "Cancel" button.
--
D. Richard Hipp <[EMAIL PROTECTED]>





[sqlite] An example for "progress" method?

2007-09-17 Thread Zbigniew Baniewski
An interesting method is "progress":

  "The progress callback can be used to display the status of a lengthy query
   or to process GUI events during a lengthy query."

But I'm not quite sure at present what it could look like in practice. To make
a "progress bar" I've got to know the maximal number of records (table
rows) involved in a query BEFORE it is run.

Perhaps again I've missed some simple thing(?) - but currently I don't know
how it can be done in a simple way. The methods "changes" and "total_changes"
give the number of rows involved AFTER the query is done. So, how can
I count it all beforehand, to make a nice looking progress bar? Just by making
an additional, earlier query like "SELECT COUNT(*) FROM xyz WHERE ..."?

Perhaps someone could show me some example?
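For what it's worth, a sketch of the pre-count idea in Python (the sqlite3 module wraps the same C API; the table name and WHERE clause are invented for the demo). The extra COUNT(*) gives the maximum, and the bar advances as rows are fetched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (n INTEGER)")
conn.executemany("INSERT INTO xyz VALUES (?)", [(i,) for i in range(500)])

where = "n % 2 = 0"  # whatever the real query's WHERE clause is

# 1. One additional, earlier query gives the maximal value for the bar.
total = conn.execute("SELECT COUNT(*) FROM xyz WHERE " + where).fetchone()[0]

# 2. Step through the real query row by row, updating the bar as we go.
done = 0
for row in conn.execute("SELECT n FROM xyz WHERE " + where):
    done += 1
    percent = 100 * done // total  # feed this to the GUI's progress bar

print(total, percent)  # 250 100
```

For an indeterminate "working..." indicator instead, the progress callback alone (set_progress_handler in Python) is enough, and no pre-count is needed.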
-- 
pozdrawiam / regards

Zbigniew Baniewski




[sqlite] Opinions about per-row tokenizers for fts?

2007-09-17 Thread Scott Hess
As part of doing internationalization work on Gears, we have determined
that a single global tokenizer is unlikely to work for everything.
Instead, in some cases you may need to use a specific tokenizer based on
the content being tokenized, or on the source of the content.  This can
be emulated by using multiple tables and complicating your joins, but it
would be nicer if fts could just accommodate this use case.

In the interests of not committing something that people won't like,
my current proposal would be to add an implicit TOKENIZER column,
which will override the table's default tokenizer for that row.  So,
you could do something like:

   CREATE VIRTUAL TABLE t USING fts3(TOKENIZER icu(en), content);
   INSERT INTO t VALUES ('testing testing');  -- Uses icu(en).
   INSERT INTO t (tokenizer, content) VALUES ('icu(kr)', '나의');  -- Uses icu(kr).
   SELECT rowid FROM t WHERE t MATCH 'TOKENIZER:icu(kr) 의';

[Forgive me if you can read Korean and I just did something offensive.
 I'm doing copy/paste, here!]

fts allows for anything starting with 'tokenize' in that position in
the CREATE statement, so in the above all uses must match: if you
used "TOKENIZE" in the create, you use "TOKENIZE" everywhere else.  In
MATCH, it must be the uppercase term used in the create (the other
places are case-insensitive), followed by ':', followed by a valid
tokenizer name, followed by an optional parameter list.

A variant which I think is somewhat interesting would be:

  CREATE VIRTUAL TABLE t USING fts3(tokenizer TOKENIZER DEFAULT icu(en), content);

This makes the "tokenizer" column a bit more explicit, and the
'DEFAULT ...' syntax makes it clear what's going on, but I couldn't
really think of any other sensible name for the column, so it also
feels redundant.  Since 'tokenize' is already a reserved prefix for
fts, I'm inclined towards the first variant.

Opinions?

-scott


Re: [sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu

2007-09-17 Thread Scott Derrick

Is fts2 a library?

If so, you probably need to put it in a directory that is in the list of
library directories.  You may also need to run ldconfig to update the
dynamic linker as to the whereabouts of your new lib.


use "man ldconfig" for the how-to

Scott

Uma Krishnan wrote:

Hello,

I'm having trouble loading fts2. I modified makefile to create fts2
 library on Linux/Ubuntu.

When I attempt to load fts2 using the command:
 select load_extension('fts2'), I get the error "shared library not found".
(I noticed that it had not created the .so file, only a .la file.)

I do have the LD_LIBRARY set up correctly. 


What am I doing wrong?

Thanks in advance

Uma











--

-
Rightful liberty is unobstructed action according to our will 
within limits drawn around us by the equal rights of others. I do not 
add "within the limits of the law," because law is often but the 
tyrant's will, and always so when it violates the rights of the individual.


Thomas Jefferson




Re: [sqlite] extension-functions.tgz for sqlite3 3.4.1 ?

2007-09-17 Thread Liam Healy
Thank you Nuno and Joe for your help.  I have posted a new version, now
called extension-functions.c, which works on external interfaces only and
therefore does not require the sqlite3 source code.  I have made everything
a single C file with instructions as a comment at the top, hence no need for
a tarball.  I made some revisions so that it will compile without warnings
under Mac OS X, which is fussier about unsigned vs. signed chars.

Liam

On 9/15/07, Nuno Lucas <[EMAIL PROTECTED]> wrote:
>
> On 9/14/07, Liam Healy <[EMAIL PROTECTED]> wrote:
> > I tried eliminating sqliteInt.h and replacing with explicit declarations
> > of i64 and u8.   That part worked fine, but func_ext.c also uses
> > sqliteMalloc
>
> void *sqlite3_malloc(int);
> void *sqlite3_realloc(void*, int);
> void sqlite3_free(void*);
>
> are the right functions to use (they are in sqlite3.h).
>
> > which is also defined in sqliteInt.h which led me down a rabbit hole of
> > pulling
> > more and more from sqliteInt.h, and I still can't eliminate the errors
> and
> > warnings.  As a reminder, I didn't write the original source code, and I
> > have
> > only the vaguest sense of the meaning and need for these functions.  So,
> if
> > anyone has any insight on how to accomplish the same goal without using
> > internal definitions, I'd appreciate hearing about it.
>
> I attached a patch with the required changes just to compile using
> only sqlite3.h.
> Used the current source code of the extensions on the contrib page.
>
> This is not enough to create a sqlite module, but at least it compiles
> without using the private sqlite headers.
>
>
> Regards,
> ~Nuno Lucas
>
> >
> > Liam
>
>


[sqlite] About cloning a table in a new dBase

2007-09-17 Thread A.J.Millan

Hi all:

Suppose a dBase named dBase1 and a table declared in it:

CREATE TABLE 'tb1' (Id INTEGER PRIMARY KEY,  Nm INTEGER,  Ci INTEGER);

Then, after some inserts, I open a new dBase, say dBase2, and duplicate
dBase1.tb1 with the C API functions, executing the queries:


"ATTACH dBase1 AS dbOriginal"
"CREATE TABLE tb1 AS SELECT * FROM dbOriginal.tb1"

The new table is created and filled with proper values, but the new table 
schema isn't the same:


sqlite>.schema tb1
CREATE TABLE 'tb1' (Id INTEGER,  Nm INTEGER,   Ci INTEGER);

The PRIMARY KEY and any other constraints are lost.

Is this the usual behavior?

Must I use again the complete create sentence to get the desired design in 
the new table?
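The behaviour is reproducible, and the workaround is indeed to repeat the full CREATE and then copy the rows with INSERT ... SELECT. A sketch using Python's sqlite3 module (a second in-memory database stands in for dBase1):

```python
import sqlite3

conn = sqlite3.connect(":memory:")               # plays the role of dBase2
conn.execute("ATTACH ':memory:' AS dbOriginal")  # plays the role of dBase1
conn.execute("CREATE TABLE dbOriginal.tb1 "
             "(Id INTEGER PRIMARY KEY, Nm INTEGER, Ci INTEGER)")
conn.execute("INSERT INTO dbOriginal.tb1 VALUES (1, 10, 100)")

# CREATE TABLE ... AS SELECT copies column names and data, but no constraints:
conn.execute("CREATE TABLE lossy AS SELECT * FROM dbOriginal.tb1")
lossy_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'lossy'").fetchone()[0]

# Workaround: repeat the complete create sentence, then copy the rows.
conn.execute("CREATE TABLE tb1 (Id INTEGER PRIMARY KEY, Nm INTEGER, Ci INTEGER)")
conn.execute("INSERT INTO tb1 SELECT * FROM dbOriginal.tb1")
good_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'tb1'").fetchone()[0]

print("PRIMARY KEY" in lossy_sql, "PRIMARY KEY" in good_sql)  # False True
```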


Thanks in advance

A.J.Millán
ZATOR Systems





Re: [sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu

2007-09-17 Thread Scott Hess
To help isolate if it's actually a library path problem, you might try
using the full path.  So, instead of:

   select load_extension('fts2');

do:

  select load_extension('/path/to/libfts2.so');

Modified for Windows as appropriate (sorry, I mostly use Linux and
static linking).  If that works, then it's likely that you have a path
problem.  If that doesn't work, it's something else.

-scott


On 9/17/07, Scott Derrick <[EMAIL PROTECTED]> wrote:
> is the fts2 a library?
>
> if so you probably need to put it in a directory that is in the list of
> library directory's.  You also may need to run ldconfig to update the
> dynamic linker as the the whereabouts of your new lib.
>
> use "man ldconfig" for the how-to
>
> Scott
>
> Uma Krishnan wrote:
> > Hello,
> >
> > I'm having trouble loading fts2. I modified makefile to create fts2
> >  library on Linux/Ubuntu.
> >
> > When I attempt to load fts2 using the command:
> >  select load_extension('fts2'), I get the error "shared library not found".
> > (I noticed that it had not created the .so file, only a .la file.)
> >
> > I do have the LD_LIBRARY set up correctly.
> >
> > What am I doing wrong?
> >
> > Thanks in advance
> >
> > Uma
> >
> >
>
>




Re: [sqlite] About cloning a table in a new dBase

2007-09-17 Thread drh
"A.J.Millan" <[EMAIL PROTECTED]> wrote:
> 
> Must I use again the complete create sentence to get the desired design in 
> the new table?
> 

yes.

--
D. Richard Hipp <[EMAIL PROTECTED]>





[sqlite] HELP!!!! LINKING AND LOADING FTS - on Linux/Ubuntu

2007-09-17 Thread Uma Krishnan

Hello,

I'm having trouble loading fts2. I modified makefile to create fts2
 library on Linux/Ubuntu.

When I attempt to load fts2 using the command:
 select load_extension('fts2'), I get the error "shared library not found".
(I noticed that it had not created the .so file, only a .la file.)

I do have the LD_LIBRARY set up correctly. 

What am I doing wrong?

Thanks in advance

Uma



Igor Tandetnik wrote: Kefah T. Issa wrote:
>> I tried the ordered-urls-insert the results were better, but it is
>> still
>> taking progressively longer time as the number of records increases.
>>
>> A fundamental question to be asked here :
>>
>> Shouldn't the time complexity (Big-O) of the insert operation be
>> constant?

Of course not. It takes O(log N) to find an appropriate place in the
index for every new record (where N is the number of records already
inserted). Also, it generates a lot of disk activity once the index
grows too large to fit in the memory cache.

>> I even did a third test where the integer primary key is not auto
>> increment;
>> the same problem is observed.

The id is not a problem: O(log N) is caused by the index on url.

Igor Tandetnik 







Re: [sqlite] 3.4.2 (or 3.5.0) on Solaris 10?

2007-09-17 Thread Tim Bradshaw

On 14 Sep 2007, at 16:20, Tim Bradshaw wrote:

Has anyone successfully got either of these to build on Solaris 10,  
using the gcc that ships with it?  I've tried on 10u4 on x86 and  
(after fixing the known problem with B_FALSE/B_TRUE for 3.4.2) they  
both failed sometime while linking.  I just did a


./configure --prefix=/what/ever

with no special options.


To answer my own question: the issue is whether or not to build
static libraries.  I'm not sure you can at all on Solaris 10, but even
if you can, whatever configure generates can't.  So this works:


../sqlite-3.4.2/configure --prefix=/what/ever --enable-static=no

--tim




Re: [sqlite] 3.4.2 (or 3.5.0) on Solaris 10?

2007-09-17 Thread Tim Bradshaw

On 17 Sep 2007, at 05:58, Halton Huo wrote:


I did not build sqlite on Solaris 10, but I do build it on Solaris
Express Community Edition successfully.


Are you using Sun's compiler or gcc?  I guess I could install some  
version of Studio on my host if the former...


--tim




Re: [sqlite] Multi-User confusion

2007-09-17 Thread Zbigniew Baniewski
On Mon, Sep 17, 2007 at 02:29:58PM +, [EMAIL PROTECTED] wrote:

> http://www.sqlite.org/pragma.html#pragma_locking_mode

Oh, boy... I missed the entire set of "pragma" commands. Thanks.
-- 
pozdrawiam / regards

Zbigniew Baniewski




Re: [sqlite] Multi-User confusion

2007-09-17 Thread drh
Zbigniew Baniewski <[EMAIL PROTECTED]> wrote:
> On Sun, Sep 16, 2007 at 08:42:44PM -0700, Joe Wilson wrote:
> 
> > Host a shared database file on computer A, say shared.db.
> > From computer B, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> > From computer C, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> > If computer C has the error "SQL error: database is locked", then
> > its locking probably works.
> 
> BTW: I'm wondering if there's a possibility to set, in a similar manner,
> exclusive rights to access the database file for the duration of the entire
> "database session", not just a transaction. I mean: when I'm using a program
> which is accessing the database - nobody else has access.

http://www.sqlite.org/pragma.html#pragma_locking_mode

--
D. Richard Hipp <[EMAIL PROTECTED]>
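A sketch of what that pragma looks like in practice, using Python's sqlite3 module (the file name and timeout are arbitrary). Note the exclusive lock is only acquired on the first read or write after the pragma, and is then held until the connection closes:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

conn = sqlite3.connect(path)
# The pragma reports the locking mode now in effect for this connection.
mode = conn.execute("PRAGMA locking_mode = EXCLUSIVE").fetchone()[0]

# First write: the exclusive lock is taken here and kept past the commit.
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

# Any other connection is now shut out for the rest of the session.
other = sqlite3.connect(path, timeout=0.1)
locked = False
try:
    other.execute("SELECT * FROM t").fetchall()
except sqlite3.OperationalError:  # "database is locked"
    locked = True

print(mode, locked)  # exclusive True
```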





Re: [sqlite] Multi-User confusion

2007-09-17 Thread Zbigniew Baniewski
On Sun, Sep 16, 2007 at 08:42:44PM -0700, Joe Wilson wrote:

> Host a shared database file on computer A, say shared.db.
> From computer B, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> From computer C, open shared.db remotely and execute "BEGIN EXCLUSIVE;".
> If computer C has the error "SQL error: database is locked", then
> its locking probably works.

BTW: I'm wondering if there's a possibility to set, in a similar manner,
exclusive rights to access the database file for the duration of the entire
"database session", not just a transaction. I mean: when I'm using a program
which is accessing the database - nobody else has access.
-- 
pozdrawiam / regards

Zbigniew Baniewski




Re: [sqlite] Sqlite insertion performance

2007-09-17 Thread John Machin

On 17/09/2007 1:07 PM, Joe Wilson wrote:
> --- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> I have been struggling with the performance of insertion in sqlite.
>
> Here we have a very simple case :
>
> A table with an integer autoincrement primary key and a text
> field that is unique.
>
> CREATE TABLE my (id PRIMARY KEY, url);
>
> CREATE UNIQUE INDEX myurl ON my(url);
>
>
> My application requires inserting up to 10 million records in
> batches of 20 thousand records.
>> For each group of 20,000 records, first insert them into a TEMP table.
>> Call the temp table t1.  Then transfer the records to the main table
>> as follows:
>>
>>   INSERT OR IGNORE INTO my(url) SELECT url FROM t1 ORDER BY url;
>
> I had no performance improvement with that temp store staging table
> technique in my testing - actually it was slower.
>
>   http://www.mail-archive.com/sqlite-users@sqlite.org/msg22143.html
>
> Mind you, the table I was testing against had 4 indexes, whereas the above
> table has 2. I also wasn't using "OR IGNORE". There might be a difference.
>
> Just setting pragma cache_size to a huge value and inserting into
> the table normally in large batches resulted in better performance in
> my case. It may have already been mentioned, but having a big
> database page_size value helps minimize the disk writes as well.
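A sketch of that combination (big cache, explicit batched transactions) using Python's sqlite3 module; the pragma values and batch size here are illustrative, not tuned:

```python
import sqlite3

# isolation_level=None puts the wrapper in autocommit mode so we can
# manage the per-batch transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA page_size = 8192")     # bigger pages => fewer disk writes
conn.execute("PRAGMA cache_size = 100000")  # measured in pages, not bytes
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("CREATE UNIQUE INDEX myurl ON my(url)")

def insert_batch(rows):
    # One transaction per batch: one journal sync instead of one per row.
    conn.execute("BEGIN")
    conn.executemany("INSERT OR IGNORE INTO my(url) VALUES (?)", rows)
    conn.execute("COMMIT")

all_urls = ["http://example/%d" % i for i in range(50000)]
for start in range(0, len(all_urls), 20000):
    insert_batch([(u,) for u in all_urls[start:start + 20000]])

count = conn.execute("SELECT COUNT(*) FROM my").fetchone()[0]
print(count)  # 50000
```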


A couple of thoughts:

OTTOMH, time to search the index is approx O(D * log(K)) where D = depth of
the tree and K = number of keys per block, and K ** D is O(N).

So:

(1) Big blocks mean big K and thus small D.

(2) Long keys (like URLs!!) mean small K and thus big D.

Further on point (2), the OP seems unsure about whether his URLs are
unique or not. Perhaps storing another column containing a 64-bit hash
with an index on that column might be the way to go -- the shorter key
might well outweigh the extra cost of checking for duplicates.
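The hash-column idea might be sketched like this (hedged: the hash function and schema are my own illustration, not anything from the thread). Because two URLs can share a hash, the index is non-unique and the stored url is compared before a row is declared a duplicate:

```python
import hashlib
import sqlite3

def url_hash(url):
    # Any stable 64-bit hash will do; first 8 bytes of SHA-1, for illustration.
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:8],
                          "big", signed=True)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my (id INTEGER PRIMARY KEY, h INTEGER, url TEXT)")
# Short fixed-width integer keys: more keys per b-tree page, shallower tree.
conn.execute("CREATE INDEX myhash ON my(h)")

def insert_unique(url):
    h = url_hash(url)
    # The hash narrows the search; comparing url guards against collisions.
    for (existing,) in conn.execute("SELECT url FROM my WHERE h = ?", (h,)):
        if existing == url:
            return False  # duplicate
    conn.execute("INSERT INTO my(h, url) VALUES (?, ?)", (h, url))
    return True

results = [insert_unique(u) for u in ("http://a", "http://b", "http://a")]
print(results)  # [True, True, False]
```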

