Re: [sqlite] Speeding up index creation

2011-09-05 Thread Filip Navara
On Mon, Sep 5, 2011 at 1:18 PM, Jaco Breitenbach
 wrote:
> Dear all,
>
> Any speculation on if and how the new code would affect inserts into an
> indexed table?
>
> Jaco

It doesn't affect them at all. The optimization specifically targets
sorting large amounts of data.

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-05 Thread Jaco Breitenbach
Dear all,

Any speculation on if and how the new code would affect inserts into an
indexed table?

Jaco

On 5 September 2011 09:15, Filip Navara  wrote:

> On Mon, Sep 5, 2011 at 9:39 AM, Baruch Burstein 
> wrote:
> > This is unrelated, but I have never done any serious testing, and am
> > wondering what the user/sys results are in all these performance tests.
> > Is user the I/O time and sys the CPU time?
>
> User is the time spent in user-mode code, i.e. mostly SQLite code and
> operating system libraries.
> Sys is the time spent in kernel-mode code, i.e. the kernel, drivers, and I/O
> operations.
>
> In both cases the values are per-process and thus they should be more
> accurate than wall clock time.
>
> Sum of both values should give the total time spent executing a given
> command.
>
> Best regards,
> Filip Navara
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-05 Thread Filip Navara
On Mon, Sep 5, 2011 at 9:39 AM, Baruch Burstein  wrote:
> This is unrelated, but I have never done any serious testing, and am
> wondering what the user/sys results are in all these performance tests. Is
> user the I/O time and sys the CPU time?

User is the time spent in user-mode code, i.e. mostly SQLite code and
operating system libraries.
Sys is the time spent in kernel-mode code, i.e. the kernel, drivers, and I/O operations.

In both cases the values are per-process and thus they should be more
accurate than wall clock time.

Sum of both values should give the total time spent executing a given command.
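
For what it's worth, the CPU-time figures quoted in this thread look like the
output of the sqlite3 shell's ".timer" command, which reports exactly this
per-process breakdown. A quick sketch (the statement and numbers are simply
the ones quoted elsewhere in the thread):

  .timer on
  CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses"
    ("type", "address", "parentId");
  -- the shell then prints a line of the form:
  -- CPU Time: user 82.009726 sys 129.636831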

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-05 Thread Baruch Burstein
This is unrelated, but I have never done any serious testing, and am
wondering what the user/sys results are in all these performance tests. Is
user the I/O time and sys the CPU time?

On Sun, Sep 4, 2011 at 11:43 AM, Filip Navara wrote:

> Updated results against current trunk:
>
> CPU Time: user 82.009726 sys 129.636831
>
> I re-ran the profiler and looked at the results. Most of them looked
> sensible, but one thing still struck me. Once the new index is created
> in the VDBE Sorter it gets copied back into the new Btree using a loop
> with the following OPCodes:
>
>  sqlite3VdbeAddOp2(v, OP_SorterData, iSorter, regRecord);
>  sqlite3VdbeAddOp3(v, OP_IdxInsert, iIdx, regRecord, 1);
>  sqlite3VdbeChangeP5(v, OPFLAG_USESEEKRESULT);
>
> The OPFLAG_USESEEKRESULT flag is a no-op in this case since OP_IsUnique is
> never used in the merge-sort branch of the code. OP_IdxInsert always seeks
> to the position of the new row in the resulting Btree. Couldn't we exploit
> the fact that the rows are already sorted to construct the Btree faster?
>
> Best regards,
> Filip Navara
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-04 Thread Filip Navara
Updated results against current trunk:

CPU Time: user 82.009726 sys 129.636831

I re-ran the profiler and looked at the results. Most of them looked
sensible, but one thing still struck me. Once the new index is created
in the VDBE Sorter it gets copied back into the new Btree using a loop
with the following OPCodes:

  sqlite3VdbeAddOp2(v, OP_SorterData, iSorter, regRecord);
  sqlite3VdbeAddOp3(v, OP_IdxInsert, iIdx, regRecord, 1);
  sqlite3VdbeChangeP5(v, OPFLAG_USESEEKRESULT);

The OPFLAG_USESEEKRESULT flag is a no-op in this case since OP_IsUnique is
never used in the merge-sort branch of the code. OP_IdxInsert always seeks
to the position of the new row in the resulting Btree. Couldn't we exploit the
fact that the rows are already sorted to construct the Btree faster?

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-03 Thread Filip Navara
On Fri, Sep 2, 2011 at 6:04 PM, Dan Kennedy  wrote:
> On 09/02/2011 07:32 PM, Filip Navara wrote:
>>
>> On Fri, Sep 2, 2011 at 11:04 AM, Filip Navara
>>  wrote:
>> *snip*
>>>
>>> The time to create an index on my 266 MB experimental database is more
>>> than 9 minutes.
>>
>> *snip*
>>
>> I erroneously measured the time with a DEBUG build, so I've rerun the
>> experiment with several SQLite versions:
>>
>> [2869ed2829] Leaf: Avoid using uninitialized variables after failures
>> in the merge sort code. (user: drh, tags: trunk)
>> CPU Time: user 107.359888 sys 135.050066
>>
>> [7769fb988d] Instead of a temporary b-tree, use a linked-list and
>> merge-sort to sort records in main memory in vdbesort.c. (user: dan,
>> tags: merge-sort)
>> CPU Time: user 118.451559 sys 132.117247
>>
>> [71075673c6] Leaf: If all data being sorted fits in memory, avoid
>> writing any data out to temporary files in vdbesort.c. (user: dan,
>> tags: merge-sort)
>> CPU Time: user 116.813549 sys 132.710051
>
> Thanks for doing this. There is (hopefully) a fix for the performance
> regression up now.

Thanks, it looks much better now:

[68e26c4487] Leaf: The build works again with
-DSQLITE_OMIT_MERGE_SORT. The merge-sorter now avoids spilling to disk
(letting the in-memory linked list grow without bound) if PRAGMA
temp_store=3. (user: drh, tags: merge-sort)
CPU Time: user 86.486954 sys 132.273248

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Dan Kennedy

On 09/02/2011 07:32 PM, Filip Navara wrote:

On Fri, Sep 2, 2011 at 11:04 AM, Filip Navara  wrote:
*snip*

The time to create an index on my 266 MB experimental database is more
than 9 minutes.

*snip*

I erroneously measured the time with a DEBUG build, so I've rerun the
experiment with several SQLite versions:

[2869ed2829] Leaf: Avoid using uninitialized variables after failures
in the merge sort code. (user: drh, tags: trunk)
CPU Time: user 107.359888 sys 135.050066

[7769fb988d] Instead of a temporary b-tree, use a linked-list and
merge-sort to sort records in main memory in vdbesort.c. (user: dan,
tags: merge-sort)
CPU Time: user 118.451559 sys 132.117247

[71075673c6] Leaf: If all data being sorted fits in memory, avoid
writing any data out to temporary files in vdbesort.c. (user: dan,
tags: merge-sort)
CPU Time: user 116.813549 sys 132.710051


Thanks for doing this. There is (hopefully) a fix for the performance
regression up now.

Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Dan Kennedy

On 09/02/2011 09:24 PM, Simon Slavin wrote:


On 2 Sep 2011, at 10:04am, Filip Navara wrote:


The time to create an index on my 266 MB experimental database is more
than 9 minutes. The database is available at
http://www.emclient.com/temp/mail_index.zip and the command I use to
create the index is

  CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type",
"address", "parentId");

I had run the shell under a profiler


Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 
3.7.5, there seems to be a problem.  It's still going after more than 2 hours.

Loading your database and running your CREATE INDEX command, the application 
only seems to be using about 1% of one of my CPUs.  I looked to see if it was 
I/O bound instead of CPU bound, but it seems only to be reading 320 KB/s and my
computer can handle a lot more than that.  (All above figures from Activity 
Monitor.)


We were just wondering a half hour ago how long this would
take with 3.7.7. Thanks!

Released versions of SQLite build an index by inserting
all values from the indexed column(s) in whatever order
they appear in the table (i.e. unsorted order) into a new
b-tree. This is fine if the index b-tree you are constructing
fits in the cache.

If it doesn't fit in the cache you have a problem. Each
time you go to insert a new entry into the b-tree you have
to find the leaf page that the new entry will be added to.
Since your b-tree doesn't fit in the cache, odds are that
this means reading the page from the file-system. And since
you are inserting in arbitrary order, the page could be
anywhere in the database (or WAL) file. In the worst case,
if your page is not cached in OS memory, you may even have
to shift the disk arm to get at it. Way slow.

The result is that reading data from disk becomes the
bottleneck when writing unsorted values to a b-tree. Hence
your 1% CPU measurement.

The new version uses a merge-sort to sort all the index
entries before it inserts them into the b-tree. This way
it doesn't matter if your b-tree is larger than the cache,
as you are always inserting into the right-most leaf node.
No need to go searching through the file-system/disk for
pages while building the b-tree.
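
The same effect can be approximated by hand on released versions by making
sure the table rows are already stored in index-key order, so that CREATE
INDEX feeds keys to the b-tree in sorted order and every insert lands on the
right-most leaf. A rough sketch, reusing the columns from the database above
(the "_sorted" names are made up for illustration):

  CREATE TABLE "MailAddresses_sorted" AS
    SELECT * FROM "MailAddresses" ORDER BY "type", "address", "parentId";
  CREATE INDEX "idx_MailAddresses_sorted_address"
    ON "MailAddresses_sorted" ("type", "address", "parentId");

Copying the table costs extra I/O, of course, which is why doing the sort
inside the sorter module is the better long-term fix.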

Dan.





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Simon Slavin

On 2 Sep 2011, at 4:38pm, Filip Navara wrote:

> With SQLite 3.7.7 and older the index creation takes eons since the
> file is in WAL mode and the journal file grows uncontrollably.

Yeah.  It was 5.1 GB, and the underlying database was only 266 MB.  Heh.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Filip Navara
On Fri, Sep 2, 2011 at 5:23 PM, Simon Slavin  wrote:
>
> On 2 Sep 2011, at 3:24pm, Simon Slavin wrote:
>
>> On 2 Sep 2011, at 10:04am, Filip Navara wrote:
>>
>>> The time to create an index on my 266 MB experimental database is more
>>> than 9 minutes. The database is available at
>>> http://www.emclient.com/temp/mail_index.zip and the command I use to
>>> create the index is
>>>
>>> CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type",
>>> "address", "parentId");
>>>
>>> I had run the shell under a profiler
>>
>> Strangely, on my Mac running the shell tool provided with OS X 10.7.1, 
>> SQLite 3.7.5, there seems to be a problem.  It's still going after more than 
>> 2 hours.
>
> I correct myself: it had finished.  It just wasn't showing the next prompt, 
> for some reason.  After force-quitting and restarting the shell it showed the 
> index as existing and the index worked.
>
> Simon.

With SQLite 3.7.7 and older the index creation takes eons since the
file is in WAL mode and the journal file grows uncontrollably. Since I
run it on a small SSD, I was never able to let it finish; it always
filled up the disk first.
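
One workaround that might be worth trying (a sketch only, I have not tested
it) is to drop back to a rollback journal just for the index build, since a
rollback journal stores only the original page images while the WAL has to
accumulate every newly written index page:

  PRAGMA journal_mode = DELETE;   -- temporarily leave WAL mode
  CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses"
    ("type", "address", "parentId");
  PRAGMA journal_mode = WAL;      -- switch back afterwards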

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Simon Slavin

On 2 Sep 2011, at 3:24pm, Simon Slavin wrote:

> On 2 Sep 2011, at 10:04am, Filip Navara wrote:
> 
>> The time to create an index on my 266 MB experimental database is more
>> than 9 minutes. The database is available at
>> http://www.emclient.com/temp/mail_index.zip and the command I use to
>> create the index is
>> 
>> CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type",
>> "address", "parentId");
>> 
>> I had run the shell under a profiler
> 
> Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 
> 3.7.5, there seems to be a problem.  It's still going after more than 2 hours.

I correct myself: it had finished.  It just wasn't showing the next prompt, for 
some reason.  After force-quitting and restarting the shell it showed the index 
as existing and the index worked.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Simon Slavin

On 2 Sep 2011, at 10:04am, Filip Navara wrote:

> The time to create an index on my 266 MB experimental database is more
> than 9 minutes. The database is available at
> http://www.emclient.com/temp/mail_index.zip and the command I use to
> create the index is
> 
>  CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type",
> "address", "parentId");
> 
> I had run the shell under a profiler

Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 
3.7.5, there seems to be a problem.  It's still going after more than 2 hours.

Loading your database and running your CREATE INDEX command, the application 
only seems to be using about 1% of one of my CPUs.  I looked to see if it was 
I/O bound instead of CPU bound, but it seems only to be reading 320 KB/s and my
computer can handle a lot more than that.  (All above figures from Activity 
Monitor.)

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-02 Thread Filip Navara
On Fri, Sep 2, 2011 at 11:04 AM, Filip Navara  wrote:
*snip*
> The time to create an index on my 266 MB experimental database is more
> than 9 minutes.
*snip*

I erroneously measured the time with a DEBUG build, so I've rerun the
experiment with several SQLite versions:

[2869ed2829] Leaf: Avoid using uninitialized variables after failures
in the merge sort code. (user: drh, tags: trunk)
CPU Time: user 107.359888 sys 135.050066

[7769fb988d] Instead of a temporary b-tree, use a linked-list and
merge-sort to sort records in main memory in vdbesort.c. (user: dan,
tags: merge-sort)
CPU Time: user 118.451559 sys 132.117247

[71075673c6] Leaf: If all data being sorted fits in memory, avoid
writing any data out to temporary files in vdbesort.c. (user: dan,
tags: merge-sort)
CPU Time: user 116.813549 sys 132.710051

Best regards,
Filip Navara
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation?

2008-05-24 Thread Stefan Arentz
On Fri, May 23, 2008 at 6:37 PM, Nemanja Čorlija <[EMAIL PROTECTED]> wrote:
> On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote:
>> I have an interesting problem. I need to generate a large table
>> periodically. The table contains a unique SHA1 hash code and 6 integer
>> values and has about 6 million rows. Generating this table is fast. I
>> can set it up in less than 90 seconds on a slow iMac (2.16 GHz and
>> slow disk). The thing that takes a (relatively) long time is the index
>> creation on the unique hash code .. 720 seconds.
>>
>> (I'm importing within transactions with chunks of 25,000 records)
>>
>> The question is, is there any way to speed up the indexing process?
>>
>> Couple of things on my mind:
>>
>> Doing this all on an in-memory database takes about 150 seconds in
>> total. Is it possible to build a database in memory and then dump it
>> to disk?
>>
>> Would it make sense to sort the records that I import? Could that
>> result in a quicker index operation? This certainly helps for Berkeley
>> DB. But it is lower-level and you can use its internal hashing
>> functions on your data to pre-sort it. Maybe SQLite has something
>> similar?
>>
>
> It does make sense to presort records before inserting into the on-disk
> db. You should insert into a memory db first and then insert sorted
> records into the disk db from there. You can batch this into chunks of 25K
> rows as you're doing now. But if you have enough memory, it will
> probably be better to increase the number of rows per batch.
> You should also play with increasing the cache_size and page_size values.
> Pseudo code:
>
> OPEN :MEMORY:
> ATTACH DISK_DB;
>
> DO
>  BEGIN;
>  INSERT INTO MEMORY_TABLE VALUES(...); x 25,000
>  INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash;
>  COMMIT;
> WHILE has_more_rows

This is a great tip. I implemented the above and it works very well.
The load speed on Ubuntu Hardy went down from an hour+ to 6 minutes.

I'm now experimenting with the pragmas to see if this can go even faster.

 S.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation?

2008-05-23 Thread Nemanja Čorlija
On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote:
> I have an interesting problem. I need to generate a large table
> periodically. The table contains a unique SHA1 hash code and 6 integer
> values and has about 6 million rows. Generating this table is fast. I
> can set it up in less than 90 seconds on a slow iMac (2.16 GHz and
> slow disk). The thing that takes a (relatively) long time is the index
> creation on the unique hash code .. 720 seconds.
>
> (I'm importing within transactions with chunks of 25,000 records)
>
> The question is, is there any way to speed up the indexing process?
>
> Couple of things on my mind:
>
> Doing this all on an in-memory database takes about 150 seconds in
> total. Is it possible to build a database in memory and then dump it
> to disk?
>
> Would it make sense to sort the records that I import? Could that
> result in a quicker index operation? This certainly helps for Berkeley
> DB. But it is lower-level and you can use its internal hashing
> functions on your data to pre-sort it. Maybe SQLite has something
> similar?
>

It does make sense to presort records before inserting into the on-disk
db. You should insert into a memory db first and then insert sorted
records into the disk db from there. You can batch this into chunks of 25K
rows as you're doing now. But if you have enough memory, it will
probably be better to increase the number of rows per batch.
You should also play with increasing the cache_size and page_size values.
Pseudo code:

OPEN :MEMORY:
ATTACH DISK_DB;

DO
  BEGIN;
  INSERT INTO MEMORY_TABLE VALUES(...); x 25,000
  INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash;
  COMMIT;
WHILE has_more_rows
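
Spelled out as actual SQL it might look roughly like this (the table and
column names are invented for illustration, and the disk database is assumed
to already contain a "hashes" table with the same columns):

  -- open an in-memory database, then:
  ATTACH 'hashes.db' AS disk;
  CREATE TABLE staging (sha1_hash BLOB, v1 INT, v2 INT, v3 INT,
                        v4 INT, v5 INT, v6 INT);
  -- repeat for each batch of ~25,000 rows:
  BEGIN;
    INSERT INTO staging VALUES (?, ?, ?, ?, ?, ?, ?);  -- bound and run 25,000 times
    INSERT INTO disk.hashes SELECT * FROM staging ORDER BY sha1_hash;
    DELETE FROM staging;                               -- empty the batch buffer
  COMMIT;

Each batch then reaches the disk-side b-tree in sorted order, which improves
locality of reference.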

Search this list's archives for "locality of reference" for more
helpful tips on the subject of building large indexes in SQLite.


> Are there any other tunable options that can improve indexing speed?
>
>  S.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Nemanja Čorlija <[EMAIL PROTECTED]>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation?

2008-05-23 Thread Jay A. Kreibich
On Fri, May 23, 2008 at 10:20:45AM -0400, Stefan Arentz scratched on the wall:
> I have an interesting problem. I need to generate a large table
> periodically. The table contains a unique SHA1 hash code and 6 integer
> values and has about 6 million rows. Generating this table is fast. I
> can set it up in less than 90 seconds on a slow iMac (2.16 GHz and
> slow disk). The thing that takes a (relatively) long time is the index
> creation on the unique hash code .. 720 seconds.

> Doing this all on an in-memory database takes about 150 seconds in
> total. Is it possible to build a database in memory and then dump it
> to disk?

  Not at this time.

> Are there any other tunable options that can improve indexing speed?

  The biggest thing you can do to improve indexing performance is to
  increase the size of the page cache.  Assuming you're using the
  default 1K page size, each page takes up about 1.5K of RAM in the
  cache.  The default cache size is 2000 pages (3 MB), but you should crank
  this up as high as you can while having a reasonable chance of still
  keeping the whole thing in physical RAM... say 75% of your machine's
  total RAM, if you've got a gig or two.  Just be aware that the value
  is the number of pages, not the number of KB.

  The other thing that helps a little is to turn synchronous to off
  while you're creating the index.  This is normally a dangerous thing
  to do, but if you're just dumping data into a database chances are 
  you can re-start from scratch if things go wrong. 

  You may also see some performance gain from setting temp_store to memory.



  The biggest single thing is the page cache, however.

  http://www.sqlite.org/pragma.html

  PRAGMA cache_size
  PRAGMA synchronous
  PRAGMA temp_store
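
  Put together, a pre-load configuration along those lines might look like
  this (the values and the table/index names are only examples; with 2 GB of
  RAM, 75% is about 1.5 GB, which at roughly 1.5 KB per cached page works out
  to around 1,000,000 pages):

  PRAGMA cache_size = 1000000;   -- number of pages, not KB
  PRAGMA synchronous = OFF;      -- only safe if the load can be redone from scratch
  PRAGMA temp_store = MEMORY;
  CREATE UNIQUE INDEX idx_hash ON hashes(sha1_hash);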


   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users