Re: [sqlite] how is "pragma threads = 4" working
On Fri, 16 Sep 2016 07:29:28 -0400 Richard Hipp wrote: > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory > than a full-up "ORDER BY" because is only keeps track of the top N > entries seen so far, discarding the rest. But it also only uses a > single thread. My immediate thought was that this is an optimization opportunity. As the OP alludes to, N is the sum of LIMIT and OFFSET. Would you have information on how these are typically used? My guess is that the LIMIT argument is typically small, less than 20, but that OFFSET marches on, and grows to be a significant fraction of the table. If LIMIT N is small and OFFSET is not used, a memory-efficient, nonlocking parallel algorithm would reserve N slots for each thread, and divide the table among the threads, each processing 1/threads rows. Then merge-sort their outputs. Humbly submitted, --jkl ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
On 08/15/2016 12:02 AM, Венцислав Русев wrote: My computer has 4 cores. I have compile sqlite like this "gcc -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that "pragma threads = 4" doesn't decrease runtime of the query that sorts 1 milion records. SQLite version 3.8.8 sqlite> pragma threads; 4 sqlite> CREATE TABLE event ( ID INTEGER PRIMARY KEY NOT NULL, date INTEGER NOT NULL, value INTEGER NOT NULL ); sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1; 0|0|0|SCAN TABLE event 0|0|0|USE TEMP B-TREE FOR ORDER BY sqlite> SELECT ID FROM event ORDER BY date LIMIT 1; 4101021 Run Time: real 2.493 user 2.426000 sys 0.049000 sqlite> pragma threads = 0; 0 sqlite> SELECT ID FROM event ORDER BY date LIMIT 1; 4101021 Run Time: real 2.484 user 2.421000 sys 0.044000 To sum it up I have: - multi core cpu; - sqlite that is compiled to use "auxiliary threads"; - a table with 1 million records; - a query that scans through the table, sorts all records and outputs the first ID. sqlite doesn't use these "auxiliary threads" that sqlite docs talks about and the runtime of that query is the same with or without "pragma threads = 4". ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users Some clarifications: - OS: Ubuntu 14.04 LTS; - "PRAGMA compile_options" gives "SYSTEM_MALLOC THREADSAFE=1"; - when sqlite shell is running and no query is running at the moment the OS reports that it has only one thread; On 09/16/2016 11:29 AM, Richard Hipp wrote: The algorithm used for "ORDER BY ... LIMIT N" uses much less memory than a full-up "ORDER BY" because is only keeps track of the top N entries seen so far, discarding the rest. But it also only uses a single thread. If you want multiple threads to be used, you'll need to drop the LIMIT, though I imagine that would defeat your purpose, no? Richard Hipp has a point. I searched the sources and found that the sorting algorithm uses threads only if there isn't a LIMIT clause. To be exact the sources say "This file contains code for the VdbeSorter object, used in concert with a VdbeCursor to sort large numbers of keys for CREATE INDEX statements or by SELECT statements with ORDER BY clauses that cannot be satisfied using indexes and without LIMIT clauses." But I'm wondering what if your query is like so "SELECT ID FROM event ORDER BY date LIMIT 1 OFFSET 100;"? Isn't it better to sort the results using available threads? On 09/16/2016 12:13 PM, Adam Devita wrote: Have you proven that the cpu is the bottleneck? Sorting a million rows seems like a lot, but even older single core cpus may be capable of 2 billion ops per second. [I apologize if this has been sorted out already I've got about 2 days of history on the thread] regards, Adam DeVita This was just a test to see the performance of "PRAGMA threads". Actually I expect to sort 10-30 million rows or more. But after the tests I may choose the index. Just to let you know what other tests I've conducted. sqlite> .output stdout sqlite> pragma threads = 4; sqlite> select ID from event order by date; Run Time: real 2.880 user 4.469000 sys 0.147000 sqlite> pragma threads = 0; sqlite> select ID from event order by date; Run Time: real 3.870 user 3.771000 sys 0.075000 Besides the runtime difference I inspected the process information while the query is running. When "pragma threads = 4;" sqlite shell starts new threads for sorting and stops them after the query has finished. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
that is a single sql statement though ; it doesn't spread itself amongst threads, the compile option says 'limits number of threads' so if you had an application (not sqlite shell) that had threads it could have a limit? On Fri, Sep 16, 2016 at 5:13 AM, Adam Devita wrote: > Have you proven that the cpu is the bottleneck? Sorting a million rows > seems like a lot, but even older single core cpus may be capable of 2 > billion ops per second. [I apologize if this has been sorted out > already I've got about 2 days of history on the thread] > > regards, > Adam DeVita > > On Fri, Sep 16, 2016 at 7:56 AM, Stephen Chrzanowski > wrote: > > > Although programmatically easily done, from the SQLite point of view, > what > > if that query, sans LIMIT, were a subquery, and the limit was put out on > > the outside? Would the inner query execute, use all the threads, then > > return just one row on the outer? > > > > On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp wrote: > > > > > On 8/14/16, Венцислав Русев wrote: > > > > My computer has 4 cores. I have compile sqlite like this "gcc > > > > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 > > shell.c > > > > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found > that > > > > "pragma threads = 4" doesn't decrease runtime of the query that > sorts 1 > > > > milion records. > > > > > > > > SQLite version 3.8.8 > > > > sqlite> pragma threads; > > > > 4 > > > > sqlite> CREATE TABLE event ( > > > > ID INTEGER PRIMARY KEY NOT NULL, > > > > date INTEGER NOT NULL, > > > > value INTEGER NOT NULL ); > > > > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT > 1; > > > > > > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory > > > than a full-up "ORDER BY" because is only keeps track of the top N > > > entries seen so far, discarding the rest. But it also only uses a > > > single thread. If you want multiple threads to be used, you'll need > > > to drop the LIMIT, though I imagine that would defeat your purpose, > > > no? > > > > > > -- > > > D. Richard Hipp > > > d...@sqlite.org > > > ___ > > > sqlite-users mailing list > > > sqlite-users@mailinglists.sqlite.org > > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > > > > -- > -- > VerifEye Technologies Inc. > 151 Whitehall Dr. Unit 2 > Markham, ON > L3R 9T1 > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
Have you proven that the cpu is the bottleneck? Sorting a million rows seems like a lot, but even older single core cpus may be capable of 2 billion ops per second. [I apologize if this has been sorted out already I've got about 2 days of history on the thread] regards, Adam DeVita On Fri, Sep 16, 2016 at 7:56 AM, Stephen Chrzanowski wrote: > Although programmatically easily done, from the SQLite point of view, what > if that query, sans LIMIT, were a subquery, and the limit was put out on > the outside? Would the inner query execute, use all the threads, then > return just one row on the outer? > > On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp wrote: > > > On 8/14/16, Венцислав Русев wrote: > > > My computer has 4 cores. I have compile sqlite like this "gcc > > > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 > shell.c > > > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that > > > "pragma threads = 4" doesn't decrease runtime of the query that sorts 1 > > > milion records. > > > > > > SQLite version 3.8.8 > > > sqlite> pragma threads; > > > 4 > > > sqlite> CREATE TABLE event ( > > > ID INTEGER PRIMARY KEY NOT NULL, > > > date INTEGER NOT NULL, > > > value INTEGER NOT NULL ); > > > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1; > > > > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory > > than a full-up "ORDER BY" because is only keeps track of the top N > > entries seen so far, discarding the rest. But it also only uses a > > single thread. If you want multiple threads to be used, you'll need > > to drop the LIMIT, though I imagine that would defeat your purpose, > > no? > > > > -- > > D. Richard Hipp > > d...@sqlite.org > > ___ > > sqlite-users mailing list > > sqlite-users@mailinglists.sqlite.org > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > > > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > -- -- VerifEye Technologies Inc. 151 Whitehall Dr. Unit 2 Markham, ON L3R 9T1 ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
Although programmatically easily done, from the SQLite point of view, what if that query, sans LIMIT, were a subquery, and the limit was put out on the outside? Would the inner query execute, use all the threads, then return just one row on the outer? On Fri, Sep 16, 2016 at 7:29 AM, Richard Hipp wrote: > On 8/14/16, Венцислав Русев wrote: > > My computer has 4 cores. I have compile sqlite like this "gcc > > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c > > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that > > "pragma threads = 4" doesn't decrease runtime of the query that sorts 1 > > milion records. > > > > SQLite version 3.8.8 > > sqlite> pragma threads; > > 4 > > sqlite> CREATE TABLE event ( > > ID INTEGER PRIMARY KEY NOT NULL, > > date INTEGER NOT NULL, > > value INTEGER NOT NULL ); > > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1; > > The algorithm used for "ORDER BY ... LIMIT N" uses much less memory > than a full-up "ORDER BY" because is only keeps track of the top N > entries seen so far, discarding the rest. But it also only uses a > single thread. If you want multiple threads to be used, you'll need > to drop the LIMIT, though I imagine that would defeat your purpose, > no? > > -- > D. Richard Hipp > d...@sqlite.org > ___ > sqlite-users mailing list > sqlite-users@mailinglists.sqlite.org > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
On 8/14/16, Венцислав Русев wrote: > My computer has 4 cores. I have compile sqlite like this "gcc > -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c > sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that > "pragma threads = 4" doesn't decrease runtime of the query that sorts 1 > milion records. > > SQLite version 3.8.8 > sqlite> pragma threads; > 4 > sqlite> CREATE TABLE event ( > ID INTEGER PRIMARY KEY NOT NULL, > date INTEGER NOT NULL, > value INTEGER NOT NULL ); > sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1; The algorithm used for "ORDER BY ... LIMIT N" uses much less memory than a full-up "ORDER BY" because is only keeps track of the top N entries seen so far, discarding the rest. But it also only uses a single thread. If you want multiple threads to be used, you'll need to drop the LIMIT, though I imagine that would defeat your purpose, no? -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] how is "pragma threads = 4" working
On 15 Aug 2016, at 1:02am, Венцислав Русев wrote: > sqlite doesn't use these "auxiliary threads" that sqlite docs talks about and > the runtime of that query is the same with or without "pragma threads = 4". I cannot solve your problem, but this information may help the person who does. Which OS (include rough version number) are you running ? What does "PRAGMA compile_options" say about THREADSAFE ? Let's suppose that SQLite does actually launch those threads but they make no change in the time the command takes for some reason. Does your OS have a process tracker which can show you how many threads are in use ? For instance, in macOS I'd look at the "Activity Monitor.app" main window. In Windows I'd use Task Manager, select the process and click "Properties". You may have to increase the table size to make it take long enough to monitor. I'm just looking for some confirmation apart from the time taken. Simon. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] how is "pragma threads = 4" working
My computer has 4 cores. I have compile sqlite like this "gcc -DSQLITE_MAX_WORKER_THREADS=4 -DSQLITE_DEFAULT_WORKER_THREADS=4 shell.c sqlite3.c -lpthread -ldl -o sqlite3". I made some tests and found that "pragma threads = 4" doesn't decrease runtime of the query that sorts 1 milion records. SQLite version 3.8.8 sqlite> pragma threads; 4 sqlite> CREATE TABLE event ( ID INTEGER PRIMARY KEY NOT NULL, date INTEGER NOT NULL, value INTEGER NOT NULL ); sqlite> EXPLAIN QUERY PLAN SELECT ID FROM event ORDER BY date LIMIT 1; 0|0|0|SCAN TABLE event 0|0|0|USE TEMP B-TREE FOR ORDER BY sqlite> SELECT ID FROM event ORDER BY date LIMIT 1; 4101021 Run Time: real 2.493 user 2.426000 sys 0.049000 sqlite> pragma threads = 0; 0 sqlite> SELECT ID FROM event ORDER BY date LIMIT 1; 4101021 Run Time: real 2.484 user 2.421000 sys 0.044000 To sum it up I have: - multi core cpu; - sqlite that is compiled to use "auxiliary threads"; - a table with 1 million records; - a query that scans through the table, sorts all records and outputs the first ID. sqlite doesn't use these "auxiliary threads" that sqlite docs talks about and the runtime of that query is the same with or without "pragma threads = 4". ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users