Re: [sqlite] Threading makes SQLite 3x slower??

2013-08-20 Thread Seth Price
So far I've sped it up by 2x-3x by increasing the cache size by 10x, removing 
some extraneous SQL, and moving a bit of processing in-app. But using the full 
dataset with two threads instead of one still results in a 3x-5x time increase 
with either THREADSAFE=1 or 2. The fastest is single-threaded with 
THREADSAFE=0. My prepared queries now look like this:

SELECT class FROM data_r JOIN data USING (rowNum) WHERE ?1  col0min AND 
col0max  ?2 AND ?3  col1min AND col1max  ?4 AND ?5  col2min AND col2max  
?6 AND ?7  col3min AND col3max  ?8 AND ?9  col4min AND col4max  ?10 AND ?11 
 col5 AND col5  ?12;

Am I configuring locking right?

I have 17 million queries like the above. Since they are all read-only, I 
shouldn't be taking such a hit. Every thread has it's own DB connection and 
prepared statements, so THREADSAFE=2 should work fine (as I understand it). 
Instead it destroys performance.
Thanks,
Seth


On Aug 4, 2011, at 12:36 PM, Seth Price wrote:

 Removing COUNT(*) AS count and GROUP BY class and doing it in-program 
 shaved ~10% off of the time. I'll keep it. :)
 ~Seth
 
 On Aug 4, 2011, at 11:30 AM, Eduardo Morras wrote:
 
 
 Oks, another let's try another thing/think.
 
 Try the select without the COUNT(*):
 
 SELECT class FROM data_r JOIN data USING (rowNum) WHERE 57  col0min 
 AND col0max  61 AND 52  col1min AND col1max  56 AND 66  col2min 
 AND col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min 
 AND col4max  137 AND 57  col0 AND col0  61 AND 52  col1 AND col1 
  56 AND 66  col2 AND col2  70 AND 88  col3 AND col3  92 AND 133 
  col4 AND col4  137 AND 81  col5 AND col5  85 GROUP BY class;
 
 In some rdbms (don' know in sqlite), count, avg, sum, etc... implies 
 a table scan, making the select very slow.
 
 HTH
 
 
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
 You didn't show your timing results or say what kind of machine you're 
 running on.


I'm running on a 2.26 GHz Mac Pro with 8 physical cores and 16 GB of RAM.

 SQL error (635): near ORDER: syntax error


You're probably seeing that error because you need to recompile with 
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT. You would need to rebuild the parser, 
but I don't think it affects the results that we're looking at. So ignore the 
error. :)

 What I would do is one query and split the results of that query between 
 threads.


As is, the result from each query is ~1-3 rows that don't require much 
processing. A profiler says I spend about 99% of my time in SQLite.
~Seth

On Aug 4, 2011, at 6:33 AM, Black, Michael (IS) wrote:

 You didn't show your timing results or say what kind of machine you're 
 running on.
 I'm also seeing
 Calculating Subset Sample...
 SQL error (635): near ORDER: syntax error
 
 I'm running Linux, sqlite 3.7.5, E5520 2.27Ghz Intel with 16 threads possible.
 
 Here's my timings just on the first few rows of your timing output shows 2 
 threads is a touch better.
 1 thread
 ( 0) Rows Proc: 5000 Avg: 0.00144 Elapsed:   7.20
 ( 0) Rows Proc:1 Avg: 0.00104 Elapsed:  10.37
 ( 0) Rows Proc:15000 Avg: 0.00098 Elapsed:  14.77
 ( 0) Rows Proc:2 Avg: 0.00092 Elapsed:  18.36
 ( 0) Rows Proc:25000 Avg: 0.00084 Elapsed:  21.08
 
 2 threads
 ( 1) Rows Proc: 5000 Avg: 0.00103 Elapsed:   5.20
 ( 1) Rows Proc:1 Avg: 0.00104 Elapsed:  10.45
 ( 1) Rows Proc:15000 Avg: 0.00103 Elapsed:  15.53
 ( 0) Rows Proc: 5000 Avg: 0.00419 Elapsed:  20.96
 ( 1) Rows Proc:2 Avg: 0.00106 Elapsed:  21.23
 
 3 threads
 ( 1) Rows Proc: 5000 Avg: 0.00536 Elapsed:  26.91
 ( 2) Rows Proc: 5000 Avg: 0.00598 Elapsed:  30.05
 ( 1) Rows Proc:1 Avg: 0.00535 Elapsed:  53.67
 ( 2) Rows Proc:1 Avg: 0.00612 Elapsed:  61.36
 ( 1) Rows Proc:15000 Avg: 0.00537 Elapsed:  80.75
 
 What I would do is one query and split the results of that query between 
 threads.
 
 You can use OpenMP to do that.
 
 
 Michael D. Black
 Senior Scientist
 NG Information Systems
 Advanced Analytics Directorate
 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Seth Price [s...@pricepages.org]
 Sent: Wednesday, August 03, 2011 9:07 PM
 To: sqlite-users@sqlite.org
 Subject: EXT :[sqlite] Threading makes SQLite 3x slower??
 
 
 Hey all, I have a small DB bound application that I'm working on. I expect 
 that it will take 10-20 days to run when I'm done with it. So I'm trying to 
 make it multithreaded. But after spending all afternoon getting threading 
 going, it runs on the order of 3x slower per query when I go from one to two 
 threads.
 
 Is this expected?
 
 The full DB has around 17 million rows in it, and for each row I am trying to 
 count all rows with similar characteristics, divided into different 
 classifications. I was already able to improve speed 10x by using the R*tree 
 extension to narrow my search. My queries look like this:
 
 SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum) WHERE 57 
  col0min AND col0max  61 AND 52  col1min AND col1max  56 AND 66  col2min 
 AND col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min AND 
 col4max  137 AND 57  col0 AND col0  61 AND 52  col1 AND col1  56 AND 66 
  col2 AND col2  70 AND 88  col3 AND col3  92 AND 133  col4 AND col4  
 137 AND 81  col5 AND col5  85 GROUP BY class;
 
 They take around 0.04 seconds each (times 17 million). I thought that I 
 should be able to access the DB from each thread without penalty because they 
 are read-only queries. I also tried making copies of the DB file to access a 
 different file with each thread (also slower). Oddly, there are no frequent 
 disk access while I run the program, so there must be caching somewhere.
 
 How can I make threading work? Download the example set and code here:
 http://seth.bluezone.usu.edu/sqlite/
 
 There is info on how I compile and run the program in the header. Use 
 NUM_THREADS to change the number. The stats on which thread is taking how 
 much time is printed under Pruning Conflicting Examples... and the 
 threading code is under pruneEx(ExPtr ex). I'm on a Mac Pro running 10.6.8.
 
 Thoughts? Comments? Ideas?
 
 Thanks,
 Seth
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
On Aug 4, 2011, at 1:54 AM, Eduardo Morras wrote:
 
 They block each other not trying to get a lock, but trying to get 
 access to disk and cache. Disk access time and cache is shared 
 between all threads and if all threads needs access to different 
 parts of the db they will figth like barbarians, you should convert 
 them to roman cohorts instead. You can make your page cache size 
 bigger using pragmas, check maillist and documentation.

Using a 10x larger temp cache and default cache at compile time increases my 
runtime of that section from 166 to 167. I'm using two threads still.

 You don't need to update your table, you can use a temp memory table 
 for that with candidates. First select candidates where 57col0min, 
 then you delete candidates from temp table where col0min62 and so 
 on. You don't need to copy the original table schema for temp table, 
 use pk only. The select of the first candidates is important, use the 
 one that minimizes the number of initial data.

I was hoping that SQLite was doing that with the JOIN statement. It's narrowing 
down the results with the query from the R*table (col*min  col*max). A large 
query will have ~14k rows at this point. Then JOINs with the original table 
(data) and it's narrowing results further with the remaining part of the WHERE 
(col*). The large query will have ~11k rows. After this it groups by the 
'class' column in 'data' and counts how many rows per class.

I'm really skeptical that I can chop this up into many queries to make it run 
faster. I've always assumed that if I can get final data back from a query, 
then it's exactly the query I'm looking for. I would think that running three 
queries would use three times the resources.
Thanks,
Seth
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
 With SQLITE_THREADSAFE=2
 with SQLITE_THREADSAFE=1

With 2 threads and THREADSAFE=2 I get 167 seconds.
With 2 threads and THREADSAFE=1, I get 177 seconds.
With 1 thread and THREADSAFE=1, I get 53 seconds.
With 1 thread and THREADSAFE=2, I get 52 seconds.

One thing I'm worried about is I have two differing definitions of 
SQLITE_THREADSAFE when I compile via the command line with 
-DSQLITE_THREADSAFE=2, so I'm worried that one is clobbering the other. Here 
is a clip from the compile log, you can see *both* THREADSAFE=2 and 
THREADSAFE=1:

 gcc -DPACKAGE_NAME=\sqlite\ -DPACKAGE_TARNAME=\sqlite\ 
-DPACKAGE_VERSION=\3.7.7.1\ -DPACKAGE_STRING=\sqlite 3.7.7.1\ 
-DPACKAGE_BUGREPORT=\http://www.sqlite.org\; -DPACKAGE_URL=\\ 
-DPACKAGE=\sqlite\ -DVERSION=\3.7.7.1\ -DSTDC_HEADERS=1 
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
-DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DHAVE_FDATASYNC=1 -DHAVE_USLEEP=1 
-DHAVE_LOCALTIME_R=1 -DHAVE_GMTIME_R=1 -DHAVE_DECL_STRERROR_R=1 
-DHAVE_STRERROR_R=1 -DHAVE_READLINE=1 -I.@am__isrc@ -D_REENTRANT=1 
-DSQLITE_THREADSAFE=1 -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_RTREE -O2 
-DSQLITE_ENABLE_UPDATE_DELETE_LIMIT -DSQLITE_ENABLE_RTREE -DSQLITE_THREADSAFE=2 
-DSQLITE_OPEN_NOMUTEX -DSQLITE_ENABLE_LOCKING_STYLE 
-DSQLITE_DEFAULT_TEMP_CACHE_SIZE=5000 -DSQLITE_DEFAULT_CACHE_SIZE=2 -MT 
sqlite3.lo -MD -MP -MF .deps/sqlite3.Tpo -c sqlite3.c  -fno-common -DPIC -o 
.libs/sqlite3.o
command-line: warning: SQLITE_THREADSAFE redefined
command-line: warning: this is the location of the previous definition


 The way you're splitting your work is bad.  You can see that your row counts 
 for 1 thread are 3 and for 2 threads 45,000 so you're not gaining a lot.  
 3 threads is 90,000 so you've tripled your workload totally defeating 
 multi-threading.

When I run with 1 thread, it queries all 99k examples. When I use two threads, 
each thread query 45k of the examples. Three threads query 33k of the examples 
each. The total should always be 99k examples, which is what I'm seeing on my 
machine.

Thanks,
Seth

On Aug 4, 2011, at 6:36 AM, Black, Michael (IS) wrote:

 Also..
 
 .
 
 Your elapsed time is using clock() which tells you processor time.
 
 
 
 With SQLITE_THREADSAFE=2
 
 59.855 - 1 thread
 
 49.535 - 2 threads
 
 92.789 - 3 threads
 
 
 
 with SQLITE_THREADSAFE=1
 
 61.146 - 1 thread
 
 49.568 - 2 threads
 
 64.932 - 3 threads
 
 
 
 The way you're splitting your work is bad.  You can see that your row counts 
 for 1 thread are 3 and for 2 threads 45,000 so you're not gaining a lot.  
 3 threads is 90,000 so you've tripled your workload totally defeating 
 multi-threading.
 
 
 
 
 
 
 
 
 
 Michael D. Black
 
 Senior Scientist
 
 NG Information Systems
 
 Advanced Analytics Directorate
 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Seth Price [s...@pricepages.org]
 Sent: Wednesday, August 03, 2011 9:07 PM
 To: sqlite-users@sqlite.org
 Subject: EXT :[sqlite] Threading makes SQLite 3x slower??
 
 Hey all, I have a small DB bound application that I'm working on. I expect 
 that it will take 10-20 days to run when I'm done with it. So I'm trying to 
 make it multithreaded. But after spending all afternoon getting threading 
 going, it runs on the order of 3x slower per query when I go from one to two 
 threads.
 
 Is this expected?
 
 The full DB has around 17 million rows in it, and for each row I am trying to 
 count all rows with similar characteristics, divided into different 
 classifications. I was already able to improve speed 10x by using the R*tree 
 extension to narrow my search. My queries look like this:
 
 SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum) WHERE 57 
  col0min AND col0max  61 AND 52  col1min AND col1max  56 AND 66  col2min 
 AND col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min AND 
 col4max  137 AND 57  col0 AND col0  61 AND 52  col1 AND col1  56 AND 66 
  col2 AND col2  70 AND 88  col3 AND col3  92 AND 133  col4 AND col4  
 137 AND 81  col5 AND col5  85 GROUP BY class;
 
 They take around 0.04 seconds each (times 17 million). I thought that I 
 should be able to access the DB from each thread without penalty because they 
 are read-only queries. I also tried making copies of the DB file to access a 
 different file with each thread (also slower). Oddly, there are no frequent 
 disk access while I run the program, so there must be caching somewhere.
 
 How can I make threading work? Download the example set and code here:
 http://seth.bluezone.usu.edu/sqlite/
 
 There is info on how I compile and run the program in the header. Use 
 NUM_THREADS to change the number. The stats on which thread is taking how 
 much time is printed under Pruning Conflicting Examples... and the 
 threading code is under pruneEx(ExPtr ex). I'm on a Mac Pro running 10.6.8

Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
 Any threads or processes would still be in contention for the same resource: 
 access to that file on disk.


Midway through the run, `iostat` is showing me exactly 0 disk accesses, so I'm 
assuming it's a locking problem and not a physical I/O bottleneck. I'm hoping 
that you folks could give me an idea of what I'm doing wrong. (It could also be 
a problem with the Mac OS locking.)

 The best indexes to create will depend on whether this is a one-time job or 
 whether you're going to have to do it repeatedly with similar or completely 
 different data.


For each row, I'm making a query for similar rows. The full DB is 17 million 
rows. I've tried adding all the indexes I can think of, but the only one that 
helps this query is the R*tree index. They were all simple indices on one or 
more columns, so if you have an idea on a more complex index, I'd apply it and 
test it out.

~Seth


On Aug 4, 2011, at 9:54 AM, Simon Slavin wrote:

 
 On 4 Aug 2011, at 4:44pm, Seth Price wrote:
 
 I was hoping that SQLite was doing that with the JOIN statement. It's 
 narrowing down the results with the query from the R*table (col*min  
 col*max). A large query will have ~14k rows at this point. Then JOINs with 
 the original table (data) and it's narrowing results further with the 
 remaining part of the WHERE (col*). The large query will have ~11k rows. 
 After this it groups by the 'class' column in 'data' and counts how many 
 rows per class.
 
 I'm really skeptical that I can chop this up into many queries to make it 
 run faster. I've always assumed that if I can get final data back from a 
 query, then it's exactly the query I'm looking for. I would think that 
 running three queries would use three times the resources.
 
 Threading is a red herring.  Any threads or processes would still be in 
 contention for the same resource: access to that file on disk.  This is what 
 caused the observation in your 'Subject' header.  All the threads are doing 
 is waiting for another thread to get out the way.
 
 Speed increases can come from adding one or more indexes.  The best indexes 
 to create will depend on whether this is a one-time job or whether you're 
 going to have to do it repeatedly with similar or completely different data.
 
 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
I tried one index on all columns originally, but it didn't help nearly as much 
as the R*tree.
~Seth


On Aug 4, 2011, at 10:26 AM, Simon Slavin wrote:

 
 On 4 Aug 2011, at 5:13pm, Seth Price wrote:
 
 They were all simple indices on one or more columns, so if you have an idea 
 on a more complex index, I'd apply it and test it out.
 
 That's not going to be much use, is it ?  A single SELECT which tests all 
 those different columns can only use one index, and if it chooses any of 
 those indexes it will help only with one column.
 
 Seems like you need one index on all the columns mentioned in your SELECT.  
 But save yourself some time and effort and try one with just five of the 
 columns first.  See if that speeds things up.  Delete all the indexes on just 
 one column.
 
 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
Removing COUNT(*) AS count and GROUP BY class and doing it in-program 
shaved ~10% off of the time. I'll keep it. :)
~Seth

On Aug 4, 2011, at 11:30 AM, Eduardo Morras wrote:

 
 Oks, another let's try another thing/think.
 
 Try the select without the COUNT(*):
 
 SELECT class FROM data_r JOIN data USING (rowNum) WHERE 57  col0min 
 AND col0max  61 AND 52  col1min AND col1max  56 AND 66  col2min 
 AND col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min 
 AND col4max  137 AND 57  col0 AND col0  61 AND 52  col1 AND col1 
  56 AND 66  col2 AND col2  70 AND 88  col3 AND col3  92 AND 133 
  col4 AND col4  137 AND 81  col5 AND col5  85 GROUP BY class;
 
 In some rdbms (don' know in sqlite), count, avg, sum, etc... implies 
 a table scan, making the select very slow.
 
 HTH
 
 
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
Those fields span two tables.

~Seth


via iPhone

On Aug 4, 2011, at 2:45 PM, Simon Slavin slav...@bigfraud.org wrote:

 
 On 4 Aug 2011, at 8:40pm, Seth Price wrote:
 
 SELECT class FROM data_r JOIN data USING (rowNum) WHERE ?1  col0min AND 
 col0max  ?2 AND ?3  col1min AND col1max  ?4 AND ?5  col2min AND col2max 
  ?6 AND ?7  col3min AND col3max  ?8 AND ?9  col4min AND col4max  ?10 
 AND ?11  col5 AND col5  ?12;
 
 Create one index on all these fields:
 
 (col0max,col1min,col1max,col2min,col2max,col3min,col3max,col4min,col4max,col5)
 
 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-04 Thread Seth Price
The table with the first five columns is the R*tree table, and thus already has 
an index. I just tried adding an index to col5 also, but no change in 
performance.
~Seth 


On Aug 4, 2011, at 3:27 PM, Simon Slavin wrote:

 
 On 4 Aug 2011, at 9:59pm, Seth Price wrote:
 
 Those fields span two tables.
 
 Then create one index for each table.
 
 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Threading makes SQLite 3x slower??

2011-08-03 Thread Seth Price
Hey all, I have a small DB bound application that I'm working on. I expect that 
it will take 10-20 days to run when I'm done with it. So I'm trying to make it 
multithreaded. But after spending all afternoon getting threading going, it 
runs on the order of 3x slower per query when I go from one to two threads.

Is this expected?

The full DB has around 17 million rows in it, and for each row I am trying to 
count all rows with similar characteristics, divided into different 
classifications. I was already able to improve speed 10x by using the R*tree 
extension to narrow my search. My queries look like this:

SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum) WHERE 57  
col0min AND col0max  61 AND 52  col1min AND col1max  56 AND 66  col2min AND 
col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min AND col4max  
137 AND 57  col0 AND col0  61 AND 52  col1 AND col1  56 AND 66  col2 AND 
col2  70 AND 88  col3 AND col3  92 AND 133  col4 AND col4  137 AND 81  
col5 AND col5  85 GROUP BY class;

They take around 0.04 seconds each (times 17 million). I thought that I should 
be able to access the DB from each thread without penalty because they are 
read-only queries. I also tried making copies of the DB file to access a 
different file with each thread (also slower). Oddly, there are no frequent 
disk access while I run the program, so there must be caching somewhere.

How can I make threading work? Download the example set and code here:
http://seth.bluezone.usu.edu/sqlite/

There is info on how I compile and run the program in the header. Use 
NUM_THREADS to change the number. The stats on which thread is taking how much 
time is printed under Pruning Conflicting Examples... and the threading code 
is under pruneEx(ExPtr ex). I'm on a Mac Pro running 10.6.8.

Thoughts? Comments? Ideas?

Thanks,
Seth
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Threading makes SQLite 3x slower??

2011-08-03 Thread Seth Price
I was hoping they wouldn't block each other because it's a read lock. I tried 
making an index on all the columns, but R*tree table + JOIN that I'm using runs 
about 10x faster. I might have done something wrong, so I'm open to suggestions 
on a better index, though.

I don't think that UPDATEing 99.99% of 17 million columns is going to run 
faster than the pure select statement that I have. Each select statement is 
fairly random, a small selection out of an N-dimensional space.

Am I understanding you? Any other ideas?
~Seth


On Aug 3, 2011, at 8:43 PM, Simon Slavin wrote:

 
 On 4 Aug 2011, at 3:07am, Seth Price wrote:
 
 The full DB has around 17 million rows in it, and for each row I am trying 
 to count all rows with similar characteristics, divided into different 
 classifications. I was already able to improve speed 10x by using the R*tree 
 extension to narrow my search. My queries look like this:
 
 SELECT class, COUNT(*) AS count FROM data_r JOIN data USING (rowNum) WHERE 
 57  col0min AND col0max  61 AND 52  col1min AND col1max  56 AND 66  
 col2min AND col2max  70 AND 88  col3min AND col3max  92 AND 133  col4min 
 AND col4max  137 AND 57  col0 AND col0  61 AND 52  col1 AND col1  56 
 AND 66  col2 AND col2  70 AND 88  col3 AND col3  92 AND 133  col4 AND 
 col4  137 AND 81  col5 AND col5  85 GROUP BY class;
 
 Forget multithreading.  You have a bottleneck of accessing the database file 
 and the threads will just block each other.
 
 There're two ways to make that SELECT run quickly.  One is to create one 
 index on all those columns.
 
 The other is to do the SELECT in parts, by disqualifying records you don't 
 want.  Create another column called, perhaps, notThisOne.  Then do a 
 succession of disqualifying checks:
 
 UPDATE data SET notThisOne=1 WHERE col0min=57 OR col0min=61;
 
 etc.  Then all the rows you didn't disqualify are the ones you want.  
 Obviously, separate indices on each of those columns will speed up the 
 disqualifying check.
 
 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users