Re: [sqlite] COUNT() extremely slow first time!
Thanks for the responses, guys.

> I would never have any table with 150 columns. It should be possible to
> keep the schema for your table in your head.

Unfortunately, those are the cards I have been dealt. The columns are just buckets of data (bucket1, bucket2, bucket3, etc.). Each bucket could hold one month's or one week's worth of data; it varies from table to table. A table can have up to two years' worth of data loaded, so that could be 24 monthly buckets or 104 weekly buckets.

We don't have any control over the machines this code is running on, so hardware changes are not an option. These machines are definitely less than ideal for what we are trying to get them to do, but what can you do.

> Given this, if the fastest is 3.6 seconds, you have a very fast hard
> drive. Can you confirm this?

I can confirm this: my tests were run on a machine that uses a solid-state drive.

Side note: how do I become a member of this list? Do my posts always need to be approved?

On Fri, Feb 25, 2011 at 4:57 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> Greg, first, I suggest you forget about the SQLite cache for scanning
> operations. The OS cache is also in use, and it's a known fact that the
> two sometimes store the same data twice, so disabling or decreasing one
> still leaves the other in effect.
>
> I saw that the db you have is about 400 MB in size. Given this, if the
> fastest is 3.6 seconds, you have a very fast hard drive. Can you confirm
> this? Scanning such a big database with a hard drive at least five years
> old should definitely take 10 seconds or more. Did you reboot your
> computer this time? By the way, a faster way to clear the system cache
> for a particular file is to "touch" the file with
> CreateFile(... FILE_FLAG_NO_BUFFERING).
>
> I tried to emulate your database with the table
>
>   CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT,
>     [a] TEXT, [b] TEXT, [c] TEXT, [d] TEXT);
>
> filled with 1,000,000 records, with a, b, c and d about 100 bytes each.
> I saw no big difference between 1024 and 8192 page sizes. Copying the db
> to NUL took about 20 seconds; SELECT COUNT took 25 seconds with
> 1024-byte pages and 20 seconds with 8192-byte pages.
>
> Anyway, SELECT COUNT is usually a very slow operation regardless of the
> tweaks we can apply; in almost every scenario it is equivalent to
> reading all the data in the table. I suppose the only exception is when
> your records are very big, so the count can take advantage of bypassing
> overflow pages. So in a table with 100 records each containing 1 MB of
> text, SELECT COUNT will probably be very fast.
>
> Max Vlasov

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
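[Editor's note: Max's emulation can be reproduced at a smaller scale with a short script. This is an illustrative sketch only (Python's built-in sqlite3 module, 10,000 rows instead of 1,000,000, in-memory database), not the code Max actually ran:]

```python
import random
import sqlite3
import string
import time

def build_test_db(path, rows=10_000, page_size=4096):
    """Build a table shaped like Max's TestTable, at a reduced scale."""
    con = sqlite3.connect(path)
    # page_size must be set before the database file is first written
    con.execute(f"PRAGMA page_size = {page_size}")
    con.execute("CREATE TABLE TestTable ("
                "Id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "a TEXT, b TEXT, c TEXT, d TEXT)")
    blob = lambda: "".join(random.choices(string.ascii_letters, k=100))
    con.executemany("INSERT INTO TestTable (a, b, c, d) VALUES (?, ?, ?, ?)",
                    ((blob(), blob(), blob(), blob()) for _ in range(rows)))
    con.commit()
    return con

con = build_test_db(":memory:")
t0 = time.perf_counter()
(n,) = con.execute("SELECT COUNT(*) FROM TestTable").fetchone()
elapsed = time.perf_counter() - t0
print(n, f"{elapsed:.3f}s")
```

With an in-memory database this measures only CPU-side scan cost; to see the effects discussed in the thread, the path must be a real file and the OS cache must be cold.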
Re: [sqlite] COUNT() extremely slow first time!
I ran my tests again, this time setting cache_size based on page_size. For some reason I wasn't able to replicate the timing I got before for the 1024 page size (40.2 seconds), which is really confusing me, since I was careful to make sure nothing was running in the background while I was testing.

page_size/cache_size: SELECT COUNT time
1024/2000:  20.83s
4096/500:   14.4s
8192/250:   8.8s
16384/125:  5.0s
32768/62:   3.6s

I'm assuming reducing the cache size like that will definitely have an impact on the performance of an actual application? Optimizing performance for an application where both the queries and the data can take many different shapes and sizes is beginning to seem like quite a daunting task. What do you do if there could be anywhere between 30 and 150 columns?

Here are the times I got before, using the default cache_size of 2000:

1024:  40.2s
4096:  15.5s
8192:  8.5s
16384: 5.3s
32768: 3.8s
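[Editor's note: one easy-to-miss detail in sweeps like the one above is that PRAGMA page_size only takes effect on a fresh database or after VACUUM rewrites the file. A minimal sketch (Python sqlite3; file names and the fresh-file-per-setting approach are illustrative choices) that applies each page_size/cache_size pair and verifies the page size actually took:]

```python
import os
import sqlite3
import tempfile

def open_with_settings(path, page_size, cache_pages):
    """Open a database with the given page_size and cache_size.
    On an existing file, page_size only changes after VACUUM."""
    con = sqlite3.connect(path)
    con.execute(f"PRAGMA page_size = {page_size}")
    con.execute("VACUUM")  # rewrites the file so the page size takes effect
    con.execute(f"PRAGMA cache_size = {cache_pages}")
    return con

tmpdir = tempfile.mkdtemp()
results = {}
for page_size, cache_pages in [(1024, 2000), (4096, 500), (8192, 250),
                               (16384, 125), (32768, 62)]:
    # a fresh file per setting guarantees the requested page size applies
    path = os.path.join(tmpdir, f"bench_{page_size}.db")
    con = open_with_settings(path, page_size, cache_pages)
    results[page_size] = con.execute("PRAGMA page_size").fetchone()[0]
    con.close()
print(results)
```

Note that each pair in Greg's table keeps the total cache at roughly 2 MB (page_size × cache_size ≈ 2,048,000 bytes, except 32768 × 62 ≈ 2,031,616), so the sweep varies page size while holding cache memory roughly constant.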
Re: [sqlite] COUNT() extremely slow first time!
Average payload per entry for my test table was 409.00. Sounds about right, since the db has 4 columns and each is filled with a random string of length 100.

I've uploaded the full output from sqlite3_analyzer for a few different page sizes:

http://fletchowns.net/files/1024-analyzer.txt
http://fletchowns.net/files/4096-analyzer.txt
http://fletchowns.net/files/8192-analyzer.txt

I'm not quite sure how to interpret everything in there. What stands out to you guys? What is "Average fanout"?

Greg

On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> Yes, Greg, please post this value for the table you use in the count
> query.
>
> Max
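[Editor's note: the analyzer's "average payload per entry" can be roughly cross-checked from SQL. A sketch with a hypothetical table shaped like Greg's test db; the analyzer's figure also counts record-header bytes, so it comes out a few bytes higher than a plain length() sum, which is consistent with 409.00 versus 4 × 100:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT, d TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                [("x" * 100,) * 4 for _ in range(50)])

# Approximate "average payload per entry": the stored lengths of every
# column, summed and averaged over all rows (header bytes not included).
(avg_payload,) = con.execute(
    "SELECT AVG(length(a) + length(b) + length(c) + length(d)) FROM t"
).fetchone()
print(avg_payload)  # 400.0 for this synthetic data
```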
Re: [sqlite] COUNT() extremely slow first time!
Is the record size you refer to here the same as the "Average payload per entry" that sqlite3_analyzer determines for me?

On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov wrote:

> Greg, you should also take the record size into account. My hypothesis
> is that if your record is comparatively small (several records fit into
> a 1024-byte page), the speed of SELECT COUNT will be the same for any
> page size (my quick tests confirm this). It would be interesting to know
> the average size of your records, to understand why the numbers are so
> different.
>
> Max
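[Editor's note: Max's hypothesis can be put into back-of-envelope numbers using the figures from this thread (~409-byte payloads, ~400 MB file). A full scan reads the same total bytes at any page size, but the number of page requests, and hence per-page overhead and seeks, shrinks as pages grow. A rough sketch, ignoring interior b-tree pages, per-cell overhead, and free space:]

```python
# Back-of-envelope only: assumes a 400 MB file and 409-byte average payloads.
db_bytes = 400 * 1024 * 1024
payload = 409

for page_size in (1024, 4096, 8192, 16384, 32768):
    pages = db_bytes // page_size     # page reads for a full table scan
    per_page = page_size // payload   # leaf records that fit in one page
    print(f"{page_size:>6}: {pages:>7} pages, ~{per_page} records/page")
```

Going from 1024-byte to 32768-byte pages cuts the page count by 32x, which is one plausible reason the cold-cache timings in this thread improve so sharply with larger pages.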
Re: [sqlite] COUNT() extremely slow first time!
I'm currently dealing with a similar issue. I've found that the page_size PRAGMA setting can have a dramatic effect on how long it takes to "warm up" the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column) takes 40.2 seconds; page_size=4096 takes 15.5 seconds; page_size=8192 takes 8.5 seconds. This was done with a reboot between each test.

This page recommends a page_size of 4096:
http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows

If I see better performance with the larger page sizes (going to test 16384 and beyond after this), is there any reason not to use them?

Greg

On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <oliverkloz...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson wrote:
>
>> On Mon, Feb 21, 2011 at 11:05 AM, Sven L wrote:
>>
>>> Thank you for your detailed explanation! First, can you please tell
>>> me how to purge the cache in Windows 7? This could be very useful for
>>> my tests!
>>
>> Sorry, dunno for Windows. On Mac OS X it is the purge command, in the
>> development tools. On Linux, you do: echo 3 > /proc/sys/vm/drop_caches
>
> Just make sure you either (a) quote the 3 (echo '3' >
> /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
> If you don't quote it, and you don't put the space in (echo
> 3>/proc/sys/vm/drop_caches), it doesn't do anything, for reasons I
> won't go into.
>
> --
> -- Stevie-O
> Real programmers use COPY CON PROGRAM.EXE
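[Editor's note: one detail worth flagging about the benchmark query itself: SELECT COUNT(last_column) and SELECT COUNT(*) are not interchangeable. COUNT(column) skips NULLs and has to decode each record far enough to reach that column, while COUNT(*) only counts b-tree entries. A small illustration with a hypothetical table:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a TEXT, last_column TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("x", "y" if i % 2 == 0 else None) for i in range(10)])

# COUNT(*) counts rows; COUNT(col) counts only rows where col is not NULL.
(total,) = con.execute("SELECT COUNT(*) FROM t").fetchone()
(non_null,) = con.execute("SELECT COUNT(last_column) FROM t").fetchone()
print(total, non_null)  # 10 5
```

If the goal is simply the row count, COUNT(*) is the safer form to benchmark, since the two can return different numbers as well as perform differently.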