Re: [sqlite] COUNT() extremely slow first time!

2011-02-26 Thread Greg Barker
Thanks for the responses guys.

> I would never have any table with 150 columns.  It should be possible to
keep the schema for your table in your head.

Unfortunately those are the cards I have been dealt. The columns are just
buckets of data (bucket1, bucket2, bucket3, etc). Each bucket could hold one
month's or one week's worth of data; it varies from table to table. A table
can have up to two years' worth of data loaded, so that could be 24 monthly
buckets or 104 weekly buckets.

We don't have any control over the machines this code runs on, so hardware
changes are not an option. These machines are definitely less than ideal for
what we are trying to get them to do, but what can you do?

> Given this, if the fastest is 3.6 seconds, you have a very fast hard
drive. Can you confirm this?

I can confirm this; my tests were run on a machine with a solid-state
drive.

Side note - how do I become a member of this list? Do my posts always need
to be approved?

On Fri, Feb 25, 2011 at 4:57 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> On Fri, Feb 25, 2011 at 1:51 AM, Greg Barker <fle...@fletchowns.net>
> wrote:
>
> > I ran my tests again, this time setting the cache_size based on the
> > page_size. For some reason I wasn't able to replicate the timing I got
> > before for the 1024 page size (40.2 seconds), which is really confusing
> > me, since I was careful to make sure nothing was running in the
> > background while I was testing.
> >
> > page_size/cache_size: SELECT COUNT time
> > 1024/2000: 20.83s
> > 4096/500: 14.4s
> > 8192/250: 8.8s
> > 16384/125: 5.0s
> > 32768/62: 3.6s
> >
> > I'm assuming reducing the cache size like that will definitely have an
> > impact on the performance of an actual application? Optimizing
> > performance for an application where both the queries and the data can
> > take many different shapes and sizes is beginning to seem like quite a
> > daunting task. What do you do if there could be anywhere between 30 and
> > 150 columns?
> >
> >
> Greg, first, I suggest forgetting about the sqlite cache for scanning
> operations. This is because the OS cache is also used, and it's a known
> fact that they sometimes store the same data twice, so disabling or
> decreasing one still leaves the other in effect.
>
> I saw that the db you have is about 400 MB in size. Given this, if the
> fastest is 3.6 seconds, you have a very fast hard drive. Can you confirm
> this? Scanning such a big database with a hard drive that is at least five
> years old should definitely take 10 seconds or even longer. Did you reboot
> your computer this time? By the way, a faster way to clear the system cache
> for a particular file is to "touch" the file with
> CreateFile(...FILE_FLAG_NO_BUFFERING).
>
> I tried to emulate your database with the table
> CREATE TABLE [TestTable] ([Id] Integer Primary Key Autoincrement, [a] Text,
> [b] Text, [c] Text, [d] Text);
> and filled it with 1,000,000 records whose a, b, c, d values are about 100
> bytes each. I saw no big difference between the 1024 and 8192 page sizes.
> While copying the db to NUL took about 20 seconds, select count took 25
> seconds with 1024-byte pages and 20 seconds with 8192-byte pages.
>
> Anyway, select count is usually a very slow operation regardless of the
> tweaks we can use; in almost every scenario it's equivalent to reading all
> the data of the table. I suppose the only exception is when your records
> are very big, so the scan can take advantage of bypassing overflow data.
> So, in a table with 100 records each containing 1 MB of text, 'select
> count' will probably be very fast.
>
> Max Vlasov
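
For reference, a minimal sketch (in C, Windows-only) of the "touch with
FILE_FLAG_NO_BUFFERING" trick Max describes above. The file name is a
placeholder, and it assumes, as Max suggests, that opening the database file
non-buffered is enough to make Windows discard its cached pages:

/* Sketch: "touch" the database file with FILE_FLAG_NO_BUFFERING so Windows
 * drops its cached pages before a cold-start timing.  "test.db" is a
 * placeholder name; error handling is minimal. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("test.db", GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }
    CloseHandle(h);  /* no reads needed; the open itself is the "touch" */
    return 0;
}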


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Greg Barker
I ran my tests again, this time setting the cache_size based on the
page_size. For some reason I wasn't able to replicate the timing I got
before for the 1024 page size (40.2 seconds), which is really confusing me,
since I was careful to make sure nothing was running in the background while
I was testing.

page_size/cache_size: SELECT COUNT time
1024/2000: 20.83s
4096/500: 14.4s
8192/250: 8.8s
16384/125: 5.0s
32768/62: 3.6s

I'm assuming reducing the cache size like that will definitely have an
impact on the performance of an actual application? Optimizing performance
for an application where both the queries and the data can take many
different shapes and sizes is beginning to seem like quite a daunting task.
What do you do if there could be anywhere between 30 and 150 columns?

I'll post the times I got before, when using the default cache_size of 2000:
1024: 40.2s
4096: 15.5s
8192: 8.5s
16384: 5.3s
32768: 3.8s
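
For anyone who wants to reproduce this kind of timing, here is a rough sketch
of one way to do a single run with the sqlite3 C API (this is not the exact
test code; the file and table names are placeholders and error handling is
stripped down):

/* Sketch: open the database, set the page cache to a given number of pages,
 * and time a full-table SELECT COUNT.  The page_size itself is fixed when
 * the db file is built (or after a VACUUM), so only cache_size is set here.
 * On Windows, clock() reports elapsed wall time, which is what we want for
 * a cold-start measurement. */
#include <stdio.h>
#include <time.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;
    clock_t start;

    if (sqlite3_open("test.db", &db) != SQLITE_OK) return 1;

    /* e.g. 250 pages of 8192 bytes, roughly the same 2 MB as 2000 x 1024 */
    sqlite3_exec(db, "PRAGMA cache_size=250;", NULL, NULL, NULL);

    start = clock();
    if (sqlite3_prepare_v2(db, "SELECT COUNT(*) FROM TestTable;", -1,
                           &stmt, NULL) == SQLITE_OK) {
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("count=%d ", sqlite3_column_int(stmt, 0));
        sqlite3_finalize(stmt);
    }
    printf("elapsed=%.1fs\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    sqlite3_close(db);
    return 0;
}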

On Thu, Feb 24, 2011 at 11:19 AM, Greg Barker <fle...@fletchowns.net> wrote:

> Average payload per entry for my test table was 409.00. Sounds about right
> since the db has 4 columns and each is filled with a random string of length
> 100.
>
> I've uploaded the full output from the sqlite3_analyzer for a few different
> page sizes:
>
> http://fletchowns.net/files/1024-analyzer.txt
> http://fletchowns.net/files/4096-analyzer.txt
> http://fletchowns.net/files/8192-analyzer.txt
>
> I'm not quite sure how to interpret everything in there. What stands out to
> you guys? What is Average Fanout?
>
> Greg
>
>
> On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>
>> Yes, Greg, please post this value for the table you use in the count query.
>>
>> Max
>>
>>
>> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker <fle...@fletchowns.net>
>> wrote:
>>
>> > Is the record size you refer to here the same as the "Average payload
>> > per entry" that sqlite3_analyzer determines for me?
>> >
>> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov <max.vla...@gmail.com>
>> wrote:
>> >
>> > > Greg, you should also take the record size into account. My hypothesis
>> > > is that if your record is comparatively small (several fit into 1024),
>> > > the speed of select count will be the same for any page size (my quick
>> > > tests confirm this). It would be interesting to know the average size
>> > > of your record, to understand why the numbers are so different.
>> > >
>> > > Max
>> >
>> >
>
>


Re: [sqlite] COUNT() extremely slow first time!

2011-02-25 Thread Greg Barker
Average payload per entry for my test table was 409.00. Sounds about right
since the db has 4 columns and each is filled with a random string of length
100.

I've uploaded the full output from the sqlite3_analyzer for a few different
page sizes:

http://fletchowns.net/files/1024-analyzer.txt
http://fletchowns.net/files/4096-analyzer.txt
http://fletchowns.net/files/8192-analyzer.txt

I'm not quite sure how to interpret everything in there. What stands out to
you guys? What is Average Fanout?

Greg

On Thu, Feb 24, 2011 at 4:28 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> Yes, Greg, please post this value for the table you use in the count query.
>
> Max
>
>
> On Wed, Feb 23, 2011 at 9:58 PM, Greg Barker <fle...@fletchowns.net>
> wrote:
>
> > Is the record size you refer to here the same as the "Average payload per
> > entry" that sqlite3_analyzer determines for me?
> >
> > On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov <max.vla...@gmail.com>
> wrote:
> >
> > > Greg, you should also take the record size into account. My hypothesis
> > > is that if your record is comparatively small (several fit into 1024),
> > > the speed of select count will be the same for any page size (my quick
> > > tests confirm this). It would be interesting to know the average size
> > > of your record, to understand why the numbers are so different.
> > >
> > > Max
> >
> >


Re: [sqlite] COUNT() extremely slow first time!

2011-02-24 Thread Greg Barker
Is the record size you refer to here the same as the "Average payload per
entry" that sqlite3_analyzer determines for me?

On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov  wrote:

> Greg, you should also take the record size into account. My hypothesis is
> that if your record is comparatively small (several fit into 1024), the
> speed of select count will be the same for any page size (my quick tests
> confirm this). It would be interesting to know the average size of your
> record, to understand why the numbers are so different.
>
> Max


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Greg Barker
I'm currently dealing with a similar issue. I've found that the page_size
PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
8.5 seconds. This was done with a reboot between each test.

This page recommends a page_size of 4096:
http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows

If I see better performance with the larger page sizes (I'm going to test
16384 and beyond after this), is there any reason not to use them?
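
In case it helps anyone trying the same thing, a minimal sketch of switching
an existing (non-WAL) database to a larger page size with the sqlite3 C API;
the file name is a placeholder, and note that PRAGMA page_size only takes
effect on a brand-new database or on the next VACUUM:

/* Sketch: rebuild an existing database with 8192-byte pages.
 * "test.db" is a placeholder; error handling is minimal. */
#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;

    if (sqlite3_open("test.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }
    /* Request the new page size, then VACUUM so the file is rewritten. */
    if (sqlite3_exec(db, "PRAGMA page_size=8192; VACUUM;",
                     NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "error: %s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}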

Greg

On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <
oliverkloz...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson  wrote:
> > On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
> >>
> >> Thank you for your detailed explanation!
> >> First, can you please tell me how to purge the cache in Windows 7? This
> could be very useful for my tests!
> >
> > Sorry, dunno for Windows.  On Mac OS X it is the purge command, in the
> > development tools.  On Linux, you do: echo 3 >
> > /proc/sys/vm/drop_caches
>
> Just make sure you either (a) quote the 3 (echo '3' >
> /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
> If you don't quote it, and you don't put the space in (echo
> 3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I
> won't go into.
>
> --
> -- Stevie-O
> Real programmers use COPY CON PROGRAM.EXE
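
As an aside, for anyone scripting these cold-cache runs instead of typing the
command by hand, here is a minimal C equivalent of that Linux command (the
unspaced form fails because the shell parses "3>" as a redirection of file
descriptor 3, so nothing gets written); this is just a sketch and requires
root:

/* Sketch: drop the Linux page cache (plus dentries and inodes) before a
 * cold-start timing run, equivalent to `echo 3 > /proc/sys/vm/drop_caches`.
 * Run sync first so dirty pages can be written back and dropped too. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (f == NULL) {
        perror("fopen");   /* usually EACCES unless running as root */
        return 1;
    }
    fputs("3\n", f);       /* 1 = page cache, 2 = dentries+inodes, 3 = both */
    return fclose(f) == 0 ? 0 : 1;
}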