Re: [sqlite] COUNT() extremely slow first time!

2011-02-24 Thread Sven L

Not sure. Will get back on this as soon as I have played a bit with the 
analyzer utility!
 
What I meant was "My own estimated x bytes/row" :)
x + ID + ~20 characters should add up to some ~32 bytes :)
 
> Date: Wed, 23 Feb 2011 10:58:00 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> Is the record size you refer to here the same as the "Average payload per
> entry" that sqlite3_analyzer determines for me?
> 
> On Wed, Feb 23, 2011 at 5:09 AM, Max Vlasov  wrote:
> 
> > Greg, you should also take the record size into account. My hypothesis is
> > that if your record is comparatively small (several fit into 1024), the
> > speed of SELECT COUNT will be the same for any page size (my quick tests
> > confirm this). It would be interesting to know the average size of your
> > record, to understand why the numbers are so different.
> >
> > Max


Re: [sqlite] Problem with VACUUM feature

2011-02-23 Thread Sven L

Make sure your antivirus is turned off when you run your vacuum test.
It's a long shot, but I've seen some AVs lock files etc...

 
> Date: Wed, 23 Feb 2011 10:14:15 -0600
> From: j...@kreibi.ch
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Problem with VACUUM feature
> 
> On Wed, Feb 23, 2011 at 11:50:05AM +0530, Sudha Venkatareddy scratched on the 
> wall:
> > Hi All,
> > 
> > Input: MyDb.db with size 23 KB (has a lot of empty pages caused by delete
> > operations)
> > 
> > *Expected output: after applying the VACUUM command, MyDb.db should have a
> > reduced file size of 13 KB.*
> > 
> > *Actual output: MyDb.db remains 23 KB (the size does not change from the
> > original), and a temporary file etilqs_Hm4RUi6JPXcMZ17 is created whose data
> > is the same as MyDb.db but whose size is the reduced 13 KB.*
> 
> VACUUM is a two step process. First, the data is copied from the
> original file to a temp file. This is a high-level copy, where the
> data is compacted and reordered, and free pages are eliminated.
> 
> The second step copies the data from the temp file back to the
> primary file. This is done as a low-level page-by-page copy. It is
> *not* an OS file copy. By using the page update system already built
> into SQLite, the copy-back will create a rollback journal and remain
> transaction-safe for the whole VACUUM process.
> 
> From the sound of things, the first step is working, but the second
> step is failing for some reason. My first guess would be that there
> are permissions issues with creating the rollback file, so the second
> copy process fails. That's just a guess, however, as there could be
> a number of other issues. If you can figure out if a rollback file
> is ever being created, that would help determine if the copy-back is
> starting, but fails for some reason, or if the copy-back step is
> failing right from the start. Given the small database size, it
> might be somewhat hard to figure that out, however-- any rollback is
> going to be there and gone (or not there at all) very quickly.
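
One cheap way to narrow this down is to check the exact error VACUUM returns;
a minimal sketch, assuming db is the open connection handle (my guess is a
journal-permission problem would surface as SQLITE_CANTOPEN or SQLITE_READONLY
rather than failing silently):

char *errmsg = NULL;
int rc = sqlite3_exec(db, "VACUUM;", NULL, NULL, &errmsg);
if (rc != SQLITE_OK) {
    /* The result code should hint at which step of the VACUUM fails. */
    fprintf(stderr, "VACUUM failed (%d): %s\n", rc, errmsg ? errmsg : "?");
    sqlite3_free(errmsg);
}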
> 
> -j
> 
> -- 
> Jay A. Kreibich < J A Y @ K R E I B I.C H >
> 
> "Intelligence is like underwear: it is important that you have it,
> but showing it to the wrong people has the tendency to make them
> feel uncomfortable." -- Angela Johnson


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Thanks for pointing this out!
 
In my case I have spent much time on normalizing my tables, so the row size 
should be constant in most cases. I do wonder though, what if the row size is 
32 bytes? Or is there a minimum?
 
For instance, I have many lookup tables with ID+text (usually around 20 
characters):
MyID|MyText
 
With a page size of 4096, will SQLite put ~200 rows in one page?
 

 
> Date: Wed, 23 Feb 2011 10:47:03 -0500
> From: pri...@gmail.com
> To: t...@djii.com; sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> The SQLite cache size is in pages (2000 by default), so by increasing
> the page size 8x, you're also increasing the cache size and memory
> requirements by 8x. Not saying it's a bad thing, just something to be
> aware of.
> 
> If you want to compare 1K and 8K page size and only compare the effect
> page size has, you should either increase the cache size to 16000 for
> 1K pages or decrease the cache to 250 for 8K pages.
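
For instance, to hold the cache at roughly 2 MB in both configurations,
something along these lines (a sketch; page_size must be set before the
database file is created, since it only takes effect on a new or freshly
vacuumed database):

/* 8 KB pages: 250 pages * 8 KB = ~2 MB of cache */
sqlite3_exec(db, "PRAGMA page_size = 8192;", NULL, NULL, NULL);
sqlite3_exec(db, "PRAGMA cache_size = 250;", NULL, NULL, NULL);
/* ...or the 1 KB equivalent: page_size = 1024 with cache_size = 2000 */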
> 
> The other thing to be aware of is that SQLite will not allow a row to
> cross 2 pages. (It does allow a row to be larger than a page, using
> an overflow page.) So for example, if your page size is 1024 and row
> size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this
> somewhat and ignoring internal SQLite data, but you get the idea. If
> your row size is 513 bytes, you will have 511 bytes of waste on each
> page, so 50% of your database will be "air". As your row size heads
> toward 1024 there will be less waste. At 1025 bytes, SQLite will
> start splitting rows into overflow pages, putting 1024 bytes into the
> overflow page and 1 byte in the btree page. These numbers aren't
> right, but illustrate the point.
> 
> So to find a good page size, experiment and measure.
> 
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> 
> 
> 
> On Wed, Feb 23, 2011 at 10:20 AM, Teg  wrote:
> > Hello Greg,
> >
> > I found this to be the case too. The difference between 1K and 8K is
> > staggering. I default all my windows DB's to 8K now.
> >
> >
> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
> >
> > GB> I'm currently dealing with a similar issue. I've found that the page_size
> > GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> > GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> > GB> 8.5 seconds. This was done with a reboot between each test.
> >
> >
> >
> >
> > --
> > Best regards,
> >  Tegmailto:t...@djii.com
> >


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Does this trick work on the primary key? If not, why?
 
> From: max.vla...@gmail.com
> Date: Wed, 23 Feb 2011 16:09:04 +0300
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> On Tue, Feb 22, 2011 at 9:59 PM, Greg Barker  wrote:
> 
> > I'm currently dealing with a similar issue. I've found that the page_size
> > PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> > the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> > takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> > 8.5 seconds. This was done with a reboot between each test.
> >
> > This page recommends a page_size of 4096:
> > http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
> >
> > If I see better performance with the larger page sizes (going to test 16384
> > and beyond after this) is there any reason not to use them?
> >
> >
> Greg, you should also take the record size into account. My hypothesis is
> that if your record is comparatively small (several fit into 1024), the
> speed of SELECT COUNT will be the same for any page size (my quick tests
> confirm this). It would be interesting to know the average size of your
> record, to understand why the numbers are so different.
> 
> Returning to the original topic, for performance reasons I sometimes
> recommend using an index created on the id/rowid. It's a strange construct
> that seems to make no sense, but it actually sometimes gives a speed
> improvement. This is because an index contains only the data used in it, and
> if the query doesn't require getting additional data from the table it was
> created for, sqlite reads only this index and nothing else.
> 
> So to get the fastest count result one can create the following index
> (assuming id is the alias for rowid)
> 
> CREATE INDEX [idx_MyTableId] ON [MyTable] ([ID] )
> 
> And use the following query
> 
> SELECT COUNT(id) from (SELECT id FROM MyTable ORDER By Id)
> 
> "Order by" here forces using this index and I used outer select since
> count(id) inside the main select for unknown reasons triggers the table
> scanning.
> 
> For any query in my tests that usually takes 5-50 seconds, this one is
> always less than a second. But it costs a little in terms of size (the
> index takes space) and the speed of inserts. If this is a small price to pay
> then this may be an answer.
> 
> Max


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Sven L

Interesting!
I've been using "PRAGMA page_size = 4096;" in my software. Perhaps I should 
increase it and see if I can get a performance gain.
 
Does it affect INSERTs too?
 
> Date: Tue, 22 Feb 2011 10:59:29 -0800
> From: fle...@fletchowns.net
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> I'm currently dealing with a similar issue. I've found that the page_size
> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> 8.5 seconds. This was done with a reboot between each test.
> 
> This page recommends a page_size of 4096:
> http://www.sqlite.org/cvstrac/wiki?p=PerformanceTuningWindows
> 
> If I see better performance with the larger page sizes (going to test 16384
> and beyond after this) is there any reason not to use them?
> 
> Greg
> 
> On Mon, Feb 21, 2011 at 4:37 PM, Stephen Oberholtzer <
> oliverkloz...@gmail.com> wrote:
> 
> > On Mon, Feb 21, 2011 at 11:35 AM, Jim Wilcoxson  wrote:
> > > On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
> > >>
> > >> Thank you for your detailed explanation!
> > >> First, can you please tell me how to purge the cache in Windows 7? This
> > could be very useful for my tests!
> > >
> > > Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> > > development tools. On Linux, you do: echo 3 >
> > /proc/sys/vm/drop_caches
> >
> > Just make sure you either (a) quote the 3 (echo '3' >
> > /proc/sys/vm/drop_caches) or (b) put a space between the 3 and the >.
> > If you don't quote it, and you don't put the space in (echo
> > 3>/proc/sys/vm/drop_caches) it doesn't do anything, for reasons I
> > won't go into.
> >
> > --
> > -- Stevie-O
> > Real programmers use COPY CON PROGRAM.EXE


Re: [sqlite] Auto-grow setting?

2011-02-22 Thread Sven L

What I meant is this:
Database size = 1 MB. When opening connection, set chunk to ~100 kB.
Database size = 100 MB. When opening connection, set chunk to ~10 MB.
Database size = 1 GB. When opening connection, set chunk to ~100 MB.
 
I guess SQLITE_FCNTL_CHUNK_SIZE should be a 2^n value, which gives me this 
algorithm:
 
unsigned int v = databaseSizeBytes / 10; // target chunk: ~10% of the file size
// Round up to the next highest power of 2 (well-known bit trick ;)
--v;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
++v;
 
// A minimum of 32 kB is desirable?
unsigned int chunkSize = (v < 32768) ? 32768 : v;

 
Thoughts?
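
For completeness, applying the result with the file control Pavel mentioned
would presumably look something like this (untested sketch, assuming db is the
open connection; error checking omitted):

/* Ask SQLite to grow/shrink the "main" database in chunkSize-byte steps. */
int chunk = (int)chunkSize;
sqlite3_file_control(db, "main", SQLITE_FCNTL_CHUNK_SIZE, &chunk);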
 
 
> From: paiva...@gmail.com
> Date: Tue, 22 Feb 2011 10:01:03 -0500
> Subject: Re: [sqlite] Auto-grow setting?
> To: sqlite-users@sqlite.org
> CC: larvpo...@hotmail.se
> 
> Please reply to the list, not to me only.
> 
> It's impossible to set chunk size to percentage of the database size,
> you can only set a constant value.
> 
> 
> Pavel
> 
> On Tue, Feb 22, 2011 at 9:13 AM, Sven L  wrote:
> > Thanks a lot! :D
> >
> > What do you think of setting the chunk size to approximately 10% of the
> > database file size? Or is it better to use a constant?
> >
> >> From: paiva...@gmail.com
> >> Date: Tue, 22 Feb 2011 08:30:54 -0500
> >> Subject: Re: [sqlite] Auto-grow setting?
> >> To: sqlite-users@sqlite.org
> >> CC: larvpo...@hotmail.se
> >>
> >> Is SQLITE_FCNTL_CHUNK_SIZE what you are looking for? See more
> >> information about it here:
> >> http://www.sqlite.org/c3ref/c_fcntl_chunk_size.html. Notice that this
> >> feature appeared only in recent versions of SQLite, so if you have an
> >> earlier version you won't be able to control it and SQLite will
> >> grow/shrink the database page-by-page (maximum page size is 32 KB).
> >>
> >>
> >> Pavel
> >>
> >> On Tue, Feb 22, 2011 at 7:28 AM, Sven L  wrote:
> >> >
> >> > Can't seem to find a setting to control how the database file grows when
> >> > full. Is there such a setting?
> >> > It looks like the file grows by somewhat less than 100 kB when it is full. I
> >> > want to change this to around 10 MB (or even more) to avoid file
> >> > fragmentation.
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks


[sqlite] Auto-grow setting?

2011-02-22 Thread Sven L

Can't seem to find a setting to control how the database file grows when full. 
Is there such a setting?
It looks like the file grows by somewhat less than 100 kB when it is full. I want to 
change this to around 10 MB (or even more) to avoid file fragmentation.
 
Any ideas?
 
Thanks


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Thanks :)
 
This did the trick:
First, make a copy of the database: copy HugeDatabase.db HugeDatabase_copy.db
Then, for each run, replace the database with its copy. This is also why I thought 
the COUNT result was somehow written to the database after its first run... 
:P

> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 16:56:01 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> 
> On 21 Feb 2011, at 4:35pm, Jim Wilcoxson wrote:
> 
> > On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
> >> 
> >> Thank you for your detailed explanation!
> >> First, can you please tell me how to purge the cache in Windows 7? This 
> >> could be very useful for my tests!
> > 
> > Sorry, dunno for Windows. On Mac OSX it is the purge command, in the
> > development tools.
> 
> In current versions it's just /usr/bin/purge, which should be in your path. 
> No idea why a non-programmer should need it, but there it is.
> 
> > On Linux, you do: echo 3 >
> > /proc/sys/vm/drop_caches
> 
> And in Windows it's ... almost impossible. You can sync to disk, using 
> fflush(), and there's no reason you can't do this from the command-line:
> 
> http://technet.microsoft.com/en-us/sysinternals/bb897438.aspx
> 
> But for a purge you have to invalidate the contents of cache, and there's no 
> way for an application to tell Windows to do this. For a start, each 
> application has its own user mode address cache, so if you run another 
> utility to flush cache, it just messes with its own space. Second, there's no 
> system call that allows you access to the cache because Microsoft considers 
> it private to the device level. I am not dissing Microsoft for this: there 
> are fair arguments that this is actually the Right Way to do it.
> 
> So the way to do it is to overrun the cache by yourself. If you know your 
> cache is 2Gig, find a 2Gig disk file that has nothing to do with your test 
> suite and read it. Or generate 2Gig of gibberish and write it to disk, then 
> delete that file. Pah.
> 
> Simon.
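
A crude sketch of that overrun approach (the scratch-file path is hypothetical;
it just has to be a file at least as large as the cache and unrelated to the
test database):

#include <stdio.h>

/* Evict cached database pages by streaming a large unrelated file. */
FILE *f = fopen("C:\\temp\\scratch_2gb.bin", "rb");
if (f) {
    static char buf[1 << 16];
    while (fread(buf, 1, sizeof buf, f) == sizeof buf) {
        /* discard -- the reads exist only to displace the OS file cache */
    }
    fclose(f);
}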


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Thank you for your detailed explanation!
First, can you please tell me how to purge the cache in Windows 7? This could 
be very useful for my tests!
 
I'm quite sure my database itself is not fragmented, since I have only inserted 
data. The file system is in good shape too; Windows reports 0% fragmentation. 
Perhaps there is some other bottleneck, like disk performance in general (this 
is a 5400 rpm disk)? Or antivirus-related CPU holds? (I did turn real-time scanning 
off though.)
I have even turned Windows Search off (and got a 20% performance gain!). My 
32-bit application is running under Windows 7 (64-bit). Could WOW64 have 
something to do with this performance issue?
 
The size of the database is 1.98 GB, with 1.5M rows. Couldn't it be that the 
table scan simply is darn slow for huge tables?
 
In an ideal world the COUNT() would be performed on the primary key in RAM. 
That's not possible? :P
 
Thanks again!
 
 
> Date: Mon, 21 Feb 2011 10:17:03 -0500
> From: pri...@gmail.com
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> This is a common issue on the mailing list. The first time you do
> count(*), SQLite (actually your OS) has to load data into memory.
> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
> doesn't have to read from disk.
> 
> Here's a timing from my own system, after a purge command to clear the
> buffer cache:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.580s
> user 0m0.190s
> sys 0m0.034s
> 
> Same command again, with the file cached:
> 
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.189s
> user 0m0.165s
> sys 0m0.019s
> 
> This time is consistent no matter how many times I run it, because the
> file is still cached. Doing a purge command to clear the cache and
> re-running the query, we get:
> 
> $ purge
> $ time sqlite3 hb.db 'select count(*) from logs'
> -- Loading resources from /Users/jim/.sqliterc
> count(*)
> --
> 734909
> 
> real 0m0.427s
> user 0m0.175s
> sys 0m0.024s
> 
> On my system, there is not a huge difference, but it is consistent.
> Now, if you have a fragmented file system, you will see a much larger
> difference. There are many posts on the mailing list about both file
> system fragmentation and logical fragmentation within the SQLite file
> itself. Your first count(*) is subject to these fragmentation
> effects, while your 2nd usually is not, because the file is in memory.
> 
> Some people on the list believe fragmentation is an unimportant detail
> you shouldn't worry about, because you have little control over it.
> That may be true, but it's useful to understand how it can affect
> performance. I think you are seeing this first hand.
> 
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> 
> 
> 
> On Mon, Feb 21, 2011 at 9:37 AM, Sven L  wrote:
> >
> > Same result :(
> > Note that I have compiled SQLite with the following switches:
> > SQLITE_ENABLE_STAT2
> > SQLITE_THREADSAFE=2
> >
> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the
> > engine has to traverse all columns, and the two can even return different
> > values when there are NULL values...
> >
> > Also, this is quite interesting:
> >
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> > 0|0|0|SCAN TABLE Items (~100 rows)
> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> > sqlite>
> >
> > I would expect an index scan on the first statement. The second statement 
> > tells me nada?!
> >
> > Thanks for your help!
> >
> >
> >> From: slav...@bigfraud.org
> >> Date: Mon, 21 Feb 2011 14:24:50 +
> >> To: sqlite-users@sqlite.org
> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
> >>
> >>
> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> >>
> >> > SELECT COUNT(ItemID) FROM Items;
> >> >
> >> > This takes around 40 seconds the first time! WHY?!
> >>
> >> Try again, doing everything identically except that instead of the above 
> >> line use
> >>
> >> SELECT COUNT(*) FROM Items;
> >>
> >> Simon.


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

Same result :(
Note that I have compiled SQLite with the following switches:
SQLITE_ENABLE_STAT2
SQLITE_THREADSAFE=2
 
I've learnt that COUNT(*) is slower than COUNT(ID), since * means the engine 
has to traverse all columns, and the two can even return different values when 
there are NULL values...
 
Also, this is quite interesting:
 
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
0|0|0|SCAN TABLE Items (~100 rows)
sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
sqlite>

I would expect an index scan on the first statement. The second statement tells 
me nada?!
 
Thanks for your help!

 
> From: slav...@bigfraud.org
> Date: Mon, 21 Feb 2011 14:24:50 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] COUNT() extremely slow first time!
> 
> 
> On 21 Feb 2011, at 2:23pm, Sven L wrote:
> 
> > SELECT COUNT(ItemID) FROM Items;
> > 
> > This takes around 40 seconds the first time! WHY?!
> 
> Try again, doing everything identically except that instead of the above line 
> use
> 
> SELECT COUNT(*) FROM Items;
> 
> Simon.


[sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Sven L

SQLite 3.7.5.
 
I have a table (13 columns, with a primary key, an index on a date column, and a 
UNIQUE constraint on two columns).
I insert ~1,300,000 rows, 1000 at a time (one transaction per batch).
I close the database, reopen it and immediately perform a COUNT operation:
 
SELECT COUNT(ItemID) FROM Items;

This takes around 40 seconds the first time! WHY?!
I believe the value is somehow cached inside the database after the above call.
 
 
The table used:
 
CREATE TABLE IF NOT EXISTS Items
(
 ItemID INTEGER PRIMARY KEY,
 A INTEGER NOT NULL,
 B INTEGER NOT NULL,
 C INTEGER NOT NULL,
 D INTEGER NOT NULL,
 E INTEGER NOT NULL,
 F INTEGER NOT NULL,
 G INTEGER NOT NULL,
 H DATE NOT NULL,
 I CHAR(3) NOT NULL,
 J INTEGER NOT NULL,
 K INTEGER NOT NULL,
 L INTEGER NOT NULL,
 
 UNIQUE (B, A)
);
CREATE INDEX IF NOT EXISTS Item_H_Index ON Items (H);
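
For anyone reproducing this, a rough timing harness (sketch, assuming db is the
freshly reopened connection; on Windows clock() measures wall time, which is
what matters here):

#include <stdio.h>
#include <time.h>

sqlite3_stmt *stmt = NULL;
clock_t t0 = clock();
if (sqlite3_prepare_v2(db, "SELECT COUNT(ItemID) FROM Items;", -1, &stmt, NULL) == SQLITE_OK
    && sqlite3_step(stmt) == SQLITE_ROW) {
    printf("count = %lld, elapsed = %.1f s\n",
           (long long)sqlite3_column_int64(stmt, 0),
           (double)(clock() - t0) / CLOCKS_PER_SEC);
}
sqlite3_finalize(stmt);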
  


[sqlite] Question regarding SQLITE_CANTOPEN

2011-02-17 Thread Sven L

sqlite3* db = NULL;
if (sqlite3_open_v2(path, &db, SQLITE_OPEN_READWRITE, NULL) == SQLITE_CANTOPEN)
{
// Ok, database does not exist. Still, db != NULL...?
ASSERT(db != NULL);
}
 
Database cannot be opened, but we get an open handle to it... Please explain!
 
Thanks.   
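
For what it's worth, this matches the documented sqlite3_open_v2() behaviour:
the handle is returned even on most failures, precisely so the error message
can be read from it, and it must still be released. Something like:

sqlite3 *db = NULL;
int rc = sqlite3_open_v2(path, &db, SQLITE_OPEN_READWRITE, NULL);
if (rc != SQLITE_OK) {
    /* db is normally non-NULL even here; it is NULL only on out-of-memory */
    fprintf(stderr, "open failed: %s\n", db ? sqlite3_errmsg(db) : "out of memory");
    sqlite3_close(db);  /* required even after a failed open; NULL is a no-op */
    db = NULL;
}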


Re: [sqlite] SQLite version 3.7.6 testing

2011-02-13 Thread Sven L

Lol... we just had a long discussion about the sqlite_stat2 table and the
ANALYZE command. I removed these from my code, since the command should not be
used except in rare cases :P
 
Should I re-add my ANALYZE code and ask my original question again? Confused :P
 
> From: d...@sqlite.org
> Date: Sat, 12 Feb 2011 10:27:40 -0500
> To: sqlite-...@sqlite.org; sqlite-users@sqlite.org
> Subject: [sqlite] SQLite version 3.7.6 testing
> 
> The scheduled release of SQLite version 3.7.6 is mid-April. We are still
> two months away. However, version 3.7.6 will contain a lot of pent-up
> changes and so your advance testing of this release will be appreciated.
> 
> The latest pre-3.7.6 code can be found here:
> 
> http://www.sqlite.org/draft/download.html
> 
> And sketchy information about the changes added (so far) can be seen here:
> 
> http://www.sqlite.org/draft/releaselog/3_7_6.html
> 
> Of particular concern in 3.7.6 are some tweaks we've done to the query
> planner. The query planner is the part of SQLite that decides on the best
> and fastest algorithm to use to satisfy your queries. SQLite uses a
> cost-based query planner. In other words, it tries to estimate the CPU and
> disk I/O load used by various alternative algorithms and it selects the
> algorithm with the lowest estimated cost. The problem with this approach is
> that the cost really is an estimate based on incomplete information, and so
> it sometimes comes out wrong. The cost functions for 3.7.6 have been
> enhanced to be more accurate (we believe) particularly if SQLite is compiled
> using SQLITE_ENABLE_STAT2 and you run ANALYZE on your database files to
> gather statistics. But there could be instances where the cost estimates
> are not as good as before, resulting in performance regressions.
> 
> So, if you have the ability to download an early snapshot of SQLite 3.7.6
> and compile it into your application for performance testing purposes, we
> would really appreciate it if you would do so. And do so sooner rather than
> later so that we can have plenty of time to fix any issues you discover prior
> to the 3.7.6 release. If the query planner in 3.7.6 improves the
> performance of your application, we'd like to hear about that too.
> 
> I'll try to post snapshots of 3.7.6 on a regular basis so that you can
> download and compile a fairly recent version at any time. Please pester me
> if I forget.
> 
> Thanks for your help.
> -- 
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-10 Thread Sven L

I remember why I added the sqlite_stat2 flag together with the ANALYZE command. 
It makes certain ill-formed queries efficient.
 
CREATE TABLE IF NOT EXISTS TC -- 10 rows
(
C INTEGER PRIMARY KEY,
Y CHAR(255) NOT NULL UNIQUE
);

CREATE TABLE IF NOT EXISTS TB -- 100 rows
(
B INTEGER PRIMARY KEY, -- primary key!
X CHAR(255) NOT NULL UNIQUE
);
 
CREATE TABLE IF NOT EXISTS TA -- 10 rows
(
A INTEGER PRIMARY KEY,
C INTEGER NOT NULL,
B INTEGER NOT NULL,

UNIQUE (C, B) -- unique together with C!
);

Now, consider this query:
SELECT * FROM TB NATURAL JOIN TA; -- slow
vs
SELECT * FROM TA NATURAL JOIN TB; -- fast
 
That's why I needed the statistics. Otherwise, SQLite failed to select the best 
index. I understand it would choose the primary key in this case, which is of 
course wrong.
 
(I haven't tested the above for a while now, but I remember the problem I 
encountered had something to do with those tables.)
 
Regards

 
> From: slav...@bigfraud.org
> Date: Tue, 8 Feb 2011 16:53:52 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 8 Feb 2011, at 4:22pm, Sven L wrote:
> 
> > Thank you very much for your detailed explanation!
> 
> You're welcome. I've actually never had to think out this feature of SQLite 
> before, so it was interesting for me too. I hope Richard or the rest of the 
> team will correct me if I got anything wrong.
> 
> > I will comment out my calls to ANALYZE, and see how my software performs.
> > 
> > The reason why I added it in the first place is that users are allowed to 
> > create their own queries, and since not all of them are SQL experts, I 
> > wanted the engine to be as tolerant as possible. Perhaps it's a better idea 
> > to simply crave decent SQL!
> 
> Allowing users to make up their own queries on the fly does make things a 
> little more difficult as you worked out: you can't pre-make good indexes. The 
> big server/client databases cache temporary indexes (and share them between 
> users), so they handle unexpected queries far better: if any query comes in 
> that doesn't suit any indexes it simply makes up a new temporary index and 
> keeps it in case it's needed later. SQLite can't do this because it's 
> designed for a tiny footprint and can't chew up lots of memory or disk space 
> without a good reason.
> 
> But your initial questions did sound a little like premature optimisation and 
> I think you'll get decent results without worrying too much about it. I'm 
> sure the overwhelming number of SQLite users have never used ANALYZE even 
> once.
> 
> Simon.


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-08 Thread Sven L

Thank you very much for your detailed explanation!
I will comment out my calls to ANALYZE, and see how my software performs.
 
The reason why I added it in the first place is that users are allowed to 
create their own queries, and since not all of them are SQL experts, I wanted 
the engine to be as tolerant as possible. Perhaps it's a better idea to simply 
crave decent SQL!

 
> From: slav...@bigfraud.org
> Date: Tue, 8 Feb 2011 15:16:58 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 8 Feb 2011, at 2:39pm, Sven L wrote:
> 
> > Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs 
> > and other not-so-obvious joins?
> 
> No. Consider this:
> 
> SELECT * FROM myTable WHERE a=104 AND b=213
> 
> Suppose there are two indexes on myTable: one indexes only column a, and the 
> other indexes only column b. The query optimizer has to choose between them. 
> Which index would it be best to use?
> 
> The answer depends on the chunkiness of each column. If only three different 
> values ever appear in column a, but column b can have a thousand different 
> values, then it will be more efficient to use the index on column b. This 
> will return fewer rows which have to be scanned one-by-one for values in a.
> 
> But you're a good programmer. Knowing that you were going to have SELECT 
> commands like that one above you chose to create an index on both columns 
> (either order, it doesn't matter). Since you have done this, the results that 
> ANALYZE gathers don't matter at all! The query optimizer finds the good 
> index and never has to choose between two bad indexes because it has the one 
> perfect index which will always be best.
> 
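Concretely, the "one perfect index" for the example query above would be
something like this (the index name is just illustrative):

/* Composite index covering both terms of the WHERE clause. */
sqlite3_exec(db, "CREATE INDEX idx_myTable_a_b ON myTable(a, b);", NULL, NULL, NULL);
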
> So generally speaking ANALYZE matters only if the query optimizer has two or 
> more equally bad options. If you do your job as a database designer well it 
> will never need to consider chunkiness. The above explanation is simplified 
> but gives you the general idea.
> 
> > In my software, the database sometimes grows up to 1-2 GB immediately due 
> > to the nature of the application. Hence, the guidelines for the ANALYZE 
> > command do not suffice for me. "Do it once every x months" they say. The user 
> > chooses to import huge amounts of data on a regular basis. He is also able to 
> > run custom queries, which is why I found the sqlite_stat-tables useful.
> 
> Doesn't matter. This new data won't change the relative chunkiness of the 
> values in the columns. Though the precise numbers change, the /relative/ 
> chunkiness probably won't, so the choice of indexes probably won't need to 
> change either. Even less likely that it'll change by enough to change which 
> index the query optimiser should pick.
> 
> Doing another ANALYZE only really matters if the character of your data 
> changes, which is usually when a column that used to have very big chunks 
> suddenly has tiny chunks. The only time this happens if when a business 
> changes what it does in a significant way: going from having five product 
> lines to 500, or from selling in three countries to selling in thirty 
> countries. After they've been putting in new data reflecting this new usage 
> for a few months, /then/ it might be useful to run ANALYZE again.
> 
> Even then, the only thing you're changing is what the query optimizer chooses 
> as the best index. It might make the wrong decision and take 12ms for a 
> SELECT instead of 4ms. Not really a huge problem: if a delay of 8ms is 
> mission-critical you're probably using hardware at the edge of its 
> capabilities.
> 
> Simon.


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-08 Thread Sven L

Is it reasonable to guess that sqlite_stat* are used only for NATURAL JOINs and 
other not-so-obvious joins?
 
In my software, the database sometimes grows up to 1-2 GB immediately due to 
the nature of the application. Hence, the guidelines for the ANALYZE command do 
not suffice for me. "Do it once every x months" they say. The user chooses to 
import huge amounts of data on a regular basis. He is also able to run custom 
queries, which is why I found the sqlite_stat-tables useful.
 
> From: slav...@bigfraud.org
> Date: Mon, 7 Feb 2011 22:43:13 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 7 Feb 2011, at 9:37pm, Sven L wrote:
> 
> > Thanks for input. So you're saying that creating the tables manually is not 
> > the same as issuing the "ANALYZE" command?
> 
> You have no idea what the ANALYZE command does because it's not documented. 
> For all you know different versions of SQLite put different things in this 
> table. So don't try to simulate it, let SQLite do it. It's hardly difficult: 
> just issue that one instruction. As I wrote upthread, you might want to do it 
> about as often as you'd run the integrity check routine.
> 
> By the way you may be overestimating the impact of doing this. It will matter 
> only in situations where the query optimizer has to guess how 
> best to do its searches. If you've created good indexes and specified your 
> WHERE and ORDER BY clauses well, use of this table will be very low. There is 
> definitely little point in doing this until the particular installation has 
> lots of data in its tables ... perhaps after months of having had data 
> entered.
> 
> Simon.


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-07 Thread Sven L

Thanks for input. So you're saying that creating the tables manually is not the 
same as issuing the "ANALYZE" command?
 
However, ".dump sqlite_stat2" does NOT work. And I have been using the 
SQLITE_ENABLE_STAT2 switch for decades now :P
This is what this command produces:
 
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
COMMIT;

-
A bug?
 
 
 
> From: d...@sqlite.org
> Date: Mon, 7 Feb 2011 14:42:06 -0500
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> On Mon, Feb 7, 2011 at 1:48 PM, Sven L  wrote:
> 
> >
> > Yes, this is what I have read on the net too. But, tests show that without
> > the sqlite_stat* tables, my queries are ridiculously slow (probably unwanted
> > table scans etc).
> > Real-life data... can't I simply take my real-life database and extract the
> > data in sqlite_stat*?
> >
> 
> You can take a sample database that is representative of what to expect for
> most of your users, run ANALYZE, then do ".dump sqlite_stat1". Save the
> output. Then when creating a new database for a user, after doing all your
> CREATE TABLE and CREATE INDEX statements, but before adding actual content,
> run ANALYZE there. The ANALYZE command will take a millisecond or so since
> there is no data.
> 
> Then run the "INSERT INTO sqlite_stat1" statements you saved in order to
> fill the stat1 table with content from your "sample" database.
> 
> 
> >
> > Btw, this command produces nothing even though the table does contain
> > several rows: ".dump sqlite_stat2"
> >
> 
> Compile with SQLITE_ENABLE_STAT2 in order to get the stat2 information.
> stat2 is a 10-sample histogram of the content of each index. This enables
> SQLite to make smarter choices about when to use index lookup and when to do
> table scans. For example:
> 
> SELECT * FROM tab WHERE x=0;
> 
> The above might do a table scan if a majority of the tab.x values are 0.
> But if only 1% of the tab.x values are 0, then it will use an index.
> 
> 
> >
> > Thanks
> >
> > > From: slav...@bigfraud.org
> > > Date: Mon, 7 Feb 2011 16:44:00 +
> > > To: sqlite-users@sqlite.org
> > > Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> > >
> > >
> > > On 7 Feb 2011, at 4:38pm, Sven L wrote:
> > >
> > > > So, with this in mind, it makes sense to precompute the sqlite_stat*
> > tables. Right?
> > >
> > > Which you do by running ANALYZE, but since it needs real-life data to
> > work on there's no point doing it until your customer has put some data in.
> > I don't write this type of application any more, but I might put it in a
> > maintenance routine -- some obscure menu option near the Config Preferences
> > or something. Run it as part of your yearly maintenance procedure, after
> > you've run 'PRAGMA integrity_check'.
> > >
> > > Simon.
> 
> 
> 
> -- 
> D. Richard Hipp
> d...@sqlite.org


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-07 Thread Sven L

Yes, this is what I have read on the net too. But, tests show that without the 
sqlite_stat* tables, my queries are ridiculously slow (probably unwanted table 
scans etc).
Real-life data... can't I simply take my real-life database and extract the 
data in sqlite_stat*?
 
Btw, this command produces nothing even though the table does contain several 
rows: ".dump sqlite_stat2"

Thanks
 
> From: slav...@bigfraud.org
> Date: Mon, 7 Feb 2011 16:44:00 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 7 Feb 2011, at 4:38pm, Sven L wrote:
> 
> > So, with this in mind, it makes sense to precompute the sqlite_stat* 
> > tables. Right?
> 
> Which you do by running ANALYZE, but since it needs real-life data to work on 
> there's no point doing it until your customer has put some data in. I don't 
> write this type of application any more, but I might put it in a maintenance 
> routine -- some obscure menu option near the Config Preferences or something. 
> Run it as part of your yearly maintenance procedure, after you've run 'PRAGMA 
> integrity_check'.
> 
> Simon.


Re: [sqlite] Regarding "Manual Control Of Query Plans"

2011-02-07 Thread Sven L

Sounds reasonable!
So, with this in mind, it makes sense to precompute the sqlite_stat* tables. 
Right?
 
> From: slav...@bigfraud.org
> Date: Mon, 7 Feb 2011 15:33:50 +
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Regarding "Manual Control Of Query Plans"
> 
> 
> On 7 Feb 2011, at 3:12pm, Sven L wrote:
> 
> > I've discovered that the ANALYZE command is extremely slow on my large 
> > databases. I read the documentation and it says that the command should be 
> > issued when the schema is changed "a lot". Hmm, when exactly could that be? 
> > Currently, I issue the command every time I have inserted ~1000 rows in my 
> > main table.
> 
> You should never need the ANALYZE command in a routine your users run. If 
> your first 1000 rows are representative of the 'chunkiness' of your data, do 
> it once, then there's no need to issue that command ever again.
> 
> The results are used only to help the query analyser choose which order to 
> consider columns in. In order to do this it's useful to know things like "The 
> column called 'currency' is almost always the same value, whereas the column 
> called 'totalAmount' changes a lot.". That's what that command does. If you 
> radically changed your data by, for example, moving from a national to an 
> international service then it might be worth running ANALYZE again once you 
> had a bunch of invoices in other currencies in, but only then, and only once.
> 
> Simon.


[sqlite] Regarding "Manual Control Of Query Plans"

2011-02-07 Thread Sven L

I've discovered that the ANALYZE command is extremely slow on my large 
databases. I read the documentation and it says that the command should be 
issued when the schema is changed "a lot". Hmm, when exactly could that be? 
Currently, I issue the command every time I have inserted ~1000 rows in my main 
table.
 
Now I'm thinking of using precomputed sqlite_stat1 and sqlite_stat2 tables, 
since the relationship between my tables is pretty much (say 95%) constant in 
all database instances. I'm wondering the same as John did here: 
http://osdir.com/ml/sqlite-users/2009-02/msg00298.html

1) Will this work?
2) Is it "dangerous"? e.g. could this completely confuse the query optimiser, 
or is it the case that as long as the hard coded values are "realistic" it 
doesn't matter that they don't reflect the reality of the table (which is what 
the docs imply)?
2) Has anyone experience of trying this?

Thanks!   