Re: [sqlite] INDEX Types

2011-11-09 Thread Jay A. Kreibich
On Thu, Nov 10, 2011 at 12:28:24AM +0100, GB scratched on the wall:

> Ok, just thought it worth mentioning. But a VACUUMed database may be
> more efficient if you have multiple columns with multiple indexes,
> because you get a mixed sequence of data and index pages while
> inserting data. VACUUM rearranges them so that pages belonging to an
> object are grouped together, which in turn helps prefetching and
> caching work better.

  While that's true, the difference is more limited for indexes.
  VACUUM rebuilds tables in ROWID order, re-packing both the records
  within each page and the pages within the database file.
  
  Indexes, on the other hand, are rebuilt with a table scan, essentially
  the same as running CREATE INDEX on an existing table.  This means that
  if the table rows (in ROWID order) are not already more or less in order
  according to the index, the rebuilt index will still suffer from
  out-of-order insertions and re-balanced nodes.
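
  A minimal sketch of what that implies (the table, column, and index names
  are assumed here, not taken from this message): if the table is rewritten
  so that ROWID order matches the index order before the rebuild, the index
  is recreated from an already-sorted scan and avoids the out-of-order
  insertions described above.

    CREATE TABLE t2(a TEXT);
    INSERT INTO t2 SELECT a FROM t ORDER BY a;  -- rows get ROWIDs in index order
    DROP TABLE t;                               -- also drops the old index
    ALTER TABLE t2 RENAME TO t;
    CREATE INDEX tx ON t(a);                    -- built from a sorted scan
    VACUUM;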

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Black, Michael (IS)
Doing a file copy has similar behavior.  So as long as the file is cached 
everything is copacetic.  That's what leads me to believe it's head thrashing 
causing this behavior.



 ./sqlite3 index2.db < gendat2.sql

> However, reboot again and add "select count(*) from a;" as the first line of 
> gendat2.sql
>
> time sqlite3 index.db < gendat2.sql
> 2.389u 1.123s 0:07.39 47.3% 0+0k 0+0io 0pf+0w
> That's faster than the 2nd run before rebooting.

Out of interest, if you have the time, instead of doing "select count(*) from 
a;" can you just copy the file to another file ?  Possibly 
read-a-block-write-a-block ?  I'm curious to know whether this makes a 
difference.

I have tried some of these things on a Mac, but rebooting doesn't make anything 
like so much difference on a Mac.  It's slower with the reboot and faster 
without it.  However, I don't have a Mac to play with which isn't also 
acting as a server.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread GB



Fabian wrote on 09.11.2011 23:10:

I'm running these tests on a very simple database: 1 table and 1 column, so
ANALYZE shouldn't have any effect. And I already tested running VACUUM on
the database, but it didn't help (which seems logical, because I start with
a fresh db each time).

Ok, just thought it worth mentioning. But a VACUUMed database may be 
more efficient if you have multiple columns with multiple indexes, 
because you get a mixed sequence of data and index pages while inserting 
data. VACUUM rearranges them so that pages belonging to an object are 
grouped together, which in turn helps prefetching and caching work better.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 GB 

> Maybe you could try to use a pagesize that matches the size of a disk
> allocation unit or memory page. On Windows, since a typical NTFS partition
> has a cluster size of 4 KiB - which happens to also be the size of a memory
> page - a page size of 4096 bytes seems to be a good compromise between
> pleasing the cache manager and wasting space.


I already used a page_size of 4096 for all the benchmarks, together with a
large (300MB) cache_size, but thanks for the suggestion anyway!


> You could also investigate the effect of VACUUMing and ANALYZEing the
> database. If you have multiple indexes on tables, try compiling SQLite
> with the SQLITE_ENABLE_STAT3 switch defined to get even better results from
> ANALYZE.
>

I'm running these tests on a very simple database: 1 table and 1 column, so
ANALYZE shouldn't have any effect. And I already tested running VACUUM on
the database, but it didn't help (which seems logical, because I start with
a fresh db each time).
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Petite Abeille

On Nov 9, 2011, at 10:24 PM, Fabian wrote:

> And I'd like to avoid having some fuzzy logic
> that tries to predict which of the two methods is going to be faster.

Perhaps an alternative to your conundrum is the good, old "divide and conquer" 
approach. In other words, you could partition your data among multiple database 
files, using a simple hash to decide which partition to write into. That, in 
combination with some astute use of 'attach', is as good an approximation of 
table partitioning in SQLite as any.

http://en.wikipedia.org/wiki/Partition_(database)
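
A minimal sketch of that idea (the file and table names are assumed, and each
partition file is assumed to already contain the same table): the application
hashes the key to pick a partition, and ATTACH makes all partitions reachable
from one connection.

ATTACH DATABASE 'part1.db' AS p1;
ATTACH DATABASE 'part2.db' AS p2;
-- the application computes, say, hash(value) % 3 and routes the insert:
INSERT INTO p1.t(a) VALUES('some value');
-- a query that needs everything unions the partitions:
SELECT a FROM main.t UNION ALL SELECT a FROM p1.t UNION ALL SELECT a FROM p2.t;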

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread GB
Maybe you could try to use a pagesize that matches the size of a disk 
allocation unit or memory page. On Windows, since a typical NTFS 
partition has a cluster size of 4 KiB - which happens to also be the size 
of a memory page - a page size of 4096 bytes seems to be a good compromise 
between pleasing the cache manager and wasting space. You could also 
investigate the effect of VACUUMing and ANALYZEing the database. If you 
have multiple indexes on tables, try compiling SQLite with the 
SQLITE_ENABLE_STAT3 switch defined to get even better results from ANALYZE.
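
A short sketch of those settings in the sqlite3 shell (note that page_size 
only takes effect on a brand-new database or at the next VACUUM):

PRAGMA page_size = 4096;  -- match the NTFS cluster / memory page size
VACUUM;                   -- rebuild the file so the new page size applies
ANALYZE;                  -- gather index statistics (sqlite_stat3 if SQLITE_ENABLE_STAT3 is set)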


hth
gerd
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Nico Williams
On Wed, Nov 9, 2011 at 3:24 PM, Fabian  wrote:
> 2011/11/9 Nico Williams 
>> I don't get it.  You're reading practically the whole file in a random
>> manner, which is painfully slow, so why can't you read the file in one
>> fell swoop (i.e., sequential reads)??
>
> I'm only reading the whole file when the number of additional inserts is
> high enough to cause the whole index to be read from disk. But if I always
> pre-cache the database, it will degrade performance for cases when only
> 10 inserts need to be done. And I'd like to avoid having some fuzzy logic
> that tries to predict which of the two methods is going to be faster.

I don't see how to avoid that.  Set a threshold of, say, N=100 inserts before
you read the whole thing into memory.  You'll need to be able to cache,
somewhere, whether the DB has been read since the last reboot (you could use
a table in the same DB for this).
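
A minimal sketch of that bookkeeping (the table and parameter names are
invented for illustration): a one-row table in the same database remembers
the boot in which the file was last read end-to-end, and the application
compares it to a boot identifier it obtains from the OS before deciding
whether to prime the cache.

CREATE TABLE IF NOT EXISTS cache_state(boot_id TEXT);
-- if the stored boot_id differs from the current one, warm the cache:
SELECT count(*) FROM t;
-- ...then remember that this boot has been warmed:
DELETE FROM cache_state;
INSERT INTO cache_state VALUES(:current_boot_id);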

> Besides, pre-caching the file sounds easier than it is to accomplish,
> because none of the methods suggested on this list worked on Windows (for
> example copying the file to null). Windows and the hard drive have their own
> logic to decide which data to cache, and I haven't found a simple way to
> force a certain file into cache.

On many operating systems copying a file to /dev/null or equivalent
can fail to read the file into cache.

On a Unix, the cp(1) utility might mmap() the file and then write(2) it
a page at a time to /dev/null, with each page fault deferred until
the last minute; but since /dev/null doesn't use the data, the page
fault never comes, and thus the file is never read into memory.

If you want to read the file reliably you may need to use a SELECT, or
actually *read* the file into memory, not just mmap() it.
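
For the SELECT route, a sketch (table and index names assumed): one scan
touches the table's pages, and a second, index-constrained scan touches the
index pages, which are the ones the later INSERTs actually need.

SELECT count(*) FROM t;                              -- reads the table's pages
SELECT count(*) FROM t INDEXED BY tx WHERE a >= '';  -- reads the index's pages
-- EXPLAIN QUERY PLAN confirms whether the second query really scans tx.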

>> Or, if FTS really works better, then use that.
>
> I will, but I'm trying to understand the issue that I'm facing, not just
> work around it. It seems that FTS doesn't need to read the whole index from
> disk, so I'm trying to pinpoint the difference. My best guess is that it
> creates a fresh b-tree for the additional inserts, causing the boost in
> performance.

Yes, it'd be nice to understand what FTS is doing.  I can imagine many
ways to implement an index that has the performance characteristic
you've observed, but with various trade-offs.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Black, Michael (IS)
I'm imagining that the index is only loading what it has to for each insert.

That results in head thrashing the disk when the file isn't cached.



I'm going to profile this and see what pops out.  I'm also going to use 3.7.9.





Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Fabian [fabianpi...@gmail.com]
Sent: Wednesday, November 09, 2011 3:30 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] INDEX Types

2011/11/9 Black, Michael (IS) <michael.bla...@ngc.com>

> OK...you're right...a reboot kills it.

I'm glad someone was able to reproduce this on Linux, ruling out the
possibility that it's a Windows issue.


> However, reboot again and add "select count(*) from a;" as the first line
> of gendat2.sql
>

So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds
to 7 seconds (including the count), does this confirm it's a sequential vs
random reads problem?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Petite Abeille

On Nov 9, 2011, at 10:24 PM, Fabian wrote:

> It seems that FTS doesn't need to read the whole index from
> disk, so I'm trying to pinpoint the difference. My best guess is that it
> creates a fresh b-tree for the additional inserts, causing the boost in
> performance.

Indeed.

Quoting the fine manual:

"Multiple b-tree structures are used instead of a single b-tree to reduce the 
cost of inserting records into FTS tables. When a new record is inserted into 
an FTS table that already contains a lot of data, it is likely that many of the 
terms in the new record are already present in a large number of existing 
records. If a single b-tree were used, then large doclist structures would have 
to be loaded from the database, amended to include the new docid and 
term-offset list, then written back to the database. Using multiple b-tree 
tables allows this to be avoided by creating a new b-tree which can be merged 
with the existing b-tree (or b-trees) later on. Merging of b-tree structures 
can be performed as a background task, or once a certain number of separate 
b-tree structures have been accumulated. Of course, this scheme makes queries 
more expensive (as the FTS code may have to look up individual terms in more 
than one b-tree and merge the results), but it has been found that in 
practice this overhead is often negligible."

http://www.sqlite.org/fts3.html#section_8
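
The merge can also be requested explicitly. A minimal sketch, assuming an
FTS4 table named "docs" (the name is only illustrative):

INSERT INTO docs(docs) VALUES('optimize');  -- fold the accumulated b-trees into one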
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

> OK...you're right...a reboot kills it.

I'm glad someone was able to reproduce this on Linux, ruling out the
possibility that it's a Windows issue.


> However, reboot again and add "select count(*) from a;" as the first line
> of gendat2.sql
>

So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds
to 7 seconds (including the count), does this confirm it's a sequential vs
random reads problem?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Nico Williams 

>
> I don't get it.  You're reading practically the whole file in a random
> manner, which is painfully slow, so why can't you read the file in one
> fell swoop (i.e., sequential reads)??
>

I'm only reading the whole file when the number of additional inserts is
high enough to cause the whole index to be read from disk. But if I always
pre-cache the database, it will degrade performance for cases when only
10 inserts need to be done. And I'd like to avoid having some fuzzy logic
that tries to predict which of the two methods is going to be faster.

Besides, pre-caching the file sounds easier than it is to accomplish,
because none of the methods suggested on this list worked on Windows (for
example copying the file to null). Windows and the hard drive have their own
logic to decide which data to cache, and I haven't found a simple way to
force a certain file into cache.

> Or, if FTS really works better, then use that.


I will, but I'm trying to understand the issue that I'm facing, not just
work around it. It seems that FTS doesn't need to read the whole index from
disk, so I'm trying to pinpoint the difference. My best guess is that it
creates a fresh b-tree for the additional inserts, causing the boost in
performance.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Black, Michael (IS)
OK...you're right...a reboot kills it.  Here's the program I generated the data 
with:

#include <stdio.h>
#include <stdlib.h>                     /* for random() */

int main(void)
{
    char key[9];
    long *lkey=(long *)key;             /* the 8-char key reinterpreted as a long integer */
    int i;
    FILE *fp1,*fp2;
    fp1=fopen("gendat1.sql","w");
    fp2=fopen("gendat2.sql","w");
    fputs("CREATE TABLE t (a text);",fp1);
    fputs("CREATE INDEX tx on t(a);",fp1);
    fputs("BEGIN;",fp1);
    fputs("BEGIN;",fp2);
    key[8]='\0';                        /* terminate the 8-character key */
    for(i=0;i<1100000;++i) {            /* 1,100,000 records in total */
        int j;
        for(j=0;j<8;++j) {
            key[j]=(random()%26)+'a';
        }
        if (i < 1000000) {              /* first 1,000,000 rows go to gendat1.sql */
            fprintf(fp1,"insert into t values('%s');\n",key);
        }
        else {                          /* last 100,000 rows go to gendat2.sql */
            fprintf(fp2,"insert into t values(%ld);\n",*lkey);
        }
    }
    fputs("COMMIT;",fp1);
    fputs("COMMIT;",fp2);
    fclose(fp1);
    fclose(fp2);
    return 0;
}


time sqlite3 index.db < gendat1.sql
21.594u 5.497s 0:27.87 97.1%0+0k 0+0io 0pf+0w
time sqlite3 index.db < gendat2.sql
12.302u 1.035s 0:04.46 74.6% 0+0k 0+0io 0pf+0w

rm index.db
time sqlite3 index.db < gendat1.sql
20.641u 6.477s 0:27.91 97.1%0+0k 0+0io 0pf+0w

reboot
time sqlite3 index.db < gendat2.sql
2.474u 1.219s 1:28.56 4.1%  0+0k 0+0io 0pf+0w

However, reboot again and add "select count(*) from a;" as the first line of 
gendat2.sql

time sqlite3 index.db < gendat2.sql
2.389u 1.123s 0:07.39 47.3% 0+0k 0+0io 0pf+0w
That's faster than the 2nd run before rebooting.

Then, instead of rebooting, flush the Linux page cache:
echo 3 > /proc/sys/vm/drop_caches

You don't see the same performance drop.

So I'm wondering if this is disk cache instead of Linux cache.


Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems



From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Fabian [fabianpi...@gmail.com]
Sent: Wednesday, November 09, 2011 1:36 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] INDEX Types


2011/11/9 Luuk <luu...@gmail.com>

>
> No, I did not reboot. I created two NEW databases, so rebooting should not
> make a big difference because it's not cached in either case.


If you just created it, why wouldn't it be in cache? You've just written all
the data, so it's highly likely to be in some kind of cache (OS, HD, etc.).

As already explained to Michael, I experience no problems when I create the
database, close it, re-open it, and add some more rows. The problem only
appears if I reboot in between those steps.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Nico Williams
On Wed, Nov 9, 2011 at 1:53 PM, Fabian  wrote:
> 2011/11/9 Nico Williams 
>> What's wrong with reading the whole file into memory at boot time as a
>> way to prime the cache?  Rebooting always takes some time, mostly the
>> time to read all sorts of files.
>>
> It's a desktop application, so I cannot pre-cache anything before the user
> launches the app. After the launch a couple of inserts need to be done, but
> having to read the whole database file into memory, just to be able to
> perform those inserts, doesn't seem like a scalable solution.

I don't get it.  You're reading practically the whole file in a random
manner, which is painfully slow, so why can't you read the file in one
fell swoop (i.e., sequential reads)??

You've picked a technology that scales only vertically with the
hardware.  If you've reached the limit of that hardware (no, you've
not)  then it's time to upgrade the hardware or switch to a technology
that scales horizontally.  But you're not there yet.  And this is for
a desktop app, so horizontal scaling would seem out of the question
(or else you'd have built a client/server app in the first place).
This leaves you with pre-fetching the file.  You could defer the
pre-fetching as much as possible (e.g., wait until the Nth operation,
for N<<10,000), but that's about as much as you can do.  Or, if FTS
really works better, then use that.

nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Nico Williams 

>
> What's wrong with reading the whole file into memory at boot time as a
> way to prime the cache?  Rebooting always takes some time, mostly the
> time to read all sorts of files.
>
>
It's a desktop application, so I cannot pre-cache anything before the user
launches the app. After the launch a couple of inserts need to be done, but
having to read the whole database file into memory, just to be able to
perform those inserts, doesn't seem like a scalable solution.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Nico Williams
Fabian,

What's wrong with reading the whole file into memory at boot time as a
way to prime the cache?  Rebooting always takes some time, mostly the
time to read all sorts of files.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Nico Williams
On Wed, Nov 9, 2011 at 12:04 PM, Fabian  wrote:
> 2011/11/9 Simon Slavin 
>> Didn't someone recently note that entering the first million records was
>> fast, but if he then closed and reopened the database, entering the next
>> 100,000 records was slow ?
>>
> Yes, and there is still no real explanation for it, other than slow disk
> reads. But even with very slow random disk I/O, 30 seconds seems still way
> too slow for a 100MB file.

I posted some math on this assuming typical disk seek times.  30
seconds is very much less than the worst case scenario.

I don't have an answer regarding FTS.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Luuk

On 09-11-2011 20:14, Fabian wrote:

2011/11/9 Luuk


On 09-11-2011 17:23, Black, Michael (IS) wrote:


  time sqlite3

$ time sqlite3
Did you do a reboot before the second insert? Because the difference I'm
seeing is much larger than 38%. Did you test it on Linux or Windows?


Linux (Linux opensuse 2.6.27.56-0.1-default #1 SMP 2010-12-01 16:57:58 
+0100 x86_64 x86_64 x86_64 GNU/Linux)


No, I did not reboot. I created two NEW databases, so rebooting should 
not make a big difference because it's not cached in either case.





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Luuk 

> On 09-11-2011 17:23, Black, Michael (IS) wrote:
>
>>  time sqlite3 < index1.sql
>> 19.307u 0.082s 0:19.40 99.8%  0+0k 0+0io 0pf+0w
>>
>> time sqlite3 < index2.sql
>> 19.266u 0.092s 0:19.37 99.8%  0+0k 0+0io 0pf+0w
>>
>>
>
> $ time sqlite3 
> real    0m21.094s
> user    0m20.989s
> sys     0m0.104s
> $ time sqlite3 
> real    0m20.898s
> user    0m20.813s
> sys     0m0.084s
> $ time sqlite3 gendat1.sqlite 
> real    2m32.701s
> user    0m26.038s
> sys     0m5.256s
> $ time sqlite3 gendat2.sqlite 
> real    1m50.452s
> user    0m25.534s
> sys     0m5.360s
>
>
> The difference when creating a db on disk (size: 35Mb) is (152-110)/110=
> 38%
>
>
Did you do a reboot before the second insert? Because the difference I'm
seeing is much larger than 38%. Did you test it on Linux or Windows?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Luuk

On 09-11-2011 17:23, Black, Michael (IS) wrote:

>  time sqlite3 < index1.sql
> 19.307u 0.082s 0:19.40 99.8%  0+0k 0+0io 0pf+0w
>
> time sqlite3 < index2.sql
> 19.266u 0.092s 0:19.37 99.8%  0+0k 0+0io 0pf+0w

$ time sqlite3 
real    0m21.094s
user    0m20.989s
sys     0m0.104s
$ time sqlite3 
real    0m20.898s
user    0m20.813s
sys     0m0.084s
$ time sqlite3 gendat1.sqlite 
real    2m32.701s
user    0m26.038s
sys     0m5.256s
$ time sqlite3 gendat2.sqlite 
real    1m50.452s
user    0m25.534s
sys     0m5.360s

The difference when creating a db on disk (size: 35Mb) is (152-110)/110= 38%
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

> Are you sure you're using BEGIN/COMMIT on your transactions?
>

Yes


> I just used my benchmark data and inserted another 100,000 rows into the
> database in 2.3 seconds.


That is because you immediately insert those additional rows after
creating the database. I get the same excellent performance as you when I
do that. But if you make sure the database is not in cache (for example,
reboot the machine), you will see that the additional inserts take
MUCH longer than would reasonably be expected.

Could you repeat your benchmark, with a reboot between the two transactions?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Black, Michael (IS)
Are you sure you're using BEGIN/COMMIT on your transactions?

I just used my benchmark data and inserted another 100,000 rows into the 
database in 2.3 seconds.

I made 1,100,000 records and cut the last 100,000 into a separate file, with 
BEGIN/COMMIT on both.

time sqlite3 index.db 

>
> Didn't someone recently note that entering the first million records was
> fast, but if he then closed and reopened the database, entering the next
> 100,000 records was slow ?
>
>
Yes, and there is still no real explanation for it, other than slow disk
reads. But even with very slow random disk I/O, 30 seconds seems still way
too slow for a 100MB file.

But today I made a new observation: if I create the same table as a virtual
FTS4 table, I can add the additional rows within 1 second (even on an
un-cached database file).

So if the reason for the slowness is disk reads, the FTS4 way of
creating/updating the index requires far fewer reads? Maybe because it
allows for separate tree-branches?

FTS is overkill for my simple requirements, but if it's so much faster than
a regular index, why not? The only things holding me back from
switching to FTS for this table are:

 - I cannot use the UNIQUE constraint, to disallow duplicate values.
 - I cannot search efficiently for rows that DON'T match a certain value,
because FTS doesn't allow a single NOT operator.

So can someone explain what FTS is doing behind the scenes that makes these
additional inserts so much faster?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Simon Slavin 

>
> Didn't someone recently note that entering the first million records was
> fast, but if he then closed and reopened the database, entering the next
> 100,000 records was slow ?
>
>
Yes, and there is still no real explanation for it, other than slow disk
reads. But even with very slow random disk I/O, 30 seconds seems still way
too slow for a 100MB file.

But today I made a new observation: if I create the same table as a virtual
FTS4 table, I can add the additional rows within 1 second (even on an
un-cached database file).
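
For reference, a sketch of the FTS4 variant being compared here (the column
name is assumed): the virtual table replaces the plain table plus its index,
and lookups go through MATCH.

CREATE VIRTUAL TABLE t_fts USING fts4(a);
INSERT INTO t_fts(a) VALUES('abcdefgh');
SELECT rowid FROM t_fts WHERE a MATCH 'abcdefgh';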

So if the reason for the slowness is disk reads, the FTS4 way of
creating/updating the index requires far fewer reads? Maybe because it
allows for separate tree-branches?

FTS is overkill for my simple requirements, but if it's so much faster than
a regular index, why not? The only things holding me back from
switching to FTS for this table are:

 - I cannot use the UNIQUE constraint, to disallow duplicate values.
 - I cannot search efficiently for rows that DON'T match a certain value,
because FTS doesn't allow a single NOT operator.

So can someone explain what FTS is doing behind the scenes that makes these
additional inserts so much faster?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Simon Slavin

On 9 Nov 2011, at 4:42pm, Fabian wrote:

> 2011/11/9 Black, Michael (IS) 
> 
>> Hmmm...appears to be the same for this case which, I must say, I find
>> surprising.
> 
> Thanks for actually benchmarking it. I'm also a bit surprised, because I
> always thought SQLite handled INTEGER more efficiently than TEXT.

In order to save file space, SQLite stores integers not as a fixed number of 
bytes per value, but as a different number of bytes depending on the value.  
This would make things slower than being able to read a fixed number of bytes 
from disk and use double-word maths on the values you've read.

> I also did some new benchmarks on the inserts-slowdown I experience, and to
> rule out VirtualBox, I performed them on real hardware, but the results are
> the same:
> 
> 10 secs to create a database with 1 million rows, and 30 secs to add an
> additional 10,000 rows, if the database file is un-cached.
> 
> Maybe this is normal behaviour, but how can it possibly be that it's faster
> to delete the whole database and re-create it, than just adding a
> (relatively) small number of additional rows?

Didn't someone recently note that entering the first million records was fast, 
but if he then closed and reopened the database, entering the next 100,000 
records was slow ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

> Hmmm...appears to be the same for this case which, I must say, I find
> surprising.
>
>
Thanks for actually benchmarking it. I'm also a bit surprised, because I
always thought SQLite handled INTEGER more efficiently than TEXT.

I also did some new benchmarks on the inserts-slowdown I experience, and to
rule out VirtualBox, I performed them on real hardware, but the results are
the same:

10 secs to create a database with 1 million rows, and 30 secs to add an
additional 10,000 rows, if the database file is un-cached.

Maybe this is normal behaviour, but how can it possibly be that it's faster
to delete the whole database and re-create it, than just adding a
(relatively) small number of additional rows?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Pavel Ivanov
> But I'm wondering if SQLite can deal more efficiently with an INTEGER index
> (64 bits) vs. an 8-byte TEXT column (also 64 bits). I know the INTEGERs
> require less disk-space because SQLite can store smaller values in fewer
> bytes, but are there any other differences that make them more preferable
> as an INDEX? If there is no difference in performance, I could just take
> the first 8 characters of the TEXT column as a hash-value, instead of
> calculating a CRC64 each time.

My first thought was "sure, text will be compared byte-by-byte, while 64-bit
integers on a 64-bit CPU will be compared all in one operation". But
then SQLite has to read the integer from the database first, and I'm not sure
whether it does any optimization to read the whole integer in one CPU
operation; maybe it does that byte-by-byte as well. So you probably won't
notice much difference in performance between the two ways of indexing.


Pavel


On Wed, Nov 9, 2011 at 10:23 AM, Fabian  wrote:
> I'm having an issue where inserts on an un-cached database are very slow.
> The reason probably is that a large part of the existing index needs to be
> read from disk, to be able to insert new rows to the index. The values
> in the indexed column are around 60 bytes long, so I'm thinking about
> adding an extra column, containing a shorter hash (64bits) of the actual
> value, and move the index to that column instead. This way the total size
> of the index (and the database) will be much smaller, hopefully resulting
> in faster inserts.
>
> But I'm wondering if SQLite can deal more efficiently with an INTEGER index
> (64 bits) vs. an 8-byte TEXT column (also 64 bits). I know the INTEGERs
> require less disk-space because SQLite can store smaller values in fewer
> bytes, but are there any other differences that make them more preferable
> as an INDEX? If there is no difference in performance, I could just take
> the first 8 characters of the TEXT column as a hash-value, instead of
> calculating a CRC64 each time.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] INDEX Types

2011-11-09 Thread Simon Slavin

On 9 Nov 2011, at 3:23pm, Fabian wrote:

> I'm having an issue where inserts on an un-cached database are very slow.
> The reason probably is that a large part of the existing index needs to be
> read from disk, to be able to insert new rows to the index.

Are you doing a lot of INSERT commands one after another ?  Things will go far 
faster if you wrap them in a transaction:

BEGIN TRANSACTION;
INSERT ...;
INSERT ...;
INSERT ...;
END TRANSACTION;

> The values
> in the indexed column are around 60 bytes long, so I'm thinking about
> adding an extra column, containing a shorter hash (64bits) of the actual
> value, and move the index to that column instead. This way the total size
> of the index (and the database) will be much smaller, hopefully resulting
> in faster inserts.
> 
> But I'm wondering if SQLite can deal more efficiently with an INTEGER index
> (64 bits) vs. an 8-byte TEXT column (also 64 bits). I know the INTEGERs
> require less disk-space because SQLite can store smaller values in fewer
> bytes, but are there any other differences that make them more preferable
> as an INDEX?

Yes.  It is far faster to compare two integers than it is to compare two 
strings.  This will speed up everything to do with the index, and make the 
index take up less space on disk.  Of course, you will have to take the time to 
calculate a hash value for each string.

> If there is no difference in performance, I could just take
> the first 8 characters of the TEXT column as a hash-value, instead of
> calculating a CRC64 each time.

This would be a very fast hashing calculation.  If it's easy to program, then 
it might be worth a try.
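
A minimal sketch of the hash-column layout (names assumed, with the 64-bit
hash computed in the application before the INSERT): the small INTEGER column
carries the index, and the full text is only compared once the index has
narrowed the candidates down.

CREATE TABLE t (a TEXT, h INTEGER);
CREATE INDEX th ON t(h);
-- lookup: probe the compact integer index, then verify the full value
SELECT * FROM t WHERE h = :hash AND a = :value;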

Simon.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users