[sqlite] Re: Optimising Large Tables by Splitting 1:1

2006-03-10 Thread James Austin
The question is more about theoretical optimisation than about improving the
speed of that specific database (which was used only for benchmarks). Also
note that the 700-record difference was based on around 5000 queries per
second, definitely not a margin of error; this is a problem with all tables
that have a large number of fields, regardless of the database or software.


In theory, let's say I have hundreds of databases running on the same server,
hence I want to optimise how the database is created and how queries are
performed (keep in mind that the 14% increase in processing does eventually
add up).


Consider the following as the steps taken to optimise an old database:
1. Start with a database that contains 1:1 Relationships, some of which 
require joining.


The problem here is that a join is one of the slowest operations, hence these
1:1 relationships must be combined.


2. Merge all 1:1 relationships into one table.

Result: one large table with n fields.
Problem: fields at the tail end of the table are slow to query.

3. Break up the table according to how it is used (so long as the new tables
do not need to be joined).


Result: small tables that do not require joining.
Problems: ??

As we all know, seeking through any file takes time. By combining large 1:1
relationships you remove joins but create tables with a large number of
fields, which in turn brings back the problem of seeking to fields at the tail end.
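
To make the trade-off concrete, here is a minimal sketch of step 2 versus
step 3 (the table and column names are invented for illustration; they are
not from the benchmark database):

-- Step 2: everything merged into one wide table; columns near the end of the
-- row end up in overflow pages and are slower to reach.
CREATE TABLE person_all (
    person_id INTEGER PRIMARY KEY,
    name      TEXT,
    email     TEXT,
    -- ... many more columns ...
    note_189  TEXT,
    note_190  TEXT
);

-- Step 3: split by usage pattern; both tables share the same key, so each
-- query touches only one small table and no join is needed.
CREATE TABLE person_core  (person_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE person_notes (person_id INTEGER PRIMARY KEY, note_189 TEXT, note_190 TEXT);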


What I am asking is more about design procedure than about the database
system itself: is the solution I have shown for 1:1 relationships the correct
method for solving the seek problems, or does the solution itself present
unseen problems?


James Austin





[sqlite] terrible behavior

2006-03-10 Thread emilia12
hi guys
i need help :-)

here is the problem:
(SQLite version 3.3.4, Windows XP)

i have a table:

CREATE TABLE plan(personid INTEGER, curseid INTEGER, statid
INTEGER);
INSERT INTO plan VALUES(1,1,0);
INSERT INTO plan VALUES(2,1,0);
INSERT INTO plan VALUES(3,1,0);
INSERT INTO plan VALUES(4,1,NULL);
INSERT INTO plan VALUES(5,1,NULL);
INSERT INTO plan VALUES(6,1,NULL);
INSERT INTO plan VALUES(7,1,NULL);
INSERT INTO plan VALUES(8,1,NULL);
-- ...

and i need to select all rows with statid = 0
so the query is :
SELECT statid
FROM plan P1
WHERE P1.curseid = 0;

BUT what a surprise, I got an empty result (I expected to get
rows 1 to 3)!

any suggestions?

regards
Emily







Re: [sqlite] terrible behavior

2006-03-10 Thread Guillaume MAISON

[EMAIL PROTECTED] wrote:

hi guys
i need help :-)

what is the problem :
(SQLite version 3.3.4, win XP)

i have a table:

CREATE TABLE plan(personid INTEGER, curseid INTEGER, statid
INTEGER);
INSERT INTO plan VALUES(1,1,0);
INSERT INTO plan VALUES(2,1,0);
INSERT INTO plan VALUES(3,1,0);
INSERT INTO plan VALUES(4,1,NULL);
INSERT INTO plan VALUES(5,1,NULL);
INSERT INTO plan VALUES(6,1,NULL);
INSERT INTO plan VALUES(7,1,NULL);
INSERT INTO plan VALUES(8,1,NULL);
-- ...

and i need to select all rows with statid = 0
so the query is :
SELECT statid
FROM plan P1
WHERE P1.curseid = 0;

BUT what a surprise, i got an empty result (i expect to get
rows No 1 to 3) !

any suggestions?


Yes :
SELECT P1.statid
FROM plan P1
WHERE P1.statid = 0;

instead of your query...

but you'll get only what you asked for: only 0 ;)
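
(And if the goal is to get the full rows 1 to 3 back rather than just the
zeros, a variant along these lines should work, using the column names from
the CREATE TABLE above:)

SELECT personid, curseid, statid
FROM plan
WHERE statid = 0;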


--

Guillaume MAISON - [EMAIL PROTECTED]
83, Cours Victor Hugo
47000 AGEN
Tél : 05 53 87 91 48 - Fax : 05 53 68 73 50
e-mail : [EMAIL PROTECTED] - Web : http://nauteus.com



Re: [sqlite] sqlite performance with sizeable tables

2006-03-10 Thread Christian Smith
On Thu, 9 Mar 2006 [EMAIL PROTECTED] wrote:

- Original Message 
 From: Christian Smith [EMAIL PROTECTED]

 Is this a likely usage scenario? Will your application regularly
 umount/mount the filesystem between transactions? While that may sound
 facetious, I'm not trying to be. Your otherwise excellent example is let down
 by a probably unreasonable usage pattern.

My example was just that: a test case trying to explain why my actual app
slows to a crawl.
The umount/mount is there to ensure that I am not benchmarking the OS
caching abilities.


But SQLite depends on the OS caching abilities for much of its
performance. Removing it is like saying SQLite is rubbish on Intel
processors after testing on an i486.


What this does is give a better idea of the performance of
SQLite's own caching.


The SQLite cache is very restricted. It only caches data over a single
transaction, discarding it on COMMIT. This is because another process may
update the file between transactions.

The file format has a file generation number, which could be used to delay
the discarding of the SQLite cache between transactions, but I believe it
is as yet unimplemented.



 The pages containing nodes closer to the root of the affected btrees will
 be cached pretty quickly, but the random access pattern and constant
 random updates will make for very non-localised updates to the DB file,
 which would require lots of seeks on sync.

 You might be well advised to look at how long it takes to do the updates,
 then the COMMIT separately. I'd hazard a guess that you'll be spending the
 majority of the time in the COMMIT commands. Something like:

All right, so I added the intermediate timing to see where the time is spent.

My results show that on the typical 7-second operation (after the
numbers stabilise), 2 seconds are spent between the BEGIN/END and 5
seconds performing the COMMIT. The first transactions show things like:


When I saw this, what I tried later on was to play around with
PRAGMA synchronous = OFF and PRAGMA synchronous = NORMAL.

While synchronous = NORMAL didn't show any real improvement in the
timings compared to FULL, full async mode was dramatically faster:
the random test that was taking 600+ seconds to execute now runs in 172
seconds (and running a global sync after the test only took a couple of
seconds, meaning most data was already written to disk). I looked at the
timings, and now the time is basically spent between the BEGIN/END; the
COMMIT is very fast.

I was very excited about this result, so I tried async access in my
actual app (which holds 8M+ records), but I was disappointed to see that
async actually slows things down?!


By much? This surprises me. Perhaps there is another bottleneck in the app
that isn't covered by this test case?

Async access isn't recommended anyway, as it leaves data vulnerable to OS
crashes or power failure.



I also tried mounting the filesystem (reiser) in journal mode
(rw,noatime,data=journal); I had it in writeback before, and things get slower
with this setting.

So what I tried next was to simply run my app after doing a
cat generaldb.db > /dev/null
(this takes about 15 seconds on my system; it's a 600 MB file).

This has the effect of preloading the db file in memory in the OS caches.

After I do this, the app becomes very fast (as in 4 times faster).

note: I noticed that even if I specify some ridiculous number like
PRAGMA cache_size = 45 for the cache (which should be enough to cache the
whole db, by the way), the memory footprint of my app stays pretty low, as if
it was not caching much!?

So at this point, I think the caching/polling subsystem seems to be the problem.


See above. The cache lives as long as the transaction.



Oh, also, a way to improve performance:
reads can be reordered, so if SQLite were doing async reads of several
places in the db file at the same time, the OS/drive would reorder things
the same way they are reordered when writing, for a good boost in
performance.


This would add complication to SQLite. We depend on the OS cache to shield
us from having to do async IO. It's difficult to share SIGIO, so an app
that also did async IO would have problems with SQLite. In general, read
performance is not a problem in SQLite. The biggest bottleneck is
transaction COMMIT, due to the synchronous nature of the operation.


General observations/questions:
- What did you expect from SQLite? Can you give indication of performance
  expectations?
- What did you use previously? How does SQLite compare on performance?
- Have you tested other databases? What sort of performance did you get
  from those?
- If you're not batching, or can't batch updates, then SQLite may very
  well not be performant enough due to inherent restrictions of its
  architecture.
- An idea of what your app does in general would be of assistance in
  helping you find your bottleneck. A specific test case is good if this
  is indeed the issue, but it doesn't sound like it is.


Nicolas



Re: [sqlite] terrible behavior

2006-03-10 Thread emilia12
thanks

actually the problem was in SQLite Explorer: because the
column was defined as integer, any text is interpreted
as zero :-)






Re: [sqlite] terrible behavior

2006-03-10 Thread Reid Thompson

[EMAIL PROTECTED] wrote:

thanks

actualy the problem was in the sqlite explorer because the
column was defined as integer and any text is interpreted
as zero :-)

  

No, actually I think Guillaume was correct.
This is the query you posted:

SELECT statid
FROM plan P1
WHERE P1.curseid = 0;

Note in your data that NO curseid is equal to 0.  They are all set to 1.






Re: [sqlite] Optimising Large Tables by Splitting 1:1

2006-03-10 Thread Jay Sprenkle
On 3/10/06, James Austin [EMAIL PROTECTED] wrote:
 I've been doing a fair bit of research into optimising larger tables and
 just want a bit of feedback (most of my lecturers are giving conflicting
 answers).

Each database has its own strengths and weaknesses, and they may all
perform differently. So the correct answer may be different depending on
which database is used.


Re: [sqlite] Optimising Large Tables by Splitting 1:1

2006-03-10 Thread Christian Smith
On Fri, 10 Mar 2006, James Austin wrote:

I've been doing a fair bit of research into optimising larger tables and
just want a bit of feedback (most of my lecturers are giving conflicting
answers).

Just say I have a table with 200+ fields (all relating uniquely to the primary
key). When querying the first 10 fields of the table I get just as fast a
response (about the same as if I were querying a table with the same number
of rows but only 10 fields); however, as I get further into the table (i.e.
selecting fields 180..190) there seems to be a noticeable decrease in speed
(by noticeable I mean around 750 benchmark queries per second slower).


Have a read of:
http://www.sqlite.org/php2004/page-001.html
http://www.sqlite.org/php2004/slides-all.html

Tables are implemented in a Btree, and described from slide 34 onwards.

The first portion of a row is stored along with other rows in a btree leaf
page. If the row doesn't fit, it spills into overflow pages, which are
chained using a singly linked list of page pointers (overflow pages store
data from a single row). Thus, to find columns 180..190 in the above
example, the row must first be located, then the linked list traversed to
find the pages with the desired columns.
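
(To make that concrete, a sketch with invented names, wide_table standing in
for the 200-field table; which columns need the overflow chain depends on the
row size versus the page size:)

-- Early columns are found in the btree leaf page; late columns in a 200-field
-- row typically require walking the chain of overflow pages.
SELECT field_005 FROM wide_table WHERE id = 42;   -- stops in the leaf page
SELECT field_185 FROM wide_table WHERE id = 42;   -- walks the overflow chain

-- Possible mitigations: put the hot columns first, split the cold tail into a
-- second table keyed by id, or create the database with a larger page size:
PRAGMA page_size = 8192;   -- only effective on a new, empty database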



As a measure to increase the speed I was thinking that a viable option would
be to break up the table based on how it is queried, such that sets of
fields will be divided into new tables that contain fields relating to the
query (so long as any fields that may be used in multiple queries (and
require rejoining later) are not separated as this is much slower than
keeping all fields in one table to begin with).

The main question is, is the method listed above the best way to improve the
speed of a large table or should they all remain in the same table as
splitting may cause other problems later on.


My question is, why do you need so many columns in a row? Is this a
hypothetical example, or a real world application?



One method suggested by one of my lecturers was to leave the table as one and
use views; however, after testing this out I found views slower than querying
the large table directly (and seeing that they don't increase the
table size, it leads me to believe that a view is simply an alias for a
query).


Views won't help if your data is so inefficiently laid out. Views are
indeed an alias to a query, and very useful for common queries and
transforming data.

You'd be better off breaking up your rows into more manageable tables, and
using views to present a compatible view of your old data. If this database
is already normalised, however, god help you I say! However, if this is
purely research, it may be better directed elsewhere, as 200+ column rows
are rare even in the most badly run government agencies.



Any feedback is appreciated,


My 2c (hmm, I'm spending a lot recently)


James Austin.


Christian


-- 
/\
\ /ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL
 X   - AGAINST MS ATTACHMENTS
/ \


[sqlite] Char versus Float versus Int as primary key

2006-03-10 Thread cstrader
I'm looking to enhance query speed.  Does it matter what variable type I use as 
the primary key?  I'm thinking numbers might be faster than strings, but I'm 
not sure.  

Thanks!


[sqlite] Another speed question....

2006-03-10 Thread cstrader

I assume it is faster to key on only one variable rather than two?

Thanks


Re: [sqlite] Char versus Float versus Int as primary key

2006-03-10 Thread Jay Sprenkle
On 3/10/06, cstrader [EMAIL PROTECTED] wrote:
 I'm looking to enhance query speed.  Does it matter what variable type I use 
 as the primary key?  I'm thinking numbers might be faster than strings, but 
 I'm not sure.

 Thanks!


Try it and see :)


[sqlite] Char versus Float versus Int as primary key

2006-03-10 Thread cstrader


Jay...  I could do that and maybe will.   But does anyone know whether it 
SHOULD matter?  Or are you saying that there are no general principles in this 
regard?

Thanks..

Re: [sqlite] Char versus Float versus Int as primary key

2006-03-10 Thread Jay Sprenkle

 Jay...  I could do that and maybe will.   But does anyone know whether it 
 SHOULD matter?  Or are you saying that there are no general principles in 
 this regard?

I would think the difference would be pretty minimal but I don't know for sure.
It would seem to be quicker to test than to wait for someone else to
look it up for you.


Re: [sqlite] terrible behavior

2006-03-10 Thread emilia12
Yes, you are correct,
but even when I fixed the typo, there was no result (I got
an empty one).
I mean with the original table, not the one shown in the letter
(I got that with the SQLite Explorer export function).
But when I selected everything from the db with sqlite.exe I saw
that in this column the data was ... NULLs and some strings
(garbage). Now I have put correct data (numbers) in and all is OK -
so thanks!
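
(The behaviour described can be sketched like this, with an invented table
name; under INTEGER affinity, text that cannot be converted to a number is
stored as text and does not match = 0:)

CREATE TABLE plan2 (personid INTEGER, curseid INTEGER, statid INTEGER);
INSERT INTO plan2 VALUES (1, 1, 'garbage');   -- stored as text
INSERT INTO plan2 VALUES (2, 1, '0');         -- coerced to the integer 0
SELECT personid FROM plan2 WHERE statid = 0;  -- returns only row 2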

regards
e.


[sqlite] Compatibility with MySQL

2006-03-10 Thread Bastian Haustein

Hello!

My name is Bastian, from Germany, and I am new to this list. I hope I
respect all common rules by sending this message.


I usually work with MySQL. Now I want to develop a piece of software for
testing purposes. It shall behave like the old MySQL software but work
with SQLite.


Now my question: how compatible are SQLite and MySQL? I sometimes use
the MySQL dialect, like


INSERT INTO sometable SET rowA='123', rowB='324'

which is not ANSI SQL. Is there any tutorial for MySQL users who are
new to SQLite?
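
(For reference, the portable column-list form of that statement, which both
MySQL and SQLite accept, would be:)

INSERT INTO sometable (rowA, rowB) VALUES ('123', '324');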


Kind regards and many greetz!

Bastian


Re: [sqlite] recompiled source code, make test failed

2006-03-10 Thread drh
yklin [EMAIL PROTECTED] wrote:
 Hi,
 I recompiled the sqlite (3.2.7) source on RedHat 9 (I upgraded the kernel to
 2.4.31).  Afterwards, make test reports that the lock test failed. Why? Is
 there any option in the kernel that needs to be turned on? Does anyone have
 the same problem?
 

There are bugs in the way pthreads is implemented in
Linux 2.4.x and RedHat9.  These bugs cause problems
for SQLite when used in multi-threaded applications.
Various versions of SQLite attempt to work around 
those bugs, with varying degrees of success.  As far 
as I know, the latest versions of SQLite work (but 
I do not have RedHat9 to know for sure.) I would 
suggest that you upgrade.
--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] Optimization help requested

2006-03-10 Thread drh
Pam Greene [EMAIL PROTECTED] wrote:
 Hi all,
 
 I'm working on a system to add full-text indexing on top of SQLite in a
 semi-automated way.  The general idea is that users will call an API to
 register a document table for indexing, and the system will take care of
 everything from there.
 
 When a row is added to a registered document table, an SQLite trigger calls
 a C function that parses the text for that document and saves it in a token
 table, along with some meta-information that's used later by the querying
 system to retrieve documents matching a given search.
 
 Although all this is working, it's awfully slow.  I'm fairly new to SQLite,
 and I'm hoping that some of the gurus out there can give me advice on
 speeding it up a bit.
 

I did not take the time to read your code closely.  But I do
have some experience doing full-text search with SQLite.
See http://www.sqlite.org/cvstrac/wiki?p=ExperimentalMailUserAgent

Some general advice:

  *  Map documents into integers using one table.  Map words into
 integers using a separate vocabulary table.  Then create the
 linkage between documents and words using a single table that
 has only two integer columns (see the sketch below).

  *  Use prepared statements.

  *  Wrap all updates inside a transaction.

If you still have performance problems, profile your code to
figure out where to optimize.
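
A minimal sketch of the schema described in the first bullet (table and
column names are illustrative, not from Pam's code); the inserts would go
through prepared statements and be wrapped in a single transaction:

CREATE TABLE doc   (doc_id  INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE vocab (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE hit   (word_id INTEGER, doc_id INTEGER);   -- just two integer columns
CREATE INDEX hit_word ON hit(word_id);

BEGIN;
INSERT INTO doc   VALUES (1, 'release notes');
INSERT INTO vocab VALUES (1, 'sqlite');
INSERT INTO hit   VALUES (1, 1);
-- ... one hit row per (word, document) pair ...
COMMIT;

-- all documents containing the word 'sqlite'
SELECT d.title
FROM vocab v JOIN hit h ON h.word_id = v.word_id
             JOIN doc d ON d.doc_id  = h.doc_id
WHERE v.word = 'sqlite';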
--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] File locking additions

2006-03-10 Thread drh
Helmut Tschemernjak [EMAIL PROTECTED] wrote:
 
  The current lock offset will not allow open DB files to be copied if the
  database gets larger than 0x4000 bytes. This is because locked
  regions cannot be copied under Windows, so we changed it to:
 
 #define PENDING_BYTE  I64CONST(0x7fff)
 #define RESERVED_BYTE I64CONST(0x7fff0001)
 #define SHARED_FIRST  I64CONST(0x7fff0002)
 

You are, of course, free to change SQLite in any way you want
for your own use.  But this change cannot be made in the official
SQLite implementation for two reasons:  (1) It would break
backwards compatibility.  (2) It will not work on systems that
do not support large files (files bigger than 2GiB), such as many
old Unix implementations and Win95.

--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] File locking additions

2006-03-10 Thread drh
Christian Smith [EMAIL PROTECTED] wrote:
 
 3. Extend the sqlite3_open commands to support URI style path
 references in order to specify the file system locking type (as
 opposed to modifying the arguments list).  After a little poking
 around on RFC 3986 http://www.ietf.org/rfc/rfc3986.txt I'm inclined
 to specify the locking choice via the query part of the URI.  For
 example:
 
  file:///mounts/myhost/dbfile.db?locktype=flock
  file:///localdisk/otherdbfile.db?locktype=automatic
 
 
 I'd be more inclined to add a PRAGMA. URIs are ugly and a pain to type in,

The reason for using a URI is so that you could specify
the locking style for databases that you ATTACH.  You are
not required to use a URI - an ordinary filename will continue
to work.  You could, perhaps, set the default locking mechanism
using a pragma.  But it is important to be able to specify
alternative locking mechanisms for ATTACHed databases.
--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] sqlite performance with sizeable tables

2006-03-10 Thread drh
[EMAIL PROTECTED] wrote:
 
 One question though: are the file access sorted, so that seeks are
 minimised when performing a transaction (making the assumption that 
 the file is not fragmented on disk)?
 

I'm not sure what you are asking.  Can you restate your question?
--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] Char versus Float versus Int as primary key

2006-03-10 Thread Dennis Cote

cstrader wrote:

I'm looking to enhance query speed.  Does it matter what variable type I use as the primary key?  I'm thinking numbers might be faster than strings, but I'm not sure.  


Thanks!

 

Integer primary keys will be the fastest since SQLite optimizes this 
case by not creating a separate index for the key. The integer key is 
stored directly as the rowid in the btree used to store the row data. To 
find a row it looks up the key value in the rowid of the table's btree.


For other primary key types, SQLite creates an index that stores the key 
field and the rowid of the row data in the btree. This index is stored 
in another btree. To find a row it looks in the index btree to find the 
rowid of the row that matches the key value, then it looks up the data 
in the table btree using the rowid it got from the index.


I believe that SQLite uses memcmp() to compare the key values for all 
key types. So there is no benefit to using numeric values over text as 
long as the text is usually different within the first few characters. 
It doesn't do floating point compares using floating point instructions 
for example (which could be done in a single instruction on most modern 
CPUs).
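
(A small illustration of the two layouts described above, with invented
names; the first table needs only the table btree, while the second also gets
an automatic index btree for the key:)

CREATE TABLE t_int  (id INTEGER PRIMARY KEY, payload TEXT);  -- key is the rowid itself
CREATE TABLE t_text (code TEXT PRIMARY KEY, payload TEXT);   -- hidden index: code -> rowid

SELECT payload FROM t_int  WHERE id = 12345;          -- one btree lookup
SELECT payload FROM t_text WHERE code = 'ABC-12345';  -- index lookup, then table lookup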


HTH
Dennis Cote


[sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Darren Duncan

All, and particularly Dr. Hipp,

Lately, my own database project in Perl, named Rosetta, has evolved 
to officially be an implementation of Chris Date's and Hugh Darwen's 
proposal for relational databases called The Third Manifesto, which 
is talked about at http://www.thethirdmanifesto.com/ and in various 
books and papers of theirs.


Rosetta has its own API which defines a D relational database
language, as opposed to an SQL language, and Rosetta's back-ends to
implement this are interchangeable.  I am making a pure Perl
reference implementation (called Rosetta::Engine::Example) which is
coded simply for ease of understanding but is slow.


Separately, I or third parties would be making alternate back-ends 
that either are self-implemented and better performing, or that 
constitute wrappers over existing database products, usually SQL 
based, since those are fairly mature and plentiful.  SQLite is one of 
the first such back-ends to be used.


Now, I would like to propose, and if necessary I will contribute 
significant targeted funding (when I have the money) to pay Dr. Hipp 
and/or other developers, some significant feature changes (as a fork 
if necessary) to SQLite such that it directly implements and gains 
the efficiencies of The Third Manifesto.  This includes both the 
addition of and the removal of current features, and certain 
behaviours would change.  Hopefully all for the better.


As a result of these changes, not only would SQLite better serve as a 
back-end of Rosetta, but non-Rosetta users of SQLite would get the 
most critical of the same benefits from it directly.


I anticipate that the changes would mainly affect the upper layers, 
which convert user commands into virtual machine code, but that the 
virtual machine and b-tree and OS layers would remain more or less 
unchanged (this depends, of course, on a few details).  Possibly, we 
would add a new command language.


I am hoping that, to keep longer term maintenance easier, these 
changes can be implemented in the trunk and activated using either 
run time pragmas or compile time options or both.  But if they would 
require a fork, then the forked product would have to be named 
something else that doesn't have 'SQL' in its name, since SQL does 
not satisfy The Third Manifesto.  Maybe 'TTMLite' or something that 
sounds better.


Here are some of the changes that I propose the pragma or compile 
time option or fork would have; they all refer to what the user sees, 
not to implementation details that should be hidden:


1.  Add a distinct logical BOOLEAN data type.  It is the data type of 
output from logical expressions like comparisons, and the input to 
'and', 'or', etc.


2.  Have strong and strict data typing for both variables and values.

2.1  Table columns are always declared to be of a specific type (eg: 
BOOLEAN, INTEGER, REAL, TEXT, BLOB) and nothing but values of the 
same type can be stored in them; attempting to do otherwise would 
fail with an exception.


2.2  The plain equality test is supported for all data types.

2.3  All operators/functions have strongly typed parameters and 
return values, and invoking them with arguments that aren't of the 
right type will fail with an exception.  The equality test likewise 
can only compare operands of the same type.


2.4  There is no implicit type conversion; data types must be 
explicitly converted from one type to another.


2.5  INTEGER and REAL data types have separate sets of operators, 
which do the expected thing with their types.  For example, each has 
a separate division operator whose input and output are all of that 
same type.  No worrying about when to round or not.


2.6  SQLite may already be this way, but:  All characters in a string 
are significant, including whitespace, so 'a' and 'a ' are always 
unequal.


3.  There is no such thing as a NULL.

3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

3.2  Every variable of a particular type always contains a value that
is valid for that type, so logic for dealing with it is simpler.
Likewise with every literal value.


3.3  The code to implement operators is a lot simpler.

3.4  Missing data can be either represented with the data type's 
empty value, or a table column that may possibly be unknown can be 
split into a separate related table, that only has records when the 
value is known.


3.5  All variables default to a reasonable valid value for their type 
if not explicitly set, such as the number zero or the empty string.


4.  There is no significant hidden data.  A row id can only be an 
explicitly declared table column.  The implementation of a table can 
use hidden row ids, but the user wouldn't see them.


5.  No duplicate rows in tables or queries are allowed.

5.1  In SQL terms, every table has an implicit unique key constraint 
over all of its columns.  This is ignored if there are any actual 
explicit keys, whether primary or otherwise.  In TTM terms, it 

Re: [sqlite] sqlite performance with sizeable tables

2006-03-10 Thread spaminos-sqlite
- Original Message 
 From: [EMAIL PROTECTED]
 [EMAIL PROTECTED] wrote:
  
  One question though: are the file access sorted, so that seeks are
  minimised when performing a transaction (making the assumption that 
  the file is not fragmented on disk)?
  

 I'm not sure what you are asking.  Can you restate your question?

Basically, the db file is accessed with seek + read/write operations.
Given a set of such operations, it can be very beneficial to reorder those
operations so that the file is accessed going from the beginning to the end of
the file (rather than in a purely random order).
This is a problem mostly for reads (writes are reordered when doing a
combination of async writes + a flush at the end).

I am guessing that, given the nature of SQL (a lot of iteration through
keys/indexes), it might be possible to create batches of disk I/O requests:
create a table of the next N reads (with their offsets) and do those reads
sorted by offset.

In the graphics world (where I come from), it's common practice to at least do
prefetching of data (sent in batches over the bus), and I believe that the
benefits would be similar here.

Nicolas






Re: [sqlite] sqlite performance with sizeable tables

2006-03-10 Thread spaminos-sqlite
- Original Message 
 From: Christian Smith [EMAIL PROTECTED]

 But SQLite depends on the OS caching abilities for much of it's
 performance. Removing it is like saying SQLite is rubbish on Intel
 processors after testing on a i486.

Yes and no: while it's nice to be able to rely somewhat on the OS cache, the OS
cannot guess what the usage pattern of the app is going to be.
Most of the time, the OS will guess wrong too: a typical case is random access
of a file, where the OS does readahead that is not beneficial in that
case (and can actually slow things down).
I always try to give the best hints possible to the underlying subsystem (like
never relying 100% on the optimizer to know what to do with a bunch of code,
because it won't).

When dealing with big databases (that typically don't fit in memory), the OS
will most likely cache the wrong kind of data, whereas the library could cache
important information such as some key offsets in the file (the cache would
contain some elements from the btree so that we can resume directly from there).

 The SQLite cache is very restricted. It only caches data over a single
 transaction, discarding it on COMMIT. This is because another process may
 update the file between transactions.
Oh I didn't know that the db cache was that short lived!?


 The file format has a file generation number, which could be used to delay
 the discarding of the SQLite cache between transactions, but I believe it
 is as yet unimplemented.

Oh, I see; the cache problem is probably related to multiple-process access.
In my case I have a multithreaded process using nice locks to access the db
(that way I never get a busy database either).

 I was very excited about this result, so I tried async access on my
 actual app (that holds 8M+ records), but I was disapointed to see that
 async actually even slow things down?!
 By much? This surprises me. Perhaps there is another bottleneck in the app
 that isn't covered by this test case?

Well, this test focuses on accessing one database, and my app accesses several.
It was not much slower, maybe a 10% penalty, but I was expecting a dramatic 
gain of performance :)

On the other hand, I tried to make better use of the cache: if I run my 1M
inserts in 10 transactions of 100,000, things get a bit slower than with 100
transactions of 10,000 inserts.
I tried one transaction of 1,000,000 inserts and the test app has been hanging
at 100% CPU for over 30 minutes now; not sure what is going on here.
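
(The batching pattern being compared, sketched with an invented table; the
only thing varied between the runs is how many INSERTs sit inside each
BEGIN/COMMIT pair:)

CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB);

BEGIN;
INSERT INTO kv VALUES (1, X'00');
INSERT INTO kv VALUES (2, X'01');
-- ... 10,000 (or 100,000, or 1,000,000) inserts per transaction ...
COMMIT;
-- then the next batch in a fresh BEGIN/COMMIT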

 Oh also, a way to improve performance:
 reads can be reordered, so if sqlite was doing async reads of several
 places in the db file at the same time, the OS/drive would reorder things
 the same way they are reordered when writing for a good boost in
 performance.

 This would add complication to SQLite. We depend on the OS cache to shield
 us from having to do async IO. It's difficult to share SIGIO, so an app
 that also did async IO would have problems with SQLite. In general, read
 performance is not a problem in SQLite. The biggest bottleneck is
 transaction COMMIT, due to the synchronous nature of the operation.
On unix/linux systems SIGIO is one way of doing async stuff (I never use it 
because it doesn't play well with threads).
Using poll or select will not interfere with other parts of the application.
I do believe that batching requests can dramatically improve performance:
a typical join will read 2 tables on disk, causing a lot of seeks.
Read performance is important when dealing with large files (relative to RAM
size); I believe that quite a few people use sqlite with multi-GB files, so
I don't think this is a stupid thing to look into.

 General observations/questions:
 - What did you expect from SQLite? Can you give indication of performance
   expectations?
 - What did you use previously? How does SQLite compare on performance?
I was expecting much better performance than what I am getting right now:
before, I was using the filesystem as a DB (basically, folders + filenames ->
data).
Namely, reiserfs performs pretty well for doing this kind of thing (millions of
records).
But that was wasting a lot of disk space and hitting all sorts of limits in the
various OSes.

Tried BDB a few years ago as a replacement, but wedged databases and
licensing issues kept us away from it.
Then I read about sqlite giving performance of the order of 25000 inserts/second.
I thought that I could probably get around 1 inserts/second on bigger
databases.

Right now, sqlite shows performance that is on par with a simple filesystem
structure (except it's much faster to back up, because traversing a
multimillion-file structure takes several hours).
I was expecting a few things by moving to sqlite:
* getting better ramp-up (cold cache) performance than a dumb filesystem
structure.
* having a [much] higher throughput (so that I can combine multiple servers
into one), as the records are small and there is no fragmentation of the
folders here.
* having 

Re: [sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Roger Binns
In doing all of the above, SQLite should actually be simpler to 
implement, and it will be easier to use, with more predictable 
results and fewer bugs.


Only for some applications.  It would be harder to use for
my apps.  Specifically manifest typing as currently implemented
in SQLite is a perfect match for apps written in Python (which
also uses manifest typing).  It would require a lot more code
to go through and force the data type for each column.

Roger


Re: [sqlite] Issue with LIKE on BLOBs

2006-03-10 Thread drh
Boris Popov [EMAIL PROTECTED] wrote:
 There's a problem with LIKE in queries produces inconsistent results,
 whereas = works as expected,
 
 CREATE TABLE TYPETESTS (test blob NULL , id integer NOT NULL)
 INSERT INTO TYPETESTS (test,id) VALUES (X'616263646566',349)
 
 SELECT TEST,ID FROM TYPETESTS WHERE TEST = CAST (X'616263646566' AS blob)
 
 Proper results 100% of the times
 
 SELECT TEST,ID FROM TYPETESTS WHERE TEST = X'616263646566'
 
 Proper results 100% of the times
 
 SELECT TEST,ID FROM TYPETESTS WHERE TEST LIKE X'616263646566'
 
 Proper results 30-50% of the times, no results otherwise
 
 SELECT TEST,ID FROM TYPETESTS WHERE TEST LIKE CAST (X'616263646566' AS blob)
 
 Proper results 30-50% of the times, no results otherwise
 
 Would this be considered an issue worthy of a bug report?
 

The LIKE() function expects '\000'-terminated inputs.  So it
probably will work if you add 00 to the end of your LIKE constant.

Using a blob as the right argument to LIKE seems kind of an odd
thing to do.  I'm not much motivated to fix this.
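
(A sketch of that workaround applied to the failing queries above, untested:
append a 00 byte to the BLOB literal used as the LIKE pattern.)

SELECT TEST, ID FROM TYPETESTS WHERE TEST LIKE X'61626364656600';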
--
D. Richard Hipp   [EMAIL PROTECTED]



Re: [sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Darren Duncan

At 6:52 PM -0800 3/10/06, Roger Binns wrote:

Only for some applications.  It would be harder to use for
my apps.  Specifically manifest typing as currently implemented
in SQLite is a perfect match for apps writen in Python (which
also uses manifest typing).  It would require a lot more code
to go through and force the data type for each column.


If that is so, then I would argue that any need to write more code
isn't tied to manifestly typed programming languages themselves, but
to specific programs; depending on how you code your
applications, it wouldn't require any significant amount more code.
In fact, particularly for queries (and reading data tends to be more
common than writing), there should be less code.  Or, looking at this
another way, perhaps the Python bindings for SQLite should be taking
care of this for you.  Or, put another way, I would say this is a
small price to pay for what is gained.  Or, I doubt there actually is
more work.


(But I don't really want to get into an argument on this point, as 
there are many other points in my proposal which I see as being of 
greater importance.)


But regardless, I have an additional idea which may help bridge the 
gap and work well for people.


That is, while the database itself is strongly typed, you can have a 
specific type which is defined to manifestly be able to store values 
from any of a variety of simpler types.


So for example, SQLite could have these types:

- Boolean
- Integer
- Real
- Text
- Blob
- Scalar

The first 5 are simple types that store just numbers or text or whatever.

The last, new one, Scalar, is an actually-strong type which is defined
as being able to store any of the first 5 types (just one at a time),
and hence acts like a weak type.


In a conceptual sense, a Scalar value is like a composite type with 6 
member elements, each of the last 5 being strongly typed as one of 
the first 5 simple types, and the first element being an enum which 
says which of the other 5 holds the over-all current value.


I believe something like this is what manifestly typed languages 
actually do behind the scenes, having a multi-element struct where 
one element says how to treat the other one(s).  I know Perl does 
this, with its SV C-struct, and I'm sure other languages do similar. 
I know SQLite does something similar, if you look at its design spec.


(Sure, that sounds more complicated, but then the actual work being 
done to support manifest typing *is* more complicated.  Things look 
more like they are.)


So if SQLite does it this way, then you can declare columns to be the 
Scalar type when you want them to hold anything, and one of the other 
types if you don't.  Moreover, the effectively manifest typed Scalar 
is what you would get if you don't explicitly declare a type for a 
column.  This happens already, but now the "what you get" actually
has a name.
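
(A hypothetical declaration under this proposal, purely to illustrate the
idea; SCALAR is not a type in current SQLite:)

CREATE TABLE example (
    id    INTEGER,   -- strongly typed: only integers accepted
    note  TEXT,      -- strongly typed: only text accepted
    extra SCALAR     -- variant: may hold any one of the simple types at a time
);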


The point is that you still get well defined behaviour that is 
specific to the declared data type(s) you choose to use, and you can 
count on its being consistent.


-- Darren Duncan


Re: [sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Roger Binns

Just for the record:

it wouldn't require any significant amount more code. 


Yes it would.  My code currently approximates to this:

  cursor.execute("insert into foo (x,y,z) values (?,?,?)", (x, y, z))

It would have to change into this:

  # column x is defined as string in the schema
  if isinstance(x, str):
      storex = x
  elif isinstance(x, bool):      # must come before int: bool is a subclass of int
      if x: storex = "1"
      else: storex = "0"
  elif isinstance(x, int):
      storex = str(x)
  else:
      pass  # various other types and conditions for this context
  # repeat for y and z

  # add in values
  cursor.execute("insert into foo (x,y,z) values (?,?,?)",
                 (storex, storey, storez))
  
In fact, particularly for queries (and reading data tends to be more 
common than writing), there should be less code.  


 Currently:

  x, y, z = cursor.execute("select x,y,z from foo")

 Change to:

  resultx, resulty, resultz = cursor.execute("select x,y,z from foo")

  # now do type conversion back to the correct types
  if ...:    # somehow know it may have been an int
      x = int(resultx)
  elif ...:  # somehow know it may have been a bool
      x = bool(int(resultx))
  # ... you get the picture ...

Not having manifest types in the database throws away information
when you store values and requires reconstituting it when reading.

My main app happens to store phone numbers.  You won't believe
how irritating it is when I find that things automatically assume they
are integers.

Or, looking at this 
another way, perhaps the Python bindings for SQLite should be taking 
care of this for you.  


They can't, unless they do something like silently add an extra
column that stores the types of the values in the other columns
and attempt to transparently modify the SQL as it flies by
to get or update that column.  (BTW, I also happen to be an author
of wrappers for Python.)  (Your proposal sort of does this
by introducing a manifest type.)

The solution you discussed will make code that has to deal with
random SQLite databases produced by other applications, or by earlier
versions of the same app, significantly more complicated.  However,
I don't think there are too many of them to be concerned about.

I would suggest finding an open source application that uses
SQLite and seeing if you would indeed make it simpler.  One good
example I would suggest is Trac, which was originally written
to use SQLite.

Roger


RE: [sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Marian Olteanu
You're right, Darren, but the problem is that we're not in a DB class. We
cannot tell people who have a solution for their problems that "your
solution is wrong; you need to reimplement your stuff to make it right."
Most SQLite users are practical people, and all they want is their
problem to be solved. They don't really care whether the SQL language
implements relational algebra correctly or not.





Re: [sqlite] updating SQLite to implement The Third Manifesto

2006-03-10 Thread Andrew Piskorski
On Fri, Mar 10, 2006 at 04:37:36PM -0800, Darren Duncan wrote:

 3.  There is no such thing as a NULL.
 
 3.1  All logic is 2VL (true, false) not 3VL (true, false, unknown).

There is no such thing as null, really?  So, when you do an outer join
between two tables, which in SQL would produce null columns in the
result set, what do YOU propose producing instead of those nulls?

Perhaps I missed it, but in my brief reading of some of Date's work, I
never saw him answer that question.

 3.4  Missing data can be either represented with the data type's 
 empty value, or a table column that may possibly be unknown can be 
 split into a separate related table, that only has records when the 
 value is known.
 
 3.5  All variables default to a reasonable valid value for their type 
 if not explicitly set, such as the number zero or the empty string.

-- 
Andrew Piskorski [EMAIL PROTECTED]
http://www.piskorski.com/