[sqlite] Re: Optimising Large Tables by Splitting 1:1
The question is more to do with theoretical optimisation than with improving the speed of that specific database (which was used only for benchmarks). Also note that the 700-record difference was based on around 5000 queries per second, definitely not a margin of error (this is a problem with all tables that have a large number of fields, regardless of the database or software). In theory, let's say I have hundreds of databases running on the same server, hence I want to optimise the way the database is created and queries are performed (keep in mind that the 14% increase in processing does eventually add up). Consider the following as the steps taken to optimise an old database:

1. Start with a database that contains 1:1 relationships, some of which require joining. The problem here is that a join is one of the slowest queries, hence these 1:1s must be combined.

2. Merge all 1:1 relationships into one table. Result: one large table with n fields. Problem: fields at the tail end of the table are slow to query.

3. Break up the table according to how it is used (so long as the new tables do not need to be joined). Result: small tables that do not require joining. Problems: ??

As we all know, seeking through any file takes time. By combining large 1:1 relationships you remove joins but create tables with a large number of fields, which itself has the problem of seeking to fields at the tail end. What I am asking is more to do with design procedure than with the database system itself: is the solution I have shown for 1:1 the correct method for solving the seek problems, or does the solution itself present unseen problems?

James Austin
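The trade-off described in the steps above can be sketched with an in-memory SQLite database. This is only an illustration of the design, not the original benchmark; the table and column names are made up:

```python
import sqlite3

# Hypothetical sketch: a merged wide table versus two narrow 1:1 tables
# split along query boundaries, as in steps 2 and 3 above.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Merged design (step 2): one wide row per entity, 200 fields.
cols = ", ".join(f"f{i} INTEGER" for i in range(200))
cur.execute(f"CREATE TABLE merged (id INTEGER PRIMARY KEY, {cols})")

# Split design (step 3): columns that are queried together live together,
# sharing the same primary key, so no join is ever needed.
cur.execute("CREATE TABLE head (id INTEGER PRIMARY KEY, f0 INTEGER, f1 INTEGER)")
cur.execute("CREATE TABLE tail (id INTEGER PRIMARY KEY, f180 INTEGER, f190 INTEGER)")

cur.execute(f"INSERT INTO merged VALUES (1, {', '.join(str(i) for i in range(200))})")
cur.execute("INSERT INTO head VALUES (1, 0, 1)")
cur.execute("INSERT INTO tail VALUES (1, 180, 190)")

# A query that needs only the tail-end fields reads a short row in `tail`
# instead of walking past 180 earlier fields in `merged`, and joins nothing.
row = cur.execute("SELECT f180, f190 FROM tail WHERE id = 1").fetchone()
print(row)  # (180, 190)
```

The split only pays off when, as the post stresses, the new tables never need to be rejoined for the queries actually run.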
[sqlite] terrible behavior
hi guys, i need help :-) what is the problem: (SQLite version 3.3.4, Win XP) i have a table:

CREATE TABLE plan(personid INTEGER, curseid INTEGER, statid INTEGER);
INSERT INTO plan VALUES(1,1,0);
INSERT INTO plan VALUES(2,1,0);
INSERT INTO plan VALUES(3,1,0);
INSERT INTO plan VALUES(4,1,NULL);
INSERT INTO plan VALUES(5,1,NULL);
INSERT INTO plan VALUES(6,1,NULL);
INSERT INTO plan VALUES(7,1,NULL);
INSERT INTO plan VALUES(8,1,NULL);
-- ...

and i need to select all rows with statid = 0, so the query is:

SELECT statid FROM plan P1 WHERE P1.curseid = 0;

BUT what a surprise, i got an empty result (i expected to get rows No 1 to 3)! any suggestions? regards Emily
Re: [sqlite] terrible behavior
[EMAIL PROTECTED] wrote:

> [...] i need to select all rows with statid = 0, so the query is:
> SELECT statid FROM plan P1 WHERE P1.curseid = 0;
> BUT what a surprise, i got an empty result (i expect to get rows No 1 to 3)! any suggestions?

Yes:

SELECT P1.statid FROM plan P1 WHERE P1.statid = 0;

instead of your query... but you'll get only what you asked for: only 0 ;)

-- Guillaume MAISON - [EMAIL PROTECTED] 83, Cours Victor Hugo 47000 AGEN Tél : 05 53 87 91 48 - Fax : 05 53 68 73 50 e-mail : [EMAIL PROTECTED] - Web : http://nauteus.com
Re: [sqlite] sqlite performance with sizeable tables
On Thu, 9 Mar 2006, [EMAIL PROTECTED] wrote:

> > Is this a likely usage scenario? Will your application regularly umount/mount the filesystem between transactions? While sounding facetious, I'm not trying to be. Your otherwise excellent example is let down by a probably unreasonable usage pattern.
>
> My example was just that, a test case trying to explain why my actual app slows to a crawl. The umount/mount is there to ensure that I am not benchmarking the OS caching abilities.

But SQLite depends on the OS caching abilities for much of its performance. Removing it is like saying SQLite is rubbish on Intel processors after testing on an i486.

> What this does is give a better idea of the performance of the caching of sqlite.

The SQLite cache is very restricted. It only caches data over a single transaction, discarding it on COMMIT. This is because another process may update the file between transactions. The file format has a file generation number, which could be used to delay the discarding of the SQLite cache between transactions, but I believe it is as yet unimplemented.

The pages containing nodes closer to the root of the affected btrees will be cached pretty quickly, but the random access pattern and constant random updates will make for very non-localised updates to the DB file, which would require lots of seeks on sync. You might be well advised to look at how long it takes to do the updates, then the COMMIT, separately. I'd hazard a guess that you'll be spending the majority of the time in the COMMIT commands. Something like:

> All right, so I added the intermediate timing to see where the time is spent. My results show that on the typical 7-second operation (after the numbers stabilise), 2 seconds are spent between the BEGIN/END and 5 seconds performing the COMMIT.
> The first transactions show things like: When I saw this, what I tried later on was to play around with PRAGMA synchronous = OFF and PRAGMA synchronous = NORMAL. While synchronous = NORMAL didn't show any real improvement in the timings compared to FULL, full async mode was dramatically faster: the random test that was taking 600+ seconds to execute now runs in 172 seconds (and running the global sync after the test only took a couple of seconds, meaning most data was already written to disk). I looked at the timings, and now the time is basically spent between the BEGIN/END; the COMMIT is very fast. I was very excited about this result, so I tried async access on my actual app (which holds 8M+ records), but I was disappointed to see that async actually even slowed things down?!

By much? This surprises me. Perhaps there is another bottleneck in the app that isn't covered by this test case? Async access isn't recommended anyway, as it leaves data vulnerable to OS crashes or power failure.

> I also tried to mount the filesystem (reiser) in journal mode (rw,noatime,data=journal), I had it in writeback, and things get slower with this setting. So what I tried next was to simply run my app after doing a cat generaldb.db > /dev/null (this takes about 15 seconds on my system, it's a 600 MB file). This has the effect of preloading the db file into the OS caches. After I do this, the app becomes very fast (as in 4 times faster). Note: I noticed that even if I specify some ridiculous number like PRAGMA cache_size = 45 (that should be enough to cache the whole db, by the way), the memory footprint of my app stays pretty low, as if it was not caching much!? So at this point, I think the caching/polling subsystem seems to be the problem.

See above. The cache lives as long as the transaction.
> Oh also, a way to improve performance: reads can be reordered, so if sqlite was doing async reads of several places in the db file at the same time, the OS/drive would reorder things the same way they are reordered when writing, for a good boost in performance.

This would add complication to SQLite. We depend on the OS cache to shield us from having to do async IO. It's difficult to share SIGIO, so an app that also did async IO would have problems with SQLite. In general, read performance is not a problem in SQLite. The biggest bottleneck is transaction COMMIT, due to the synchronous nature of the operation.

General observations/questions:

- What did you expect from SQLite? Can you give an indication of performance expectations?
- What did you use previously? How does SQLite compare on performance?
- Have you tested other databases? What sort of performance did you get from those?
- If you're not batching, or can't batch updates, then SQLite may very well not be performant enough due to inherent restrictions of its architecture.
- An idea of what your app does in general would be of assistance in helping you find your bottleneck. A specific test case is good if this is indeed the issue, but it doesn't sound like it is.

Nicolas
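The batching point in the observations above is easy to demonstrate: doing many updates inside one explicit transaction means the synchronous COMMIT cost is paid once rather than per statement. A minimal sketch (file path and row counts are arbitrary):

```python
import os
import sqlite3
import tempfile

# Sketch: batch many INSERTs into one transaction so there is a single
# COMMIT (and thus a single sync) instead of one per statement.
path = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(path, isolation_level=None)  # manage transactions manually
cur = con.cursor()
cur.execute("PRAGMA synchronous = NORMAL")  # FULL is the more durable default
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")

cur.execute("BEGIN")
for i in range(1000):
    cur.execute("INSERT INTO t VALUES (?, ?)", (i, i * 2))
cur.execute("COMMIT")  # all the sync cost is paid once, here

count = cur.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # 1000
```

Without the BEGIN/COMMIT wrapper, each INSERT would be its own transaction and the workload would be dominated by per-statement commits, which matches the timing breakdown reported in the thread.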
Re: [sqlite] terrible behavior
thanks, actually the problem was in SQLite Explorer: the column was defined as integer and any text is interpreted as zero :-)

Quoting Guillaume MAISON [EMAIL PROTECTED]:

> SELECT P1.statid FROM plan P1 WHERE P1.statid = 0;
> instead of your query... but you'll get only what you asked for: only 0 ;)
Re: [sqlite] terrible behavior
[EMAIL PROTECTED] wrote:

> thanks, actually the problem was in SQLite Explorer: the column was defined as integer and any text is interpreted as zero :-)

no, actually i think Guillaume was correct. This is the query you posted:

SELECT statid FROM plan P1 WHERE P1.curseid = 0;

Note in your data that NO curseid is equal to 0. They are all set to 1.
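A side point worth making explicit for this thread: even after the column name is corrected, `statid = 0` can never match the NULL rows, because NULL compares as unknown and is filtered out of any WHERE clause. Reproducing the thread's schema shows both behaviours:

```python
import sqlite3

# Rebuild the exact table from the thread and show that `= 0` matches only
# the real zeros, while the NULL rows need IS NULL to be found at all.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE plan(personid INTEGER, curseid INTEGER, statid INTEGER)")
rows = [(1, 1, 0), (2, 1, 0), (3, 1, 0),
        (4, 1, None), (5, 1, None), (6, 1, None), (7, 1, None), (8, 1, None)]
cur.executemany("INSERT INTO plan VALUES (?, ?, ?)", rows)

zero = cur.execute("SELECT personid FROM plan WHERE statid = 0").fetchall()
null = cur.execute("SELECT personid FROM plan WHERE statid IS NULL").fetchall()
print([r[0] for r in zero])  # [1, 2, 3]
print([r[0] for r in null])  # [4, 5, 6, 7, 8]
```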
Re: [sqlite] Optimising Large Tables by Splitting 1:1
On 3/10/06, James Austin [EMAIL PROTECTED] wrote:

> I've been doing a fair bit of research into optimising larger tables and just want a bit of feedback (most of my lecturers are giving conflicting answers).

Each database has its own strengths and weaknesses, and they may all perform differently. So the correct answer may be different depending on which database is used.
Re: [sqlite] Optimising Large Tables by Splitting 1:1
On Fri, 10 Mar 2006, James Austin wrote:

> I've been doing a fair bit of research into optimising larger tables and just want a bit of feedback (most of my lecturers are giving conflicting answers). Just say I have a table with 200+ fields (all relate uniquely to the primary key). When querying the first 10 fields of the table I get just as fast a response (about the same as if I were querying a table with the same number of rows but only 10 fields). However, as I get further into the table (i.e. selecting fields 180..190), there seems to be a noticeable decrease in speed (by noticeable I mean around 750 benchmark queries per second slower).

Have a read of:

http://www.sqlite.org/php2004/page-001.html
http://www.sqlite.org/php2004/slides-all.html

Tables are implemented as a btree, described from slide 34 onwards. The first portion of a row is stored along with other rows in a btree leaf page. If the row doesn't fit, it spills into overflow pages, which are chained using a singly linked list of page pointers (overflow pages store data from a single row). Thus, to find columns 180..190 in the above example, first the row must be located, then the linked list traversed to find the pages with the desired columns.

> As a measure to increase the speed I was thinking that a viable option would be to break up the table based on how it is queried, such that sets of fields will be divided into new tables that contain fields relating to the query (so long as any fields that may be used in multiple queries (and require rejoining later) are not separated, as this is much slower than keeping all fields in one table to begin with). The main question is: is the method listed above the best way to improve the speed of a large table, or should they all remain in the same table, as splitting may cause other problems later on?

My question is, why do you need so many columns in a row? Is this a hypothetical example, or a real world application?
> One method suggested by one of my lecturers was to leave the table as one and use views. However, after testing this out I found views slower than querying the large table directly (and seeing that they don't increase the table size, it leads me to believe that a view is simply an alias to a query).

Views won't help if your data is so inefficiently laid out. Views are indeed an alias to a query, and very useful for common queries and transforming data. You'd be better breaking up your rows into more manageable tables, and using views to create a compatible view of your old data. If this database is already normalised, however, god help you I say! However, if this is purely research, it may be better directed elsewhere, as 200+ column rows are rare even in the most badly run government agencies.

> Any feedback is appreciated,
> James Austin.

My 2c (hmm, I'm spending a lot recently)

Christian

-- ASCII RIBBON CAMPAIGN - AGAINST HTML MAIL - AGAINST MS ATTACHMENTS
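Christian's overflow-page explanation can be turned into back-of-envelope arithmetic. The numbers below are assumptions for illustration (a 1024-byte page, which was the SQLite default of that era, and a rough 8 bytes of stored payload per integer column; the real record encoding is variable-length):

```python
# Rough model of the overflow effect described above: once a row's payload
# exceeds what fits in its btree leaf page, the remainder spills into a
# singly linked chain of overflow pages that must be walked to reach
# tail-end columns. PAGE_SIZE and BYTES_PER_COLUMN are assumed values.
PAGE_SIZE = 1024
BYTES_PER_COLUMN = 8

def overflow_pages_needed(n_columns: int) -> int:
    """Approximate count of overflow pages a row of n integer columns uses."""
    payload = n_columns * BYTES_PER_COLUMN
    if payload <= PAGE_SIZE:          # the whole row fits in its leaf page
        return 0
    spill = payload - PAGE_SIZE
    return -(-spill // PAGE_SIZE)     # ceiling division

# A 10-column row never spills; a 200-column row does, so reaching
# columns 180..190 means following the overflow chain.
print(overflow_pages_needed(10))   # 0
print(overflow_pages_needed(200))  # 1
```

This is why queries on the first 10 fields run at full speed while queries on fields 180..190 slow down: the former stop at the leaf page, the latter walk the chain.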
[sqlite] Char versus Float versus Int as primary key
I'm looking to enhance query speed. Does it matter what variable type I use as the primary key? I'm thinking numbers might be faster than strings, but I'm not sure. Thanks!
[sqlite] Another speed question....
I assume it is faster to key on only one variable rather than two? Thanks
Re: [sqlite] Char versus Float versus Int as primary key
On 3/10/06, cstrader [EMAIL PROTECTED] wrote: I'm looking to enhance query speed. Does it matter what variable type I use as the primary key? I'm thinking numbers might be faster than strings, but I'm not sure. Thanks! Try it and see :)
[sqlite] Char versus Float versus Int as primary key
Jay... I could do that and maybe will. But does anyone know whether it SHOULD matter? Or are you saying that there are no general principles in this regard? Thanks..
Re: [sqlite] Char versus Float versus Int as primary key
Jay... I could do that and maybe will. But does anyone know whether it SHOULD matter? Or are you saying that there are no general principles in this regard? I would think the difference would be pretty minimal but I don't know for sure. It would seem to be quicker to test than to wait for someone else to look it up for you.
Re: [sqlite] terrible behavior
yes, you are correct, but even when i rewrote the typo, there was no result (i got an empty one). i mean with the original table, not the one in the letter (i got that with the SQLite Explorer export function). but when i selected everything from the db with sqlite.exe i saw that in this column the data was... NULLs and some strings (garbage). Now i have put correct data (numbers) in and all is OK - so 10x! regards e.

Quoting Reid Thompson [EMAIL PROTECTED]:

> no, actually i think Guillaume was correct. This is the query you posted:
> SELECT statid FROM plan P1 WHERE P1.curseid = 0;
> Note in your data that NO curseid is equal to 0. They are all set to 1.
[sqlite] Compatibility with MySQL
Hello! My name is Bastian, from Germany, and I am new to this list. I hope I respect all common rules by sending this message. I usually work with MySQL. Now I want to develop a piece of software for testing purposes. It shall behave like the old MySQL software but work with SQLite. Now my question: how compatible are SQLite and MySQL? I sometimes use MySQL dialect like

INSERT INTO sometable SET rowA='123', rowB='324'

which is not ANSI SQL. Is there any tutorial for MySQL users who are new to SQLite? Kind regards and many greetz! Bastian
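For the specific statement Bastian mentions: SQLite does not accept MySQL's non-standard `INSERT INTO t SET col=val` form; the portable spelling is a column list plus VALUES. A small sketch of rewriting one into the other (the `insert_set` helper is hypothetical, and note it interpolates table/column names, so those must come from trusted code, not user input):

```python
import sqlite3

# Hypothetical helper: emulate MySQL's `INSERT INTO t SET a=?, b=?` on
# SQLite by rewriting it to the standard column-list + VALUES form.
def insert_set(cur, table, **cols):
    names = ", ".join(cols)                    # trusted identifiers only
    marks = ", ".join("?" for _ in cols)
    sql = f"INSERT INTO {table} ({names}) VALUES ({marks})"
    cur.execute(sql, tuple(cols.values()))

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sometable (rowA TEXT, rowB TEXT)")

# Equivalent of: INSERT INTO sometable SET rowA='123', rowB='324'
insert_set(cur, "sometable", rowA="123", rowB="324")
print(cur.execute("SELECT rowA, rowB FROM sometable").fetchone())  # ('123', '324')
```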
Re: [sqlite] recompiled source code, make test failed
yklin [EMAIL PROTECTED] wrote:

> Hi, I recompiled the sqlite (3.2.7) source on RedHat 9 (I upgraded the kernel to 2.4.31). After that, make test reports that the lock test failed. Why? Is there any option in the kernel that needs to be turned on? Does anyone have the same problem?

There are bugs in the way pthreads is implemented in Linux 2.4.x and RedHat 9. These bugs cause problems for SQLite when used in multi-threaded applications. Various versions of SQLite attempt to work around those bugs, with varying degrees of success. As far as I know, the latest versions of SQLite work (but I do not have RedHat 9 to know for sure). I would suggest that you upgrade.

-- D. Richard Hipp [EMAIL PROTECTED]
Re: [sqlite] Optimization help requested
Pam Greene [EMAIL PROTECTED] wrote:

> Hi all, I'm working on a system to add full-text indexing on top of SQLite in a semi-automated way. The general idea is that users will call an API to register a document table for indexing, and the system will take care of everything from there. When a row is added to a registered document table, an SQLite trigger calls a C function that parses the text for that document and saves it in a token table, along with some meta-information that's used later by the querying system to retrieve documents matching a given search. Although all this is working, it's awfully slow. I'm fairly new to SQLite, and I'm hoping that some of the gurus out there can give me advice on speeding it up a bit.

I did not take the time to read your code closely. But I do have some experience doing full-text search with SQLite. See http://www.sqlite.org/cvstrac/wiki?p=ExperimentalMailUserAgent

Some general advice:

* Map documents into integers using one table. Map words into integers using a separate vocabulary table. Then create the linkage between documents and words using a single table that has only two integer columns.
* Use prepared statements.
* Wrap all updates inside a transaction.

If you still have performance problems, profile your code to figure out where to optimize.

-- D. Richard Hipp [EMAIL PROTECTED]
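The three-table layout Dr. Hipp recommends can be sketched concretely. The schema and names below are illustrative (they are not from any real FTS module): one table maps documents to integers, a vocabulary table maps words to integers, and a two-integer link table ties them together, with all updates wrapped in one transaction:

```python
import sqlite3

# Sketch of the advice above: doc/vocab tables map strings to integers,
# and the link table holds only two integer columns.
con = sqlite3.connect(":memory:", isolation_level=None)
cur = con.cursor()
cur.executescript("""
  CREATE TABLE doc  (docid  INTEGER PRIMARY KEY, title TEXT);
  CREATE TABLE vocab(wordid INTEGER PRIMARY KEY, word TEXT UNIQUE);
  CREATE TABLE link (wordid INTEGER, docid INTEGER);
  CREATE INDEX link_word ON link(wordid);
""")

def index_doc(title, text):
    """Tokenise crudely on whitespace and record word -> doc links."""
    cur.execute("INSERT INTO doc(title) VALUES (?)", (title,))
    docid = cur.lastrowid
    for word in set(text.lower().split()):
        cur.execute("INSERT OR IGNORE INTO vocab(word) VALUES (?)", (word,))
        wid = cur.execute("SELECT wordid FROM vocab WHERE word = ?",
                          (word,)).fetchone()[0]
        cur.execute("INSERT INTO link VALUES (?, ?)", (wid, docid))

cur.execute("BEGIN")  # wrap all index updates in a single transaction
index_doc("a", "fast full text search")
index_doc("b", "slow linear scan")
cur.execute("COMMIT")

hits = cur.execute("""SELECT d.title FROM doc d
                      JOIN link l  ON l.docid  = d.docid
                      JOIN vocab v ON v.wordid = l.wordid
                      WHERE v.word = 'search'""").fetchall()
print(hits)  # [('a',)]
```

The integer mapping keeps the big link table narrow, and the single transaction avoids a per-row COMMIT, which is usually the dominant cost.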
Re: [sqlite] File locking additions
Helmut Tschemernjak [EMAIL PROTECTED] wrote:

> The current lock offset will not allow open DB files to be copied if the database gets larger than 0x4000 bytes. This is because locked regions cannot be copied under Windows, so we changed it to:
>
> #define PENDING_BYTE I64CONST(0x7fff)
> #define RESERVED_BYTE I64CONST(0x7fff0001)
> #define SHARED_FIRST I64CONST(0x7fff0002)

You are, of course, free to change SQLite in any way you want for your own use. But this change cannot be made in the official SQLite implementation for two reasons: (1) It would break backwards compatibility. (2) It will not work on systems that do not support large files (files bigger than 2 GiB), such as many old Unix implementations and Win95.

-- D. Richard Hipp [EMAIL PROTECTED]
Re: [sqlite] File locking additions
Christian Smith [EMAIL PROTECTED] wrote:

> > 3. Extend the sqlite3_open commands to support URI-style path references in order to specify the file system locking type (as opposed to modifying the arguments list). After a little poking around in RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt), I'm inclined to specify the locking choice via the query part of the URI. For example:
> >
> > file:///mounts/myhost/dbfile.db?locktype=flock
> > file:///localdisk/otherdbfile.db?locktype=automatic
>
> I'd be more inclined to add a PRAGMA. URIs are ugly and a pain to type in.

The reason for using a URI is so that you could specify the locking style for databases that you ATTACH. You are not required to use a URI - an ordinary filename will continue to work. You could, perhaps, set the default locking mechanism using a pragma. But it is important to be able to specify alternative locking mechanisms for ATTACHed databases.

-- D. Richard Hipp [EMAIL PROTECTED]
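As a historical footnote, SQLite did later gain exactly this kind of URI filename (query parameters after `file:` select open-time behaviour), and Python's `sqlite3` exposes it via `uri=True`. The sketch below uses `mode=ro`, a parameter that does exist in released SQLite; the `locktype=` parameter discussed in the thread is the proposal, not a shipped feature:

```python
import os
import sqlite3
import tempfile

# Create an ordinary database file first.
path = os.path.join(tempfile.mkdtemp(), "dbfile.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()
con.close()

# Reopen it through a URI filename whose query string requests
# read-only mode; writes should then be rejected.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
try:
    ro.execute("INSERT INTO t VALUES (1)")
    writable = True
except sqlite3.OperationalError:  # "attempt to write a readonly database"
    writable = False
print(writable)  # False
```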
Re: [sqlite] sqlite performance with sizeable tables
[EMAIL PROTECTED] wrote: One question though: are the file access sorted, so that seeks are minimised when performing a transaction (making the assumption that the file is not fragmented on disk)? I'm not sure what you are asking. Can you restate your question? -- D. Richard Hipp [EMAIL PROTECTED]
Re: [sqlite] Char versus Float versus Int as primary key
cstrader wrote: I'm looking to enhance query speed. Does it matter what variable type I use as the primary key? I'm thinking numbers might be faster than strings, but I'm not sure. Thanks! Integer primary keys will be the fastest since SQLite optimizes this case by not creating a separate index for the key. The integer key is stored directly as the rowid in the btree used to store the row data. To find a row it looks up the key value in the rowid of the table's btree. For other primary key types, SQLite creates an index that stores the key field and the rowid of the row data in the btree. This index is stored in another btree. To find a row it looks in the index btree to find the rowid of the row that matches the key value, then it looks up the data in the table btree using the rowid it got from the index. I believe that SQLite uses memcmp() to compare the key values for all key types. So there is no benefit to using numeric values over text as long as the text is usually different within the first few characters. It doesn't do floating point compares using floating point instructions for example (which could be done in a single instruction on most modern CPUs). HTH Dennis Cote
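Dennis's point about INTEGER PRIMARY KEY being the rowid itself, with no separate index, is directly observable in `sqlite_master`:

```python
import sqlite3

# An INTEGER PRIMARY KEY is stored as the btree rowid, so no extra index
# is created for it; a TEXT primary key gets an automatic index.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE int_key (id INTEGER PRIMARY KEY, v TEXT)")
cur.execute("CREATE TABLE txt_key (id TEXT    PRIMARY KEY, v TEXT)")

cur.execute("INSERT INTO int_key (id, v) VALUES (7, 'x')")
rowid = cur.execute("SELECT rowid FROM int_key WHERE id = 7").fetchone()[0]
print(rowid)  # 7 -- the declared key *is* the rowid

idx = cur.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
print(idx)  # only the automatic index backing txt_key's primary key
```

So an integer key needs one btree lookup where any other key type needs two (index btree, then table btree), which is the speed difference the original question asks about.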
[sqlite] updating SQLite to implement The Third Manifesto
All, and particularly Dr. Hipp,

Lately, my own database project in Perl, named Rosetta, has evolved to officially be an implementation of Chris Date's and Hugh Darwen's proposal for relational databases called The Third Manifesto, which is talked about at http://www.thethirdmanifesto.com/ and in various books and papers of theirs. Rosetta has its own API which defines a D relational database language, as opposed to a SQL language, and Rosetta's back-ends to implement this are interchangeable. I am making a pure Perl reference implementation (called Rosetta::Engine::Example) which is coded simply for ease of understanding but is slow. Separately, I or third parties would be making alternate back-ends that are either self-implemented and better performing, or that constitute wrappers over existing database products, usually SQL based, since those are fairly mature and plentiful. SQLite is one of the first such back-ends to be used.

Now, I would like to propose, and if necessary I will contribute significant targeted funding (when I have the money) to pay Dr. Hipp and/or other developers, some significant feature changes (as a fork if necessary) to SQLite such that it directly implements and gains the efficiencies of The Third Manifesto. This includes both the addition and the removal of current features, and certain behaviours would change. Hopefully all for the better. As a result of these changes, not only would SQLite better serve as a back-end for Rosetta, but non-Rosetta users of SQLite would get the most critical of the same benefits from it directly. I anticipate that the changes would mainly affect the upper layers, which convert user commands into virtual machine code, but that the virtual machine and b-tree and OS layers would remain more or less unchanged (this depends, of course, on a few details). Possibly, we would add a new command language.
I am hoping that, to keep longer term maintenance easier, these changes can be implemented in the trunk and activated using either run time pragmas or compile time options or both. But if they would require a fork, then the forked product would have to be named something else that doesn't have 'SQL' in its name, since SQL does not satisfy The Third Manifesto. Maybe 'TTMLite' or something that sounds better. Here are some of the changes that I propose the pragma or compile time option or fork would have; they all refer to what the user sees, not to implementation details that should be hidden: 1. Add a distinct logical BOOLEAN data type. It is the data type of output from logical expressions like comparisons, and the input to 'and', 'or', etc. 2. Have strong and strict data typing for both variables and values. 2.1 Table columns are always declared to be of a specific type (eg: BOOLEAN, INTEGER, REAL, TEXT, BLOB) and nothing but values of the same type can be stored in them; attempting to do otherwise would fail with an exception. 2.2 The plain equality test is supported for all data types. 2.3 All operators/functions have strongly typed parameters and return values, and invoking them with arguments that aren't of the right type will fail with an exception. The equality test likewise can only compare operands of the same type. 2.4 There is no implicit type conversion; data types must be explicitly converted from one type to another. 2.5 INTEGER and REAL data types have separate sets of operators, which do the expected thing with their types. For example, each has a separate division operator whose input and output are all of that same type. No worrying about when to round or not. 2.6 SQLite may already be this way, but: All characters in a string are significant, including whitespace, so 'a' and 'a ' are always unequal. 3. There is no such thing as a NULL. 3.1 All logic is 2VL (true, false) not 3VL (true, false, unknown). 
3.2 Every variable of a particular type always contains a value that is valid for that type, so logic for dealing with it is simpler. Likewise with every literal value. 3.3 The code to implement operators is a lot simpler. 3.4 Missing data can either be represented with the data type's empty value, or a table column that may possibly be unknown can be split into a separate related table that only has records when the value is known. 3.5 All variables default to a reasonable valid value for their type if not explicitly set, such as the number zero or the empty string. 4. There is no significant hidden data. A row id can only be an explicitly declared table column. The implementation of a table can use hidden row ids, but the user wouldn't see them. 5. No duplicate rows in tables or queries are allowed. 5.1 In SQL terms, every table has an implicit unique key constraint over all of its columns. This is ignored if there are any actual explicit keys, whether primary or otherwise. In TTM terms, it
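Point 2's strict typing could be sketched as a column-level check. This is an illustrative Python sketch of the idea, not SQLite code; the `check_column_type` helper and its `TYPES` table are hypothetical names:

```python
def check_column_type(declared, value):
    """Strict typing sketch (point 2.1): only values of the declared
    type may be stored; anything else raises, per point 2.4's rule
    that there is no implicit type conversion."""
    TYPES = {"BOOLEAN": bool, "INTEGER": int, "REAL": float,
             "TEXT": str, "BLOB": bytes}
    t = TYPES[declared]
    # exact type match: in Python, bool subclasses int, so an exact
    # check also keeps booleans out of INTEGER columns
    if type(value) is not t:
        raise TypeError(
            f"column declared {declared}, got {type(value).__name__}")
    return value
```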
Re: [sqlite] sqlite performance with sizeable tables
- Original Message From: [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: One question though: are the file accesses sorted, so that seeks are minimised when performing a transaction (making the assumption that the file is not fragmented on disk)? I'm not sure what you are asking. Can you restate your question? Basically, the db file is accessed with seek + read/write operations. Given a set of such operations, it can be very beneficial to reorder them so that the file is accessed going from the beginning to the end (rather than in purely random order). This is a problem mostly for reads (writes are reordered when doing a combination of async writes + a flush at the end). I am guessing that, given the nature of SQL (a lot of iteration through keys/indexes), it might be possible to create batches of disk I/O requests: build a table of the next N reads (with offsets) and do those reads sorted by offset. In the graphics world (where I come from), it's common practice to at least do prefetching of data (sent in batches on the bus), and I believe the benefits would be similar here. Nicolas
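The reordering Nicolas describes can be sketched in a few lines. This is an illustrative Python sketch, not anything from SQLite's actual I/O layer; `batched_reads` and its parameters are hypothetical names, and `os.pread` is Unix-only:

```python
import os

def batched_reads(fd, offsets, block=4096):
    """Service a batch of read requests sorted by file offset, so the
    disk sweeps forward once instead of seeking randomly."""
    results = {}
    for off in sorted(offsets):  # ascending offsets = one forward sweep
        results[off] = os.pread(fd, block, off)
    return results
```

The caller would accumulate the next N reads, then issue them through this one sorted pass; the gain only appears when the file is not already in the OS cache.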
Re: [sqlite] sqlite performance with sizeable tables
- Original Message From: Christian Smith [EMAIL PROTECTED] But SQLite depends on the OS caching abilities for much of its performance. Removing it is like saying SQLite is rubbish on Intel processors after testing on an i486. Yes and no: while it's nice to be able to rely somewhat on the OS cache, the OS cannot guess what the usage pattern of the app is going to be. Most of the time, the OS will guess wrong, too: a typical case is random access of a file, where the OS does readahead that is not beneficial (and can actually slow things down). I always try to give the best hints possible to the underlying subsystem (just as I never rely 100% on the optimizer to know what to do with a bunch of code, because it won't). When dealing with big databases (which typically don't fit in memory), the OS will most likely cache the wrong kind of data, whereas the lib could cache important information such as key offsets in the file (the cache would contain some elements from the btree so that we can resume directly from there). The SQLite cache is very restricted. It only caches data over a single transaction, discarding it on COMMIT. This is because another process may update the file between transactions. Oh, I didn't know that the db cache was that short-lived!? The file format has a file generation number, which could be used to delay the discarding of the SQLite cache between transactions, but I believe it is as yet unimplemented. Oh, I see, the cache problem is probably related to multiple-process access. In my case I have a multithreaded process using nice locks to access the db (that way I never get a busy database either). I was very excited about this result, so I tried async access on my actual app (which holds 8M+ records), but I was disappointed to see that async actually slowed things down?! By much? This surprises me. Perhaps there is another bottleneck in the app that isn't covered by this test case?
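The "give the best hints possible to the underlying subsystem" approach can be illustrated with `posix_fadvise`, which tells the kernel that a file will be accessed randomly so it can skip readahead. This is a hedged sketch of the general technique, not anything SQLite does itself; `advise_random` is a hypothetical helper, and the call is Unix-only (hence the `hasattr` guard):

```python
import os

def advise_random(path):
    """Open a file read-only and hint the kernel that access will be
    random, so readahead is skipped. The hint is advisory only: it can
    change performance but never correctness."""
    fd = os.open(path, os.O_RDONLY)
    if hasattr(os, "posix_fadvise"):  # not available on all platforms
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
    return fd
```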
Well, this test focuses on accessing one database, and my app accesses several. It was not much slower, maybe a 10% penalty, but I was expecting a dramatic gain in performance :) On the other hand, I tried to make better use of the cache: if I run my 1M inserts in 10 transactions of 100,000, things get a bit slower than 100 transactions of 10,000 inserts. I tried one transaction of 1,000,000 inserts and the test app has hung at 100% cpu for over 30 minutes now; not sure what is going on here. Oh, also, a way to improve performance: reads can be reordered, so if sqlite were doing async reads from several places in the db file at the same time, the OS/drive would reorder things the same way they are reordered when writing, for a good boost in performance. This would add complication to SQLite. We depend on the OS cache to shield us from having to do async IO. It's difficult to share SIGIO, so an app that also did async IO would have problems with SQLite. In general, read performance is not a problem in SQLite. The biggest bottleneck is transaction COMMIT, due to the synchronous nature of the operation. On unix/linux systems SIGIO is one way of doing async stuff (I never use it because it doesn't play well with threads). Using poll or select will not interfere with other parts of the application. I do believe that batching requests can dramatically improve performance: a typical join will read 2 tables on disk, causing a lot of seeks. Read performance is important when dealing with large files (relative to RAM size), and I believe that quite a few people use sqlite with multi-GB files, so I don't think this is a stupid thing to look into. General observations/questions: - What did you expect from SQLite? Can you give indication of performance expectations? - What did you use previously? How does SQLite compare on performance?
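The transaction-batching experiment above (10 x 100,000 vs 100 x 10,000 inserts) can be sketched with Python's `sqlite3` module. `insert_in_batches` is a hypothetical helper, and the table `t` is assumed to already exist:

```python
import sqlite3

def insert_in_batches(con, values, batch=10_000):
    """Group inserts into transactions of `batch` rows each. The batch
    size is the knob being tuned above: each COMMIT pays the synchronous
    write cost once, so fewer, larger transactions amortise it better,
    up to the point where the page cache stops fitting the working set."""
    cur = con.cursor()
    for i, v in enumerate(values, 1):
        cur.execute("INSERT INTO t(v) VALUES (?)", (v,))
        if i % batch == 0:
            con.commit()
    con.commit()  # flush the final partial batch
```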
I was expecting much better performance than what I am getting right now: before, I was using the filesystem as a DB (basically, folders + filenames - data). Namely, reiserfs performs pretty well for doing this kind of thing (millions of records). But that was wasting a lot of disk space and hitting all sorts of limits in the various OSes. Tried BDB a few years ago as a replacement, but wedged databases and licensing issues kept us away from it. Then I read about sqlite giving performance of the order of 25000 inserts/second. I thought that I could probably get around 1 inserts/second on bigger databases. Right now, sqlite shows performance that is on par with a simple filesystem structure (except it's much faster to back up, because traversing a multimillion-file structure takes several hours). I was expecting a few things by moving to sqlite: * getting better ramp-up (cold cache) performance than a dumb filesystem structure. * having a [much] higher throughput (so that I can combine multiple servers into one), as the records are small and there is no fragmentation of the folders here. * having
Re: [sqlite] updating SQLite to implement The Third Manifesto
In doing all of the above, SQLite should actually be simpler to implement, and it will be easier to use, with more predictable results and fewer bugs. Only for some applications. It would be harder to use for my apps. Specifically, manifest typing as currently implemented in SQLite is a perfect match for apps written in Python (which also uses manifest typing). It would require a lot more code to go through and force the data type for each column. Roger
Re: [sqlite] Issue with LIKE on BLOBs
Boris Popov [EMAIL PROTECTED] wrote: There's a problem where LIKE in queries produces inconsistent results, whereas = works as expected:

  CREATE TABLE TYPETESTS (test blob NULL, id integer NOT NULL)
  INSERT INTO TYPETESTS (test,id) VALUES (X'616263646566',349)

  SELECT TEST,ID FROM TYPETESTS WHERE TEST = CAST (X'616263646566' AS blob)
  -- proper results 100% of the time
  SELECT TEST,ID FROM TYPETESTS WHERE TEST = X'616263646566'
  -- proper results 100% of the time
  SELECT TEST,ID FROM TYPETESTS WHERE TEST LIKE X'616263646566'
  -- proper results 30-50% of the time, no results otherwise
  SELECT TEST,ID FROM TYPETESTS WHERE TEST LIKE CAST (X'616263646566' AS blob)
  -- proper results 30-50% of the time, no results otherwise

Would this be considered an issue worthy of a bug report? The LIKE() function expects '\000'-terminated inputs. So it will probably work if you add 00 to the end of your LIKE constant. Using a blob as the right argument to LIKE seems like kind of an odd thing to do. I'm not much motivated to fix this. -- D. Richard Hipp [EMAIL PROTECTED]
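The reliable half of the report (blob equality) can be reproduced from Python's `sqlite3` module. This sketch only demonstrates the `=` case; the LIKE-on-blob behaviour described above is undefined territory and varies by build, so it is deliberately not asserted here:

```python
import sqlite3

# Recreate the schema and data from Boris's report and run the
# equality query, which Dr. Hipp confirms is dependable for BLOBs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TYPETESTS (test BLOB NULL, id INTEGER NOT NULL)")
con.execute("INSERT INTO TYPETESTS (test, id) VALUES (X'616263646566', 349)")
rows = con.execute(
    "SELECT id FROM TYPETESTS WHERE test = X'616263646566'").fetchall()
# X'616263646566' is the bytes b'abcdef'
```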
Re: [sqlite] updating SQLite to implement The Third Manifesto
At 6:52 PM -0800 3/10/06, Roger Binns wrote: Only for some applications. It would be harder to use for my apps. Specifically manifest typing as currently implemented in SQLite is a perfect match for apps written in Python (which also uses manifest typing). It would require a lot more code to go through and force the data type for each column. If that is so, then I would argue that any need to write more code isn't tied to manifestly typed programming languages themselves, but to specific programs; depending on how you code your applications, it wouldn't require any significant amount more code. In fact, particularly for queries (and reading data tends to be more common than writing), there should be less code. Or, looking at this another way, perhaps the Python bindings for SQLite should be taking care of this for you. Or, put another way, I would say this is a small price to pay for what is gained. Or, I doubt there actually is more work. (But I don't really want to get into an argument on this point, as there are many other points in my proposal which I see as being of greater importance.) But regardless, I have an additional idea which may help bridge the gap and work well for people. That is, while the database itself is strongly typed, you can have a specific type which is defined to manifestly be able to store values from any of a variety of simpler types. So for example, SQLite could have these types: - Boolean - Integer - Real - Text - Blob - Scalar The first 5 are simple types that store just numbers or text or whatever. The last, a new one, Scalar, is an actually-strong type which is defined as being able to store any of the first 5 types (just one at a time), and hence acts like a weak type. In a conceptual sense, a Scalar value is like a composite type with 6 member elements, each of the last 5 being strongly typed as one of the first 5 simple types, and the first element being an enum which says which of the other 5 holds the overall current value.
I believe something like this is what manifestly typed languages actually do behind the scenes, having a multi-element struct where one element says how to treat the other one(s). I know Perl does this, with its SV C-struct, and I'm sure other languages do similar. I know SQLite does something similar, if you look at its design spec. (Sure, that sounds more complicated, but then the actual work being done to support manifest typing *is* more complicated. Things look more like what they are.) So if SQLite does it this way, then you can declare columns to be the Scalar type when you want them to hold anything, and one of the other types if you don't. Moreover, the effectively manifest-typed Scalar is what you would get if you don't explicitly declare a type for a column. This happens already, but now what you get actually has a name. The point is that you still get well-defined behaviour that is specific to the declared data type(s) you choose to use, and you can count on its being consistent. -- Darren Duncan
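The tagged-union reading of Scalar above can be sketched as follows. This is an illustrative Python sketch of the concept only; the `Tag` and `Scalar` names are hypothetical, not part of any SQLite API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union

class Tag(Enum):
    """The enum element that says which simple type holds the value."""
    BOOLEAN = 1
    INTEGER = 2
    REAL = 3
    TEXT = 4
    BLOB = 5

@dataclass(frozen=True)
class Scalar:
    """Strongly typed container that manifestly holds exactly one of
    the five simple types at a time, per Darren's description."""
    tag: Tag
    value: Union[bool, int, float, str, bytes]

    @classmethod
    def of(cls, v):
        # bool must be tested before int: bool is a subclass of int
        if isinstance(v, bool):
            return cls(Tag.BOOLEAN, v)
        if isinstance(v, int):
            return cls(Tag.INTEGER, v)
        if isinstance(v, float):
            return cls(Tag.REAL, v)
        if isinstance(v, str):
            return cls(Tag.TEXT, v)
        if isinstance(v, bytes):
            return cls(Tag.BLOB, v)
        raise TypeError(f"no Scalar member type for {type(v).__name__}")
```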
Re: [sqlite] updating SQLite to implement The Third Manifesto
Just for the record: "it wouldn't require any significant amount more code." Yes it would. My code currently approximates to this:

  cursor.execute("insert into foo (x,y,z) values (?,?,?)", (x, y, z))

It would have to change into this:

  # column x is defined as string
  if isinstance(x, bool):  # bool must come before int: bool is a subclass of int
      storex = '1' if x else '0'
  elif isinstance(x, str):
      storex = x
  elif isinstance(x, int):
      storex = str(x)
  else:
      ...  # various other types and conditions for this context
  # repeat for y and z
  # add in values
  cursor.execute("insert into foo (x,y,z) values (?,?,?)", (storex, storey, storez))

In fact, particularly for queries (and reading data tends to be more common than writing), there should be less code. Currently:

  x, y, z = cursor.execute("select x,y,z from foo").fetchone()

Change to:

  resultx, resulty, resultz = cursor.execute("select x,y,z from foo").fetchone()
  # now do type conversion back to the correct types
  if ...:    # somehow know it may have been an int
      x = int(resultx)
  elif ...:  # somehow know it may have been a bool
      x = bool(int(resultx))
  # ... you get the picture ...

Not having manifest types in the database throws away information when you store values and requires reconstituting it when reading. My main app happens to store phone numbers. You won't believe how irritating it is when I find things automatically assume they are integers. Or, looking at this another way, perhaps the Python bindings for SQLite should be taking care of this for you. They can't, unless they do something like silently add an extra column that stores the types of the values in the other columns and attempt to transparently modify the SQL as it flies by to get or update that column. (BTW, I also happen to be an author of wrappers for Python.) (Your proposal sort of does this by introducing a manifest type.) The solution you discussed will make code that has to deal with random SQLite databases produced by other applications, or earlier versions of the same app, significantly more complicated. However I don't think there are too many of them to be concerned about.
I would suggest finding an open source application that uses SQLite and see if you would indeed make it simpler. One good example I would suggest is Trac which was originally written to use SQLite. Roger
RE: [sqlite] updating SQLite to implement The Third Manifesto
You're right Darren, but the problem is that we're not in a DB class. We cannot tell people who have a solution to their problems that "your solution is wrong; you need to reimplement your stuff to make it right." Most SQLite users are practical people, and all they want is their problem solved. They don't really care whether the SQL language correctly implements relational algebra or not. -Original Message- From: Darren Duncan [mailto:[EMAIL PROTECTED]] Sent: Friday, March 10, 2006 11:21 PM To: sqlite-users@sqlite.org Cc: [EMAIL PROTECTED] Subject: Re: [sqlite] updating SQLite to implement The Third Manifesto
Re: [sqlite] updating SQLite to implement The Third Manifesto
On Fri, Mar 10, 2006 at 04:37:36PM -0800, Darren Duncan wrote: 3. There is no such thing as a NULL. 3.1 All logic is 2VL (true, false) not 3VL (true, false, unknown). There is no such thing as null, really? So, when you do an outer join between two tables, which in SQL would produce null columns in the result set, what do YOU propose producing instead of those nulls? Perhaps I missed it, but in my brief reading of some of Date's work, I never saw him answer that question. 3.4 Missing data can be either represented with the data type's empty value, or a table column that may possibly be unknown can be split into a separate related table, that only has records when the value is known. 3.5 All variables default to a reasonable valid value for their type if not explicitly set, such as the number zero or the empty string. -- Andrew Piskorski [EMAIL PROTECTED] http://www.piskorski.com/
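For what it's worth, point 3.4's split-table representation of missing data can be shown in today's SQLite. This is an illustrative sketch (the table and column names are made up), not an answer to the outer-join question above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person(id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- the possibly-unknown column lives in its own table: a row exists
-- only when the value is known, so no NULL is ever stored
CREATE TABLE person_phone(
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    phone     TEXT NOT NULL);
""")
con.execute("INSERT INTO person VALUES (1, 'alice')")
con.execute("INSERT INTO person VALUES (2, 'bob')")
con.execute("INSERT INTO person_phone VALUES (1, '555-0100')")
# under 2VL you ask "who has a known phone?" with an inner join,
# instead of filtering NULLs with IS NOT NULL
rows = con.execute("""SELECT name, phone FROM person
                      JOIN person_phone ON person.id = person_id""").fetchall()
```

The outer-join question remains: joining these two tables with LEFT JOIN still needs some placeholder for bob's missing phone, which is exactly what Andrew is asking about.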