RE: [sqlite] An explanation?
Thanks Dennis, Is it that when a Integer column of a table is defined as primary key, the it will be part of every index table (rather than rowid) defined on that table? How does it work when we define a non integer as primary key. Assuming in the example given below if we make the Title column as primary key and create index on Id, how does it affect the performance of the two queries? Regards, Phanisekahr From: Dennis Cote [mailto:[EMAIL PROTECTED] Sent: Thu 4/26/2007 7:52 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] An explanation? B V, Phanisekhar wrote: > Thanks for that Info. > > I have another question: > > Assume I have a table given below > "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName > String)" > "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName" > > Now since Id is an integer and a primary key, this will work as rowid > internally. > > I have two queries that needs to be optimized: > > 1 Select TitleName from Title where Id = ? > 2 Select Id from Title where TitleName = ? > > In order to make the previous two queries optimized, how should I > declare my Table and Index? > > Should it be: > > 1 > "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName > String)" > "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)" > > 2 > The one which I assumed > > Which one of these will give the better performance for the two queries? > Or is there any other alternative that will give even better > performance? > > Regards, > Phanisekhar > Phanisekhar, Your original index definition is all that is needed. The index already contains the rowid for the table record, which happens to be the column id because of the integer primary key optimization. There is nothing to be gained by adding it to the index again. Your first query will be satisfied by a binary search in the title table looking for the id. It won't use the index. Your second query will be satisfied by a binary search in the TitleIdx index looking for a matching title. It won't use the Title table. HTH Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
B V, Phanisekhar wrote: Thanks for that Info. I have another question: Assume I have a table given below "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName String)" "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName" Now since Id is an integer and a primary key, this will work as rowid internally. I have two queries that needs to be optimized: 1 Select TitleName from Title where Id = ? 2 Select Id from Title where TitleName = ? In order to make the previous two queries optimized, how should I declare my Table and Index? Should it be: 1 "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName String)" "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)" 2 The one which I assumed Which one of these will give the better performance for the two queries? Or is there any other alternative that will give even better performance? Regards, Phanisekhar Phanisekhar, Your original index definition is all that is needed. The index already contains the rowid for the table record, which happens to be the column id because of the integer primary key optimization. There is nothing to be gained by adding it to the index again. Your first query will be satisfied by a binary search in the title table looking for the id. It won't use the index. Your second query will be satisfied by a binary search in the TitleIdx index looking for a matching title. It won't use the Title table. HTH Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] An explanation?
Thanks for that Info. I have another question: Assume I have a table given below "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName String)" "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName" Now since Id is an integer and a primary key, this will work as rowid internally. I have two queries that needs to be optimized: 1 Select TitleName from Title where Id = ? 2 Select Id from Title where TitleName = ? In order to make the previous two queries optimized, how should I declare my Table and Index? Should it be: 1 "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName String)" "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)" 2 The one which I assumed Which one of these will give the better performance for the two queries? Or is there any other alternative that will give even better performance? Regards, Phanisekhar -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Thursday, April 26, 2007 3:42 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] An explanation? "B V, Phanisekhar" <[EMAIL PROTECTED]> wrote: > How does the index table looks? > > Assume the main table to be: > CREATE TABLE table1 (a INTEGER, b INTEGER) > Assume there is an index on column a: > CREATE INDEX index1 ON table1 (a); > > Now let's suppose the entries in table1 be: > 10, 91 >9, 56 > 89, 78 > 34, 12 > 99, 26 > 19, 77 > 44, 62 > 59, 55 Each table entry also has a hidden ROWID. Let's assume that the rowids are sequential. Then your data is really this: 1, 10, 91 2, 9, 56 3, 89, 78 4, 34, 12 5, 99, 26 6, 19, 77 7, 44, 62 8, 59, 55 Here the rowids are sequential. That do not have to be. But they do have to be unique and in increasing order. Because the rowids are ordered, we can do a binary search to quickly find an entry with a particular rowid. > > Corresponding to this table1 how will index table be? > The index on table1(a) consists of all table1.a values followed by their corresponding rowid, in increasing order: 9, 2 10, 1 19, 6 34, 4 44, 7 59, 8 89, 3 99, 5 > If each data value was unique, then one index lookup would find the > matching record. Can you explain how this is? Doesn't it will do binary > search on index table? > When you do: SELECT b FROM table1 WHERE a=34; SQLite first does a binary search on the index to find the entry where a==34. From this entry it discovers the rowid. rowid=4. Then it does a binary search on the table using rowid=4 to find the corresponding entry in the table. From that entry it sees that b=12. So in this case, SQLite has to do two separate binary searches, one on the index and another on the table. If, however, you declare your index like this: CREATE INDEX index1 ON table1(a, b); Then the index will look like this: 9, 56, 2 10, 91, 1 19, 77, 6 34, 12, 4 44, 62, 7 59, 55, 8 89, 78, 3 99, 26, 5 With this two-column index, if you repeat the same query SELECT b FROM table1 WHERE a=34 Then SQLite begins as it did before by doing a binary search on the index to find the row of the index where a==34. But having found that index row, it can read out the value of b=12 directly, without having to do a second binary search on the table. The original table is never consulted and the query runs twice as fast. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
"B V, Phanisekhar" <[EMAIL PROTECTED]> wrote: > How does the index table looks? > > Assume the main table to be: > CREATE TABLE table1 (a INTEGER, b INTEGER) > Assume there is an index on column a: > CREATE INDEX index1 ON table1 (a); > > Now let's suppose the entries in table1 be: > 10, 91 >9, 56 > 89, 78 > 34, 12 > 99, 26 > 19, 77 > 44, 62 > 59, 55 Each table entry also has a hidden ROWID. Let's assume that the rowids are sequential. Then your data is really this: 1, 10, 91 2, 9, 56 3, 89, 78 4, 34, 12 5, 99, 26 6, 19, 77 7, 44, 62 8, 59, 55 Here the rowids are sequential. That do not have to be. But they do have to be unique and in increasing order. Because the rowids are ordered, we can do a binary search to quickly find an entry with a particular rowid. > > Corresponding to this table1 how will index table be? > The index on table1(a) consists of all table1.a values followed by their corresponding rowid, in increasing order: 9, 2 10, 1 19, 6 34, 4 44, 7 59, 8 89, 3 99, 5 > If each data value was unique, then one index lookup would find the > matching record. Can you explain how this is? Doesn't it will do binary > search on index table? > When you do: SELECT b FROM table1 WHERE a=34; SQLite first does a binary search on the index to find the entry where a==34. From this entry it discovers the rowid. rowid=4. Then it does a binary search on the table using rowid=4 to find the corresponding entry in the table. From that entry it sees that b=12. So in this case, SQLite has to do two separate binary searches, one on the index and another on the table. If, however, you declare your index like this: CREATE INDEX index1 ON table1(a, b); Then the index will look like this: 9, 56, 2 10, 91, 1 19, 77, 6 34, 12, 4 44, 62, 7 59, 55, 8 89, 78, 3 99, 26, 5 With this two-column index, if you repeat the same query SELECT b FROM table1 WHERE a=34 Then SQLite begins as it did before by doing a binary search on the index to find the row of the index where a==34. But having found that index row, it can read out the value of b=12 directly, without having to do a second binary search on the table. The original table is never consulted and the query runs twice as fast. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] An explanation?
Dennis, How does the index table looks? Assume the main table to be: CREATE TABLE table1 (a INTEGER, b INTEGER) Assume there is an index on column a: CREATE INDEX index1 ON table1 (a); Now let's suppose the entries in table1 be: 10, 91 9, 56 89, 78 34, 12 99, 26 19, 77 44, 62 59, 55 Corresponding to this table1 how will index table be? If each data value was unique, then one index lookup would find the matching record. Can you explain how this is? Doesn't it will do binary search on index table? Regards, Phani -Original Message- From: Dennis Cote [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 24, 2007 4:06 AM To: sqlite-users@sqlite.org Subject: Re: [sqlite] An explanation? Marco Bambini wrote: > > Database is uniformly distributed, I created it ad hoc just for my > test (sqlite 3.3.12): Marco, Another way to think of this is that if your database contained random numbers in the range 1-100 for both a and b, then an index on either of those values would allow sqlite to ignore all but the requested value, or 99% of the entries. It would only have to examine 1% of the records and would run in perhaps 2% of the time of a full table scan. If your data had even more distinct values, things would be even faster. Ultimately, if each data value was unique, then one index lookup would find the matching record, and the lookup time would only be about 2/300,000 or 0.0007% of the time for a full table scan. Indexes are not a magical cure all, they only speed up lookups if you enough different values to let them to reduce the search space to a small enough portion of the entire database to pay for their overhead. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Thanks a lot for the explanation Dennis, I really appreciate. --- Marco On Apr 24, 2007, at 12:35 AM, Dennis Cote wrote: Marco Bambini wrote: Database is uniformly distributed, I created it ad hoc just for my test (sqlite 3.3.12): Marco, Another way to think of this is that if your database contained random numbers in the range 1-100 for both a and b, then an index on either of those values would allow sqlite to ignore all but the requested value, or 99% of the entries. It would only have to examine 1% of the records and would run in perhaps 2% of the time of a full table scan. If your data had even more distinct values, things would be even faster. Ultimately, if each data value was unique, then one index lookup would find the matching record, and the lookup time would only be about 2/300,000 or 0.0007% of the time for a full table scan. Indexes are not a magical cure all, they only speed up lookups if you enough different values to let them to reduce the search space to a small enough portion of the entire database to pay for their overhead. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Marco Bambini wrote: Database is uniformly distributed, I created it ad hoc just for my test (sqlite 3.3.12): Marco, Another way to think of this is that if your database contained random numbers in the range 1-100 for both a and b, then an index on either of those values would allow sqlite to ignore all but the requested value, or 99% of the entries. It would only have to examine 1% of the records and would run in perhaps 2% of the time of a full table scan. If your data had even more distinct values, things would be even faster. Ultimately, if each data value was unique, then one index lookup would find the matching record, and the lookup time would only be about 2/300,000 or 0.0007% of the time for a full table scan. Indexes are not a magical cure all, they only speed up lookups if you enough different values to let them to reduce the search space to a small enough portion of the entire database to pay for their overhead. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Marco Bambini wrote: I know that I can use the ANALYZE command or that I can index both columns. I was making some tests and I found that with one index the query is slower that without any index, so I just trying to understand the reason... I do not want to run it faster, I already know that it is possible. Database is uniformly distributed, I created it ad hoc just for my test (sqlite 3.3.12): CREATE TABLE table1 (a INTEGER, b INTEGER) 150,000 rows are inserted with: INSERT INTO table1 (a,b) VALUES (5,10) 200 rows are inserted with: INSERT INTO table1 (a,b) VALUES (5,11) 150,000 rows are inserted with: INSERT INTO table1 (a,b) VALUES (4,11) And the query was: SELECT * FROM table1 WHERE a=5 AND b=11; New benchmarks: WITHOUT INDEX: 0.281 secs WITH TWO INDEXes: 0.463 secs WITH TWO INDEXes and the ANALYZE command: 0.480 secs INDEXes are: CREATE INDEX index1 ON table1(a); CREATE INDEX index2 ON table1(b); Marco, You have a kind of pathological case here. Your data is not uniformly distributed. You have only three kinds of records. 150,200 records match the index on a for a = 5, and 150,200 records match the index on b for b=11. So it doesn't matter which single index sqlite chooses, it will have to scan through 150,200 records comparing the un-indexed field's value. To do this it must extract the rowid from the index and then locate that row in the main table, and then compare the value of the un-indexed field. Scanning an index for 150,200 records and then looking up 150,200 records in the main table is simply more work than scanning the entire table of 300,000 records once. This case does not benefit from indexing, and in fact it is slowed down on both lookups and inserts. Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
I know that I can use the ANALYZE command or that I can index both columns. I was making some tests and I found that with one index the query is slower that without any index, so I just trying to understand the reason... I do not want to run it faster, I already know that it is possible. Database is uniformly distributed, I created it ad hoc just for my test (sqlite 3.3.12): CREATE TABLE table1 (a INTEGER, b INTEGER) 150,000 rows are inserted with: INSERT INTO table1 (a,b) VALUES (5,10) 200 rows are inserted with: INSERT INTO table1 (a,b) VALUES (5,11) 150,000 rows are inserted with: INSERT INTO table1 (a,b) VALUES (4,11) And the query was: SELECT * FROM table1 WHERE a=5 AND b=11; New benchmarks: WITHOUT INDEX: 0.281 secs WITH TWO INDEXes: 0.463 secs WITH TWO INDEXes and the ANALYZE command: 0.480 secs INDEXes are: CREATE INDEX index1 ON table1(a); CREATE INDEX index2 ON table1(b); --- Marco Bambini On Apr 23, 2007, at 9:36 PM, [EMAIL PROTECTED] wrote: Marco Bambini <[EMAIL PROTECTED]> wrote: Yes, I know that it is faster ... I just wonder why with one index the query is slower that without any index... Probably because most of the entries in your table match the term being indexed. In your case, this likely means that a large fraction of the table entries have a=5. When searching from an index, SQLite first finds the index entry, then has to do a binary search for the table entry. The table entry lookup is O(logN) where N is the number of entries in the table. If the number of rows in the result set is proportional to N, then the total runtime is O(NlogN). On the other hand, the total runtime of a full table scan (which is what happens if you omit the index) is O(N). N -- --- To unsubscribe, send email to [EMAIL PROTECTED] -- --- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Marco Bambini <[EMAIL PROTECTED]> wrote: > Yes, I know that it is faster ... I just wonder why with one index > the query is slower that without any index... Probably because most of the entries in your table match the term being indexed. In your case, this likely means that a large fraction of the table entries have a=5. When searching from an index, SQLite first finds the index entry, then has to do a binary search for the table entry. The table entry lookup is O(logN) where N is the number of entries in the table. If the number of rows in the result set is proportional to N, then the total runtime is O(NlogN). On the other hand, the total runtime of a full table scan (which is what happens if you omit the index) is O(N). N - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Marco Bambini wrote: As a performance test I created a db with 300,000 records, table is: CREATE TABLE table1 (a INTEGER, b INTEGER) a query like: SELECT * FROM table1 WHERE a=5 AND b=11; takes 0.281 secs. if I add two indexes: CREATE INDEX index1 ON table1(a); CREATE INDEX index2 ON table1(b); the same query is about two times slower, it takes 0.463 secs. (I know that only one index is used by the query). I repeated the test several times and results are confirmed... Anyone have an explanation? Marco, I suspect that sqlite is arbitrarily choosing the wrong index for your data. Can you retry the test after running an ANALYZE command? Are the a and b values in your sample code uniformly distributed? Dennis Cote - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
Yes, I know that it is faster ... I just wonder why with one index the query is slower that without any index... --- Marco Bambini On Apr 23, 2007, at 6:31 PM, P Kishor wrote: On 4/23/07, Marco Bambini <[EMAIL PROTECTED]> wrote: As a performance test I created a db with 300,000 records, table is: CREATE TABLE table1 (a INTEGER, b INTEGER) a query like: SELECT * FROM table1 WHERE a=5 AND b=11; takes 0.281 secs. if I add two indexes: CREATE INDEX index1 ON table1(a); CREATE INDEX index2 ON table1(b); the same query is about two times slower, it takes 0.463 secs. (I know that only one index is used by the query). Try CREATE INDEX index_ab ON table1 (a, b); and test. I repeated the test several times and results are confirmed... Anyone have an explanation? --- Marco Bambini - To unsubscribe, send email to [EMAIL PROTECTED] - -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ - collaborate, communicate, compete = -- --- To unsubscribe, send email to [EMAIL PROTECTED] -- --- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] An explanation?
On 4/23/07, Marco Bambini <[EMAIL PROTECTED]> wrote: As a performance test I created a db with 300,000 records, table is: CREATE TABLE table1 (a INTEGER, b INTEGER) a query like: SELECT * FROM table1 WHERE a=5 AND b=11; takes 0.281 secs. if I add two indexes: CREATE INDEX index1 ON table1(a); CREATE INDEX index2 ON table1(b); the same query is about two times slower, it takes 0.463 secs. (I know that only one index is used by the query). Try CREATE INDEX index_ab ON table1 (a, b); and test. I repeated the test several times and results are confirmed... Anyone have an explanation? --- Marco Bambini - To unsubscribe, send email to [EMAIL PROTECTED] - -- Puneet Kishor http://punkish.eidesis.org/ Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/ Open Source Geospatial Foundation http://www.osgeo.org/education/ - collaborate, communicate, compete = - To unsubscribe, send email to [EMAIL PROTECTED] -