RE: [sqlite] An explanation?

2007-04-26 Thread B V, Phanisekhar
 
Thanks Dennis,
 
Is it that when a Integer column of a table is defined as primary key, the it 
will be part of every index table (rather than rowid) defined on that table? 
How does it work when we define a non integer as primary key. Assuming in the 
example given below if we make the Title column as primary key and create index 
on Id, how does it affect the performance of the two queries?
 
Regards,
Phanisekahr
 

 


From: Dennis Cote [mailto:[EMAIL PROTECTED]
Sent: Thu 4/26/2007 7:52 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] An explanation?



B V, Phanisekhar wrote:
> Thanks for that Info.
>
> I have another question:
>
> Assume I have a table given below
> "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName
> String)"
> "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName"
>
> Now since Id is an integer and a primary key, this will work as rowid
> internally.
>
> I have two queries that needs to be optimized:
>
> 1 Select TitleName from Title where Id = ?
> 2 Select Id from Title where TitleName = ?
>
> In order to make the previous two queries optimized, how should I
> declare my Table and Index?
>
> Should it be:
>
> 1
> "CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName
> String)"
> "CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)"
>
> 2
>   The one which I assumed
>
> Which one of these will give the better performance for the two queries?
> Or is there any other alternative that will give even better
> performance?
>
> Regards,
> Phanisekhar
>  

Phanisekhar,

Your original index definition is all that is needed. The index already 
contains the rowid for the table record, which happens to be the column id 
because of the integer primary key optimization. There is nothing to be gained 
by adding it to the index again.

Your first query will be satisfied by a binary search in the title table 
looking for the id. It won't use the index.

Your second query will be satisfied by a binary search in the TitleIdx index 
looking for a matching title. It won't use the Title table.

HTH
Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] An explanation?

2007-04-26 Thread Dennis Cote

B V, Phanisekhar wrote:

Thanks for that Info.

I have another question:

Assume I have a table given below
"CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName
String)"
"CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName"

Now since Id is an integer and a primary key, this will work as rowid
internally. 


I have two queries that needs to be optimized:

1   Select TitleName from Title where Id = ?
2   Select Id from Title where TitleName = ?

In order to make the previous two queries optimized, how should I
declare my Table and Index?

Should it be: 

1 
"CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName

String)"
"CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)"

2
The one which I assumed

Which one of these will give the better performance for the two queries?
Or is there any other alternative that will give even better
performance?

Regards,
Phanisekhar
  


Phanisekhar,

Your original index definition is all that is needed. The index already 
contains the rowid for the table record, which happens to be the column id 
because of the integer primary key optimization. There is nothing to be gained 
by adding it to the index again.

Your first query will be satisfied by a binary search in the title table 
looking for the id. It won't use the index.

Your second query will be satisfied by a binary search in the TitleIdx index 
looking for a matching title. It won't use the Title table.

HTH
Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] An explanation?

2007-04-26 Thread B V, Phanisekhar
Thanks for that Info.

I have another question:

Assume I have a table given below
"CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName
String)"
"CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON TitleName"

Now since Id is an integer and a primary key, this will work as rowid
internally. 

I have two queries that needs to be optimized:

1   Select TitleName from Title where Id = ?
2   Select Id from Title where TitleName = ?

In order to make the previous two queries optimized, how should I
declare my Table and Index?

Should it be: 

1 
"CREATE TABLE IF NOT EXISTS Title(Id INTEGER PRIMARY KEY, TitleName
String)"
"CREATE UNIQUE INDEX IF NOT EXISTS TitleIdx ON (TitleName, Id)"

2
The one which I assumed

Which one of these will give the better performance for the two queries?
Or is there any other alternative that will give even better
performance?

Regards,
Phanisekhar




-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 26, 2007 3:42 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] An explanation?

"B V, Phanisekhar" <[EMAIL PROTECTED]> wrote:
> How does the index table looks?
> 
> Assume the main table to be:
>   CREATE TABLE table1 (a INTEGER, b INTEGER)
> Assume there is an index on column a:
>   CREATE INDEX index1 ON table1 (a);
> 
> Now let's suppose the entries in table1 be:
>   10, 91
>9, 56
>   89, 78
>   34, 12
>   99, 26
>   19, 77
>   44, 62
>   59, 55

Each table entry also has a hidden ROWID.  Let's assume that the
rowids are sequential.  Then your data is really this:

  1,  10,  91
  2,   9,  56
  3,  89,  78
  4,  34,  12
  5,  99,  26
  6,  19,  77
  7,  44,  62
  8,  59,  55

Here the rowids are sequential.  That do not have to be.  But they
do have to be unique and in increasing order.  Because the rowids
are ordered, we can do a binary search to quickly find an entry
with a particular rowid.

>   
> Corresponding to this table1 how will index table be?
> 

The index on table1(a) consists of all table1.a values followed
by their corresponding rowid, in increasing order:

   9,  2
  10,  1
  19,  6
  34,  4
  44,  7
  59,  8
  89,  3
  99,  5


> If each data value was unique, then one index lookup would find the
> matching record. Can you explain how this is? Doesn't it will do
binary
> search on index table?
> 

When you do:

SELECT b FROM table1 WHERE a=34;

SQLite first does a binary search on the index to find the entry
where a==34.  From this entry it discovers the rowid.  rowid=4.
Then it does a binary search on the table using rowid=4 to find
the corresponding entry in the table.  From that entry it sees
that b=12.

So in this case, SQLite has to do two separate binary searches,
one on the index and another on the table.

If, however, you declare your index like this:

   CREATE INDEX index1 ON table1(a, b);

Then the index will look like this:

   9,  56,  2
  10,  91,  1
  19,  77,  6
  34,  12,  4
  44,  62,  7
  59,  55,  8
  89,  78,  3
  99,  26,  5

With this two-column index, if you repeat the same query

SELECT b FROM table1 WHERE a=34

Then SQLite begins as it did before by doing a binary search
on the index to find the row of the index where a==34.  But
having found that index row, it can read out the value of b=12
directly, without having to do a second binary search on the
table.  The original table is never consulted and the query
runs twice as fast.

--
D. Richard Hipp  <[EMAIL PROTECTED]>



-
To unsubscribe, send email to [EMAIL PROTECTED]

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-26 Thread drh
"B V, Phanisekhar" <[EMAIL PROTECTED]> wrote:
> How does the index table looks?
> 
> Assume the main table to be:
>   CREATE TABLE table1 (a INTEGER, b INTEGER)
> Assume there is an index on column a:
>   CREATE INDEX index1 ON table1 (a);
> 
> Now let's suppose the entries in table1 be:
>   10, 91
>9, 56
>   89, 78
>   34, 12
>   99, 26
>   19, 77
>   44, 62
>   59, 55

Each table entry also has a hidden ROWID.  Let's assume that the
rowids are sequential.  Then your data is really this:

  1,  10,  91
  2,   9,  56
  3,  89,  78
  4,  34,  12
  5,  99,  26
  6,  19,  77
  7,  44,  62
  8,  59,  55

Here the rowids are sequential.  That do not have to be.  But they
do have to be unique and in increasing order.  Because the rowids
are ordered, we can do a binary search to quickly find an entry
with a particular rowid.

>   
> Corresponding to this table1 how will index table be?
> 

The index on table1(a) consists of all table1.a values followed
by their corresponding rowid, in increasing order:

   9,  2
  10,  1
  19,  6
  34,  4
  44,  7
  59,  8
  89,  3
  99,  5


> If each data value was unique, then one index lookup would find the
> matching record. Can you explain how this is? Doesn't it will do binary
> search on index table?
> 

When you do:

SELECT b FROM table1 WHERE a=34;

SQLite first does a binary search on the index to find the entry
where a==34.  From this entry it discovers the rowid.  rowid=4.
Then it does a binary search on the table using rowid=4 to find
the corresponding entry in the table.  From that entry it sees
that b=12.

So in this case, SQLite has to do two separate binary searches,
one on the index and another on the table.

If, however, you declare your index like this:

   CREATE INDEX index1 ON table1(a, b);

Then the index will look like this:

   9,  56,  2
  10,  91,  1
  19,  77,  6
  34,  12,  4
  44,  62,  7
  59,  55,  8
  89,  78,  3
  99,  26,  5

With this two-column index, if you repeat the same query

SELECT b FROM table1 WHERE a=34

Then SQLite begins as it did before by doing a binary search
on the index to find the row of the index where a==34.  But
having found that index row, it can read out the value of b=12
directly, without having to do a second binary search on the
table.  The original table is never consulted and the query
runs twice as fast.

--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



RE: [sqlite] An explanation?

2007-04-26 Thread B V, Phanisekhar
Dennis,
How does the index table looks?

Assume the main table to be:
CREATE TABLE table1 (a INTEGER, b INTEGER)
Assume there is an index on column a:
CREATE INDEX index1 ON table1 (a);

Now let's suppose the entries in table1 be:
10, 91
 9, 56
89, 78
34, 12
99, 26
19, 77
44, 62
59, 55

Corresponding to this table1 how will index table be?

If each data value was unique, then one index lookup would find the
matching record. Can you explain how this is? Doesn't it will do binary
search on index table?

Regards,
Phani



-Original Message-
From: Dennis Cote [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 24, 2007 4:06 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] An explanation?

Marco Bambini wrote:
>
> Database is uniformly distributed, I created it ad hoc just for my 
> test (sqlite 3.3.12):
Marco,

Another way to think of this is that if your database contained random 
numbers in the range 1-100 for both a and b, then an index on either of 
those values would allow sqlite to ignore all but the requested value, 
or 99% of the entries. It would only have to examine 1% of the records 
and would run in perhaps 2% of the time of a full table scan. If your 
data had even more distinct values, things would be even faster. 
Ultimately, if each data value was unique, then one index lookup would 
find the matching record, and the lookup time would only be about 
2/300,000 or 0.0007% of the time for a full table scan.

Indexes are not a magical cure all, they only speed up lookups if you 
enough different values to let them to reduce the search space to a 
small enough portion of the entire database to pay for their overhead.

Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]

-


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-24 Thread Marco Bambini

Thanks a lot for the explanation Dennis, I really appreciate.

---
Marco


On Apr 24, 2007, at 12:35 AM, Dennis Cote wrote:


Marco Bambini wrote:


Database is uniformly distributed, I created it ad hoc just for my  
test (sqlite 3.3.12):

Marco,

Another way to think of this is that if your database contained  
random numbers in the range 1-100 for both a and b, then an index  
on either of those values would allow sqlite to ignore all but the  
requested value, or 99% of the entries. It would only have to  
examine 1% of the records and would run in perhaps 2% of the time  
of a full table scan. If your data had even more distinct values,  
things would be even faster. Ultimately, if each data value was  
unique, then one index lookup would find the matching record, and  
the lookup time would only be about 2/300,000 or 0.0007% of the  
time for a full table scan.


Indexes are not a magical cure all, they only speed up lookups if  
you enough different values to let them to reduce the search space  
to a small enough portion of the entire database to pay for their  
overhead.


Dennis Cote



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread Dennis Cote

Marco Bambini wrote:


Database is uniformly distributed, I created it ad hoc just for my 
test (sqlite 3.3.12):

Marco,

Another way to think of this is that if your database contained random 
numbers in the range 1-100 for both a and b, then an index on either of 
those values would allow sqlite to ignore all but the requested value, 
or 99% of the entries. It would only have to examine 1% of the records 
and would run in perhaps 2% of the time of a full table scan. If your 
data had even more distinct values, things would be even faster. 
Ultimately, if each data value was unique, then one index lookup would 
find the matching record, and the lookup time would only be about 
2/300,000 or 0.0007% of the time for a full table scan.


Indexes are not a magical cure all, they only speed up lookups if you 
enough different values to let them to reduce the search space to a 
small enough portion of the entire database to pay for their overhead.


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread Dennis Cote

Marco Bambini wrote:
I know that I can use the ANALYZE command or that I can index both 
columns.
I was making some tests and I found that with one index  the query is 
slower that without any index, so I just trying to understand the 
reason... I do not want to run it faster, I already know that it is 
possible.


Database is uniformly distributed, I created it ad hoc just for my 
test (sqlite 3.3.12):

CREATE TABLE table1 (a INTEGER, b INTEGER)

150,000 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (5,10)

200 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (5,11)

150,000 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (4,11)

And the query was:
SELECT * FROM table1 WHERE a=5 AND b=11;

New benchmarks:
WITHOUT INDEX: 0.281 secs
WITH TWO INDEXes:  0.463 secs
WITH TWO INDEXes and the ANALYZE command: 0.480 secs

INDEXes are:
CREATE INDEX index1 ON table1(a);
CREATE INDEX index2 ON table1(b);


Marco,

You have a kind of pathological case here. Your data is not uniformly 
distributed. You have only three kinds of records. 150,200 records match 
the index on a for a = 5, and 150,200 records match the index on b for 
b=11. So it doesn't matter which single index sqlite chooses, it will 
have to scan through 150,200 records comparing the un-indexed field's 
value. To do this it must extract the rowid from the index and then 
locate that row in the main table, and then compare the value of the 
un-indexed field.


Scanning an index for 150,200 records and then looking up 150,200 
records in the main table is simply more work than scanning the entire 
table of 300,000 records once.


This case does not benefit from indexing, and in fact it is slowed down 
on both lookups and inserts.


Dennis Cote

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread Marco Bambini
I know that I can use the ANALYZE command or that I can index both  
columns.
I was making some tests and I found that with one index  the query is  
slower that without any index, so I just trying to understand the  
reason... I do not want to run it faster, I already know that it is  
possible.


Database is uniformly distributed, I created it ad hoc just for my  
test (sqlite 3.3.12):

CREATE TABLE table1 (a INTEGER, b INTEGER)

150,000 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (5,10)

200 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (5,11)

150,000 rows are inserted with:
INSERT INTO table1 (a,b) VALUES (4,11)

And the query was:
SELECT * FROM table1 WHERE a=5 AND b=11;

New benchmarks:
WITHOUT INDEX: 0.281 secs
WITH TWO INDEXes:  0.463 secs
WITH TWO INDEXes and the ANALYZE command: 0.480 secs

INDEXes are:
CREATE INDEX index1 ON table1(a);
CREATE INDEX index2 ON table1(b);

---
Marco Bambini


On Apr 23, 2007, at 9:36 PM, [EMAIL PROTECTED] wrote:


Marco Bambini <[EMAIL PROTECTED]> wrote:

Yes, I know that it is faster ... I just wonder why with one index
the query is slower that without any index...


Probably because most of the entries in your table
match the term being indexed.  In your case, this
likely means that a large fraction of the table
entries have a=5.

When searching from an index, SQLite first finds
the index entry, then has to do a binary search
for the table entry.  The table entry lookup
is O(logN) where N is the number of entries in
the table.  If the number of rows in the result
set is proportional to N, then the total runtime
is O(NlogN).  On the other hand, the total runtime
of a full table scan (which is what happens if you
omit the index) is O(N).  N


-- 
---

To unsubscribe, send email to [EMAIL PROTECTED]
-- 
---





-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread drh
Marco Bambini <[EMAIL PROTECTED]> wrote:
> Yes, I know that it is faster ... I just wonder why with one index  
> the query is slower that without any index...

Probably because most of the entries in your table
match the term being indexed.  In your case, this
likely means that a large fraction of the table
entries have a=5.  

When searching from an index, SQLite first finds
the index entry, then has to do a binary search
for the table entry.  The table entry lookup
is O(logN) where N is the number of entries in
the table.  If the number of rows in the result
set is proportional to N, then the total runtime
is O(NlogN).  On the other hand, the total runtime
of a full table scan (which is what happens if you
omit the index) is O(N).  N


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread Dennis Cote

Marco Bambini wrote:

As a performance test I created a db with 300,000 records, table is:

CREATE TABLE table1 (a INTEGER, b INTEGER)

a query like:
SELECT * FROM table1 WHERE a=5 AND b=11;
takes 0.281 secs.

if I add two indexes:
CREATE INDEX index1 ON table1(a);
CREATE INDEX index2 ON table1(b);

the same query is about two times slower, it takes 0.463 secs.
(I know that only one index is used by the query).

I repeated the test several times and results are confirmed...

Anyone have an explanation?


Marco,

I suspect that sqlite is arbitrarily choosing the wrong index for your 
data. Can you retry the test after running an ANALYZE command? Are the a 
and b values in your sample code uniformly distributed?


Dennis Cote


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread Marco Bambini
Yes, I know that it is faster ... I just wonder why with one index  
the query is slower that without any index...

---
Marco Bambini


On Apr 23, 2007, at 6:31 PM, P Kishor wrote:


On 4/23/07, Marco Bambini <[EMAIL PROTECTED]> wrote:

As a performance test I created a db with 300,000 records, table is:

CREATE TABLE table1 (a INTEGER, b INTEGER)

a query like:
SELECT * FROM table1 WHERE a=5 AND b=11;
takes 0.281 secs.

if I add two indexes:
CREATE INDEX index1 ON table1(a);
CREATE INDEX index2 ON table1(b);

the same query is about two times slower, it takes 0.463 secs.
(I know that only one index is used by the query).


Try

CREATE INDEX index_ab ON table1 (a, b);

and test.




I repeated the test several times and results are confirmed...

Anyone have an explanation?
---
Marco Bambini


- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 







--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=

-- 
---

To unsubscribe, send email to [EMAIL PROTECTED]
-- 
---





-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] An explanation?

2007-04-23 Thread P Kishor

On 4/23/07, Marco Bambini <[EMAIL PROTECTED]> wrote:

As a performance test I created a db with 300,000 records, table is:

CREATE TABLE table1 (a INTEGER, b INTEGER)

a query like:
SELECT * FROM table1 WHERE a=5 AND b=11;
takes 0.281 secs.

if I add two indexes:
CREATE INDEX index1 ON table1(a);
CREATE INDEX index2 ON table1(b);

the same query is about two times slower, it takes 0.463 secs.
(I know that only one index is used by the query).


Try

CREATE INDEX index_ab ON table1 (a, b);

and test.




I repeated the test several times and results are confirmed...

Anyone have an explanation?
---
Marco Bambini


-
To unsubscribe, send email to [EMAIL PROTECTED]
-





--
Puneet Kishor http://punkish.eidesis.org/
Nelson Inst. for Env. Studies, UW-Madison http://www.nelson.wisc.edu/
Open Source Geospatial Foundation http://www.osgeo.org/education/
-
collaborate, communicate, compete
=

-
To unsubscribe, send email to [EMAIL PROTECTED]
-