Re: [sqlite] Optimize Table Access Efficiency

2012-03-21 Thread Tim Morton
When I did the index test I did not have the second table in the DB. 
Just the complete large table (id, topics, definition). I also didn't have


order by random() limit 1

In the SELECT statement. The data is already in alphabetical order, but I'll try 
that anyway.

I am going to be away from the office over the next week or so but I will look 
into this some more as I get time.

Thank you both again for your time and patience.

Tim



On 3/21/2012 2:53 AM, Max Vlasov wrote:

On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton  wrote:


So it seems the index is no help;



Are you sure you did as Simon explained?
Because my tests show the trick works... (sqlite 3.7.10)

Create table [TestTable] ([Id] integer primary key, [topics] Text,
[Definition] Text);
CREATE TABLE [SourceTable] ([Id] integer primary key, [Title] Text);

Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('foo bar')
Insert into SourceTable (Title) VALUES ('bar foo')

insert into TestTable (topics, definition) values ((Select Title from
SourceTable order by random() limit 1), 'abcdefghij ...')  -- definition about
500 characters
repeated 100,000 times, so we have about 100,000 records

select topics, id from TestTable
:SQLite read 103 MB (I have a special "hack" in the VFS to monitor how much
SQLite reads during the query)
:Time: 4.9 sec

...After adding the index

CREATE INDEX [idx_Trick] ON [TestTable] ([Topics], [Id])

select topics, id from TestTable order by topics,id
:SQLite read 2 MB
:Time: 1.3 sec

The timing was unaffected by the SQLite and system caches, since the db was
reopened with a special trick on Windows forcing it to clear the cache for this
file.

Max
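Max's experiment can be reproduced at a smaller scale with Python's built-in sqlite3 module. The sketch below follows the table and column names from his post, shrinks the row count from 100,000 to 1,000 so it runs quickly, and uses EXPLAIN QUERY PLAN (rather than a VFS hack) to confirm the (Topics, Id) index covers the query; the padding string is invented for the demo.

```python
import sqlite3

# Small-scale sketch of the experiment above; names follow Max's post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (Id INTEGER PRIMARY KEY, Topics TEXT, Definition TEXT)")
conn.execute("CREATE TABLE SourceTable (Id INTEGER PRIMARY KEY, Title TEXT)")
conn.executemany("INSERT INTO SourceTable (Title) VALUES (?)",
                 [("bar",), ("bar",), ("foo bar",), ("bar foo",)])

padding = "abcdefghij" * 50      # about 500 characters of filler per row
for _ in range(1000):
    conn.execute(
        "INSERT INTO TestTable (Topics, Definition) VALUES "
        "((SELECT Title FROM SourceTable ORDER BY random() LIMIT 1), ?)",
        (padding,))

conn.execute("CREATE INDEX idx_Trick ON TestTable (Topics, Id)")

# With the matching ORDER BY, the rows come back in index order and SQLite
# can serve the whole query from the index pages alone.
rows = conn.execute(
    "SELECT Topics, Id FROM TestTable ORDER BY Topics, Id").fetchall()
plan = " ".join(str(r) for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT Topics, Id FROM TestTable ORDER BY Topics, Id"))
print(len(rows), "COVERING INDEX" in plan)
```

The "COVERING INDEX" phrase in the plan is what shows the table's own pages (with the ~500-byte definitions) are never read.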
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users




Re: [sqlite] Optimize Table Access Efficiency

2012-03-21 Thread Eduardo Morras

At 07:53 21/03/2012, Max Vlasov wrote:

On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton  wrote:

> So it seems the index is no help;
>


Are you sure you did as Simon explained?
Because my tests show the trick works... (sqlite 3.7.10)


The trick works, but I think he didn't time what he should have timed.

He did:

cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
    tmp.append(item[0])
print time.time() - start, 'get WL'

So he timed getting the data from the Python cursor and moving it into a 
Python list. I think he should time:


start = time.time()
cur.execute('SELECT topics FROM dict')
for item in cur:
    tmp.append(item[0])
print time.time() - start, 'get WL'

Including the SELECT, because that's where the trick works.
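Eduardo's corrected measurement, written out as a self-contained sketch (the table contents are invented, and Python 3 syntax is used rather than the Python 2 print of the snippets above):

```python
import sqlite3
import time

# Build a throwaway "dict" table like Tim's (contents invented for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dict (id INTEGER PRIMARY KEY, topics, def)")
conn.executemany("INSERT INTO dict (topics, def) VALUES (?, ?)",
                 ((f"word{i}", "definition text " * 30) for i in range(5000)))

cur = conn.cursor()
tmp = []
start = time.time()          # clock starts BEFORE execute(), as Eduardo suggests
cur.execute("SELECT topics FROM dict")
for item in cur:
    tmp.append(item[0])
elapsed = time.time() - start
print(len(tmp), "topics fetched in", round(elapsed, 4), "sec")
```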


The timing was unaffected by the SQLite and system caches, since the db was
reopened with a special trick on Windows forcing it to clear the cache for this
file.

Max





Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Max Vlasov
On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton  wrote:

> So it seems the index is no help;
>


Are you sure you did as Simon explained?
Because my tests show the trick works... (sqlite 3.7.10)

Create table [TestTable] ([Id] integer primary key, [topics] Text,
[Definition] Text);
CREATE TABLE [SourceTable] ([Id] integer primary key, [Title] Text);

Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('foo bar')
Insert into SourceTable (Title) VALUES ('bar foo')

insert into TestTable (topics, definition) values ((Select Title from
SourceTable order by random() limit 1), 'abcdefghij ...')  -- definition about
500 characters
repeated 100,000 times, so we have about 100,000 records

select topics, id from TestTable
:SQLite read 103 MB (I have a special "hack" in the VFS to monitor how much
SQLite reads during the query)
:Time: 4.9 sec

...After adding the index

CREATE INDEX [idx_Trick] ON [TestTable] ([Topics], [Id])

select topics, id from TestTable order by topics,id
:SQLite read 2 MB
:Time: 1.3 sec

The timing was unaffected by the SQLite and system caches, since the db was
reopened with a special trick on Windows forcing it to clear the cache for this
file.

Max


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Simon Slavin

On 20 Mar 2012, at 9:33pm, Tim Morton  wrote:

> So it seems the index is no help; a second topics table is a significant 
> help; and a separate file with topics table a negligible help; and creation 
> order is no help.

Interesting and a little unexpected.  Good to see a user doing experimentation.

Simon.


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Tim Morton

I did some quick tests:

I took a 100,000 entry dictionary and made a single table

"create table dict (id INTEGER PRIMARY KEY, topics, def)"

I ran a "for loop" on the cursor object to extract all the topic column 
data and put it in a Python list.


cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
    tmp.append(item[0])
print time.time() - start, 'get WL'

The time was around .035 sec.

With an index created,

 'CREATE INDEX dctidx ON dict (id, topics)'

There was no difference in time. The file was around 2MB larger.

I made a second table with just the topic data,

"create table topics (topic)"

with this select,

'SELECT topic FROM topics'

The time was roughly half at around .017 sec.

I made a separate file with just the topic data table as above and the 
access time was only slightly less .014-.015 sec.


Also, I populated the topics table completely and then populated the def 
table, with no difference in speed compared to populating them both in the 
same loop.


So it seems the index is no help; a second topics table is a significant 
help; and a separate file with topics table a negligible help; and 
creation order is no help.


They were all good ideas to try, though.

Tim




On 3/20/2012 2:36 PM, Simon Slavin wrote:

On 20 Mar 2012, at 6:11pm, Tim Morton  wrote:


I may try this, but the topics are usually just one word each so I doubt an 
index could reduce it much.

Max's trick takes advantage of how SQLite works internally.  If you do a SELECT 
like

SELECT id,topics FROM dictionary ORDER BY topics

and there's an index like

CREATE INDEX dti ON dictionary (topics,id)

then SQLite will use that index to find the right order in which to retrieve the 
records.  But having figured out which rows it needs, it realises that it has all 
the information -- both the id and the topic -- right there in the index.  So it 
doesn't bother looking up the TABLE data to fetch the other columns; it just 
returns all the information you asked for straight from the index it's already 
looking at.  It's fast and, because the index contains only the data you want, 
it's equivalent to making up a separate table which has just the id and topics 
columns.

I forgot about that trick when I replied but Max didn't.

Simon.




Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Tim Morton

Ah, very good. Thanks to you both.

I will definitely try this. Sounds like it may help.

Tim




On 3/20/2012 2:36 PM, Simon Slavin wrote:

On 20 Mar 2012, at 6:11pm, Tim Morton  wrote:


I may try this, but the topics are usually just one word each so I doubt an 
index could reduce it much.

Max's trick takes advantage of how SQLite works internally.  If you do a SELECT 
like

SELECT id,topics FROM dictionary ORDER BY topics

and there's an index like

CREATE INDEX dti ON dictionary (topics,id)

then SQLite will use that index to find the right order in which to retrieve the 
records.  But having figured out which rows it needs, it realises that it has all 
the information -- both the id and the topic -- right there in the index.  So it 
doesn't bother looking up the TABLE data to fetch the other columns; it just 
returns all the information you asked for straight from the index it's already 
looking at.  It's fast and, because the index contains only the data you want, 
it's equivalent to making up a separate table which has just the id and topics 
columns.

I forgot about that trick when I replied but Max didn't.

Simon.




Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Simon Slavin

On 20 Mar 2012, at 6:11pm, Tim Morton  wrote:

> I may try this, but the topics are usually just one word each so I doubt an 
> index could reduce it much.

Max's trick takes advantage of how SQLite works internally.  If you do a SELECT 
like

SELECT id,topics FROM dictionary ORDER BY topics

and there's an index like

CREATE INDEX dti ON dictionary (topics,id)

then SQLite will use that index to find the right order in which to retrieve the 
records.  But having figured out which rows it needs, it realises that it has all 
the information -- both the id and the topic -- right there in the index.  So it 
doesn't bother looking up the TABLE data to fetch the other columns; it just 
returns all the information you asked for straight from the index it's already 
looking at.  It's fast and, because the index contains only the data you want, 
it's equivalent to making up a separate table which has just the id and topics 
columns.

I forgot about that trick when I replied but Max didn't.

Simon.
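One way to confirm the trick is firing is to ask SQLite for its query plan: when the index satisfies the whole query, the plan reports a covering-index scan. A minimal sketch via the Python sqlite3 module (the exact plan wording varies slightly between SQLite versions, but the "COVERING INDEX" phrase is the tell):

```python
import sqlite3

# An empty schema is enough: the query plan does not depend on the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dictionary (id INTEGER PRIMARY KEY, topics, definition)")
conn.execute("CREATE INDEX dti ON dictionary (topics, id)")

plan = " ".join(str(row) for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, topics FROM dictionary ORDER BY topics"))
# "COVERING INDEX" in the plan means the table itself is never touched.
print("COVERING INDEX" in plan)
```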


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Tim Morton

Thanks,

I may try this, but the topics are usually just one word each so I doubt 
an index could reduce it much.


Tim





On 3/20/2012 1:52 PM, Max Vlasov wrote:

Hi, Tim

On Tue, Mar 20, 2012 at 6:21 PM, Tim Morton  wrote:


Is there a way to read only the part of the file that has the queried
table/column, thus saving time and hard drive grinding?



There is also a trick. You can create an index on the 'topics' column and
perform your first read forcing this column order, without touching the
definition at the same time. Since any index is just another b-tree, but most
of the time a smaller one, you will end up reading a smaller part of the file.
Although there's a price to pay: your db will become larger.

Max




Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Max Vlasov
Hi, Tim

On Tue, Mar 20, 2012 at 6:21 PM, Tim Morton  wrote:

>
> Is there a way to read only the part of the file that has the queried
> table/column, thus saving time and hard drive grinding?
>
>
There is also a trick. You can create an index on the 'topics' column and
perform your first read forcing this column order, without touching the
definition at the same time. Since any index is just another b-tree, but most
of the time a smaller one, you will end up reading a smaller part of the file.
Although there's a price to pay: your db will become larger.

Max


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Tim Morton

Thanks, Simon, for your detailed answer.

I will try the suggestions you and Michael supplied and see if there is 
any significant improvement.


Tim

On 3/20/2012 11:13 AM, Simon Slavin wrote:

On 20 Mar 2012, at 2:21pm, Tim Morton  wrote:


My app reads dozens of SQLite databases ranging in size from 1MB to 100MB with 
a simple table structure like,

"create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"

On startup the app reads all the databases to extract the "topics" column data. With the 
DB configured as above extracting the topic data is a lengthy process. It seems the whole DB file 
is being read to just access the relatively small "topics" column.

If I add another table with just the topics data,

"create table dictionary(id INTEGER PRIMARY KEY, topics)"

access is quicker. However, if I make another DB file with just the topics 
column, access is much quicker. I don't really want to use two files to access 
a single DB.

Also, the DB is only read from; not written to by the app.

Thank you for your detailed description which helps a lot in considering your 
situation.


My questions,

Does table creation order affect the speed of access?

These things depend mostly on your hardware.  Windows deals much better with 
defragmented files than fragmented files.  A lot of its speed comes from 
assumptions like if you're looking at sector s, you're shortly going to want 
sector s+1.  To speed up your situation, after your database file has been 
written, execute a VACUUM command in SQL then (if you're running Windows) 
defragment your hard disk.

If the above does not give you fast enough access, then it may be possible to 
speed things up still further by writing one TABLE then the other.  You can do 
this when you originally make the file, or you can use the sqlite shell tool to 
.dump the file to SQL commands and .read the SQL commands back in to form 
another database file.  After doing this, once again, if you're running Windows 
do another defragment.  I don't expect that to make much difference but it 
might be worth trying.  You should also look at the PRAGMA command Michael 
recommended.


Is there a way to read only the part of the file that has the queried 
table/column, thus saving time and hard drive grinding?

Is it possible to read a small table/column from a huge file nearly as quick as 
the small table alone from a small file? And if so, how?

Your TABLE definition is as efficient as it can be.  Your most efficient 
reading code in each situation is

SELECT id,topics FROM dictionary

Any other optimization can't be done inside the SQL command.

Simon.




Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Simon Slavin

On 20 Mar 2012, at 2:21pm, Tim Morton  wrote:

> My app reads dozens of SQLite databases ranging in size from 1MB to 100MB 
> with a simple table structure like,
> 
> "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"
> 
> On startup the app reads all the databases to extract the "topics" column 
> data. With the DB configured as above extracting the topic data is a lengthy 
> process. It seems the whole DB file is being read to just access the 
> relatively small "topics" column.
> 
> If I add another table with just the topics data,
> 
> "create table dictionary(id INTEGER PRIMARY KEY, topics)"
> 
> access is quicker. However, if I make another DB file with just the topics 
> column, access is much quicker. I don't really want to use two files to 
> access a single DB.
> 
> Also, the DB is only read from; not written to by the app.

Thank you for your detailed description which helps a lot in considering your 
situation.

> My questions,
> 
> Does table creation order affect the speed of access?

These things depend mostly on your hardware.  Windows deals much better with 
defragmented files than fragmented files.  A lot of its speed comes from 
assumptions like if you're looking at sector s, you're shortly going to want 
sector s+1.  To speed up your situation, after your database file has been 
written, execute a VACUUM command in SQL then (if you're running Windows) 
defragment your hard disk.

If the above does not give you fast enough access, then it may be possible to 
speed things up still further by writing one TABLE then the other.  You can do 
this when you originally make the file, or you can use the sqlite shell tool to 
.dump the file to SQL commands and .read the SQL commands back in to form 
another database file.  After doing this, once again, if you're running Windows 
do another defragment.  I don't expect that to make much difference but it 
might be worth trying.  You should also look at the PRAGMA command Michael 
recommended.

> Is there a way to read only the part of the file that has the queried 
> table/column, thus saving time and hard drive grinding?
> 
> Is it possible to read a small table/column from a huge file nearly as quick 
> as the small table alone from a small file? And if so, how?

Your TABLE definition is as efficient as it can be.  Your most efficient 
reading code in each situation is

SELECT id,topics FROM dictionary

Any other optimization can't be done inside the SQL command.

Simon.
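The .dump/.read rebuild Simon describes can also be done from Python via Connection.iterdump(), which emits the same SQL the shell's .dump produces. A sketch with invented tables; in practice the destination would be a fresh file on disk, followed by the defragment step:

```python
import sqlite3

# Source database with two tables (contents invented for the demo).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE topics (topic)")
src.execute("CREATE TABLE defs (def)")
src.executemany("INSERT INTO topics VALUES (?)", [("alpha",), ("beta",)])
src.executemany("INSERT INTO defs VALUES (?)", [("first def",), ("second def",)])

# Replay the dump into a fresh database; each table's pages are written out
# contiguously, which is the layout benefit Simon is after.
dst = sqlite3.connect(":memory:")    # use a real filename in practice
dst.executescript("\n".join(src.iterdump()))

topic_count = dst.execute("SELECT count(*) FROM topics").fetchone()[0]
def_count = dst.execute("SELECT count(*) FROM defs").fetchone()[0]
print(topic_count, def_count)
```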


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Black, Michael (IS)
You may also want to try pragma cache_size and bump it up a LOT just to see 
what happens to your timings.
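A hedged sketch of what that looks like from Python. A negative cache_size value is interpreted as a size in KiB rather than pages (supported since SQLite 3.7.10, the version used elsewhere in this thread), so -200000 asks for roughly 200 MB:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bump the page cache well past the default before running the big queries.
conn.execute("PRAGMA cache_size = -200000")   # negative value = size in KiB
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(cache_size)
```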



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Tim Morton [t...@preservedwords.com]
Sent: Tuesday, March 20, 2012 9:44 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Optimize Table Access Efficiency

Thanks for the reply,

"Lengthy" varies according to the system, but from a fresh, initial
start on this one it can be 15 to 20 seconds. The system cache speeds up
subsequent starts to around 3-4 seconds.

As mentioned in the post, I have two tables, one for topics and one for
definitions. I read the topics table and get the index and then use it
to get the definition. The one thing I didn't do is populate the two
tables on creation separately. They both are populated in the same "for
loop." I'll try that, but with my previous texts I could always get
better speed with the topics in a separate DB on a fresh start.

Tim



On 3/20/2012 10:26 AM, Black, Michael (IS) wrote:
> Try creating 2 tables, one for topics, one for definitions.
>
>
>
> Then insert all the topics at once followed by all the definitions.
>
> That should give you the same disk layout as two databases.
>
>
>
> And you don't say what "lengthy" means.
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> Advanced Analytics Directorate
>
> Advanced GEOINT Solutions Operating Unit
>
> Northrop Grumman Information Systems
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Tim Morton [t...@preservedwords.com]
> Sent: Tuesday, March 20, 2012 9:21 AM
> To: sqlite-users@sqlite.org
> Subject: EXT :[sqlite] Optimize Table Access Efficiency
>
> Greetings,
>
> My app reads dozens of SQLite databases ranging in size from 1MB to
> 100MB with a simple table structure like,
>
> "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"
>
> On startup the app reads all the databases to extract the "topics"
> column data. With the DB configured as above extracting the topic data
> is a lengthy process. It seems the whole DB file is being read to just
> access the relatively small "topics" column.
>
> If I add another table with just the topics data,
>
> "create table dictionary(id INTEGER PRIMARY KEY, topics)"
>
> access is quicker. However, if I make another DB file with just the
> topics column, access is much quicker. I don't really want to use two
> files to access a single DB.
>
> Also, the DB is only read from; not written to by the app.
>
> My questions,
>
> Does table creation order affect the speed of access?
>
> Is there a way to read only the part of the file that has the queried
> table/column, thus saving time and hard drive grinding?
>
> Is it possible to read a small table/column from a huge file nearly as
> quick as the small table alone from a small file? And if so, how?
>
> Thanks,
>


Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Tim Morton

Thanks for the reply,

"Lengthy" varies according to the system, but from a fresh, initial 
start on this one it can be 15 to 20 seconds. The system cache speeds up 
subsequent starts to around 3-4 seconds.


As mentioned in the post, I have two tables, one for topics and one for 
definitions. I read the topics table and get the index and then use it 
to get the definition. The one thing I didn't do is populate the two 
tables on creation separately. They both are populated in the same "for 
loop." I'll try that, but with my previous texts I could always get 
better speed with the topics in a separate DB on a fresh start.


Tim



On 3/20/2012 10:26 AM, Black, Michael (IS) wrote:

Try creating 2 tables, one for topics, one for definitions.



Then insert all the topics at once followed by all the definitions.

That should give you the same disk layout as two databases.



And you don't say what "lengthy" means.



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Tim Morton [t...@preservedwords.com]
Sent: Tuesday, March 20, 2012 9:21 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Optimize Table Access Efficiency

Greetings,

My app reads dozens of SQLite databases ranging in size from 1MB to
100MB with a simple table structure like,

"create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"

On startup the app reads all the databases to extract the "topics"
column data. With the DB configured as above extracting the topic data
is a lengthy process. It seems the whole DB file is being read to just
access the relatively small "topics" column.

If I add another table with just the topics data,

"create table dictionary(id INTEGER PRIMARY KEY, topics)"

access is quicker. However, if I make another DB file with just the
topics column, access is much quicker. I don't really want to use two
files to access a single DB.

Also, the DB is only read from; not written to by the app.

My questions,

Does table creation order affect the speed of access?

Is there a way to read only the part of the file that has the queried
table/column, thus saving time and hard drive grinding?

Is it possible to read a small table/column from a huge file nearly as
quick as the small table alone from a small file? And if so, how?

Thanks,





Re: [sqlite] Optimize Table Access Efficiency

2012-03-20 Thread Black, Michael (IS)
Try creating 2 tables, one for topics, one for definitions.



Then insert all the topics at once followed by all the definitions.

That should give you the same disk layout as two databases.



And you don't say what "lengthy" means.



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Tim Morton [t...@preservedwords.com]
Sent: Tuesday, March 20, 2012 9:21 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Optimize Table Access Efficiency

Greetings,

My app reads dozens of SQLite databases ranging in size from 1MB to
100MB with a simple table structure like,

"create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"

On startup the app reads all the databases to extract the "topics"
column data. With the DB configured as above extracting the topic data
is a lengthy process. It seems the whole DB file is being read to just
access the relatively small "topics" column.

If I add another table with just the topics data,

"create table dictionary(id INTEGER PRIMARY KEY, topics)"

access is quicker. However, if I make another DB file with just the
topics column, access is much quicker. I don't really want to use two
files to access a single DB.

Also, the DB is only read from; not written to by the app.

My questions,

Does table creation order affect the speed of access?

Is there a way to read only the part of the file that has the queried
table/column, thus saving time and hard drive grinding?

Is it possible to read a small table/column from a huge file nearly as
quick as the small table alone from a small file? And if so, how?

Thanks,



Re: Re: [sqlite] optimize table

2005-04-15 Thread Jay Sprenkle
Since you produce the statistics once per month on a schedule
it would be more efficient to not have an index. The index will slow down daily
operations each time a record is inserted. Copy the database file once per
month to a 'snapshot' file. Add the index to the snapshot database and
then produce your statistics. Make it a scheduled task that runs the night
before you need it.
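A sketch of that schedule in Python (paths and the statistics query are invented; the only requirement is that no writer has an open transaction on the live file during the copy):

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "sales.db")      # stand-in for the live db
snap_path = os.path.join(workdir, "snapshot.db")

# Daily operations write to the live database with NO statistics index.
live = sqlite3.connect(live_path)
live.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, amount DOUBLE, statistic integer)")
live.executemany("INSERT INTO sale (amount, statistic) VALUES (?, 0)",
                 [(i * 1.5,) for i in range(100)])
live.commit()
live.close()                                       # ensure no open transaction

# Monthly job: copy the file, index only the copy, report from the copy.
shutil.copyfile(live_path, snap_path)
snap = sqlite3.connect(snap_path)
snap.execute("CREATE INDEX idx_stat ON sale (statistic)")
total = snap.execute(
    "SELECT sum(amount) FROM sale WHERE statistic = 0").fetchone()[0]
print(total)
```

Daily inserts never pay the index-maintenance cost, while the monthly report still gets an indexed query.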

On 4/15/05, msaka msaka <[EMAIL PROTECTED]> wrote:
> Do you have any thoughts on my problem?
> 
> Should I split the db files or not?
> 
> >-Original message-
> >From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
> >Sent: 15 April 2005 11:15
> >To: sqlite-users@sqlite.org
> >Subject: Re: [sqlite] optimize table
> >
> >
> >On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote:
> >> I cannot speak for SQLITE for certain, but I know with a number of
> >> commercial RDBMS packages the index is not used unless it contains a
> >> sufficient degree of distribution.
> >>
> >
> >SQLite does not do this.  It makes no effort to keep track of the
> >"degree of distribution" or other statistics associated with indices.
> >It just uses them if they are available.
> >
> >This can be either a feature or a bug, depending on your point
> >of view.  I have had people tell me that they prefer the SQLite
> >way.  With other databases, they tell me, they are constantly
> >in a struggle trying to outwit and trick the database engine
> >into doing what they want.  SQLite just does what you tell it
> >to do, for better or for worse.
> >--
> >D. Richard Hipp <[EMAIL PROTECTED]>
> 
> 




Re: Re: [sqlite] optimize table

2005-04-15 Thread msaka msaka
Do you have any thoughts on my problem?

Should I split the db files or not?


>-Original message-
>From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
>Sent: 15 April 2005 11:15
>To: sqlite-users@sqlite.org
>Subject: Re: [sqlite] optimize table
>
>
>On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote:
>> I cannot speak for SQLITE for certain, but I know with a number of
>> commercial RDBMS packages the index is not used unless it contains a
>> sufficient degree of distribution.
>>
>
>SQLite does not do this.  It makes no effort to keep track of the
>"degree of distribution" or other statistics associated with indices.
>It just uses them if they are available.
>
>This can be either a feature or a bug, depending on your point
>of view.  I have had people tell me that they prefer the SQLite
>way.  With other databases, they tell me, they are constantly
>in a struggle trying to outwit and trick the database engine
>into doing what they want.  SQLite just does what you tell it
>to do, for better or for worse.
>--
>D. Richard Hipp <[EMAIL PROTECTED]>






Re: [sqlite] optimize table

2005-04-15 Thread D. Richard Hipp
On Fri, 2005-04-15 at 06:15 -0400, Ken & Deb Allen wrote:
> I cannot speak for SQLITE for certain, but I know with a number of 
> commercial RDBMS packages the index is not used unless it contains a 
> sufficient degree of distribution. 
> 

SQLite does not do this.  It makes no effort to keep track of the
"degree of distribution" or other statistics associated with indices.
It just uses them if they are available.

This can be either a feature or a bug, depending on your point
of view.  I have had people tell me that they prefer the SQLite
way.  With other databases, they tell me, they are constantly
in a struggle trying to outwit and trick the database engine
into doing what they want.  SQLite just does what you tell it
to do, for better or for worse.
-- 
D. Richard Hipp <[EMAIL PROTECTED]>



Re: [sqlite] optimize table

2005-04-15 Thread Ken & Deb Allen
P.S. When experimenting with this, try naming this 'second' table first 
to reduce the work the database has to do in selecting/rejecting 
records. You can experiment with this by creating this new table, 
populating it with a SELECT from the existing table (only the records 
without statistics), and then timing the SELECT/JOIN.

-ken
On 15-Apr-05, at 6:15 AM, Ken & Deb Allen wrote:
I cannot speak for SQLITE for certain, but I know with a number of 
commercial RDBMS packages the index is not used unless it contains a 
sufficient degree of distribution. Microsoft SQL Server, for example, 
is quite explicit in stating that if the distribution of values across 
the range of records in an index falls below a certain level, then the 
index is ignored, which is why you are advised to put the most dynamic 
column first in a multi-column index. I do not know if SQLITE performs 
any of this analysis and makes decisions on whether to use the index 
or not, but it is possible, especially if a sort is involved, that the 
index is not helping much if all records have one of two values, and 
as the majority of the records have the same value, you are looking 
only for the minority records. I do not know if using a date/time for 
this field will speed it up or not, since your statistical inclusion 
query will be for records "WHERE statistic IS NULL" or "WHERE 
statistic = 0", and as the database grows in size this will be the 
minority set of records.

One 'trick' I have seen in the past is to create a second table that 
contains an entry for each new record, keyed by the same "id"; as each 
sale is created, an entry is also made in that table, possibly by a 
trigger. You select the records for statistics using an inner JOIN, 
which returns only the records present in both tables. As the records 
are processed, they are removed from the second table, so they will not 
be part of the JOIN in the future. In some databases this may be faster 
than using an index; I have not experimented with this in SQLite, 
however.

-ken
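
Ken's second-table trick above can be sketched with Python's built-in 
sqlite3 module. This is a hedged sketch: the table name pending and the 
trigger name sale_queue are illustrative assumptions, not from the 
thread, and the sale schema is simplified.

```python
import sqlite3

# "Second table" work-queue sketch: a trigger mirrors each new sale's id
# into a small pending table; the stats job JOINs against it, then
# deletes the rows it has processed.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sale(
    id INTEGER NOT NULL UNIQUE PRIMARY KEY,
    item_name varchar(20), amount DOUBLE);
CREATE TABLE pending(id INTEGER PRIMARY KEY);

-- Mirror every new sale into the work-queue table.
CREATE TRIGGER sale_queue AFTER INSERT ON sale
BEGIN
    INSERT INTO pending(id) VALUES (NEW.id);
END;
""")
con.executemany("INSERT INTO sale(item_name, amount) VALUES (?, ?)",
                [("widget", 1.0), ("gadget", 2.5)])

# The inner JOIN returns only sales not yet processed.
unprocessed = con.execute(
    "SELECT s.id, s.amount FROM pending p JOIN sale s ON s.id = p.id"
).fetchall()
print(len(unprocessed))  # 2

# After computing statistics, drop the processed ids from the queue.
con.execute("DELETE FROM pending")
remaining = con.execute(
    "SELECT s.id FROM pending p JOIN sale s ON s.id = p.id"
).fetchall()
print(len(remaining))  # 0
```

Per Ken's P.S., an existing backlog could be seeded once with a single 
INSERT ... SELECT from the main table before the trigger takes over.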
On 15-Apr-05, at 4:30 AM, msaka msaka wrote:
I don't need to work with dates and times; I only set statistic to 1 on 
records that have been processed. Which index is best to create for 
this problem? Must I split the db file, or will the speed of the 
statistics query be the same now and after 1,000,000 records?


-Pôvodná správa-
Od: Gert Rijs [mailto:[EMAIL PROTECTED]
Odoslané: 15. apríla 2005 7:13
Komu: sqlite-users@sqlite.org
Predmet: Re: [sqlite] optimize table
msaka msaka wrote:
CREATE TABLE sale(
   id INTEGER NOT NULL UNIQUE PRIMARY KEY,
   bill_no integer,
   bill_item_no integer,
   item_name varchar(20),
   qty integer,
   amount DOUBLE,
   statistic integer
)

Why don't you replace "statistic" with a timestamp formatted like
yyyymmdd, perhaps add the time as well. If you add an index on it you
can efficiently retrieve the rows for certain date periods without
needing to update the table.
Gert







Re: Re: [sqlite] optimize table

2005-04-15 Thread msaka msaka
I don't need to work with dates and times; I only set statistic to 1 on 
records that have been processed.

Which index is best to create for this problem?
Must I split the db file, or will the speed of the statistics query be 
the same now and after 1,000,000 records?


>-Pôvodná správa-
>Od: Gert Rijs [mailto:[EMAIL PROTECTED]
>Odoslané: 15. apríla 2005 7:13
>Komu: sqlite-users@sqlite.org
>Predmet: Re: [sqlite] optimize table
>
>
>msaka msaka wrote:
>>
>> CREATE TABLE sale(
>>id INTEGER NOT NULL UNIQUE PRIMARY KEY,
>>bill_no integer,
>>bill_item_no integer,
>>item_name varchar(20),
>>qty integer,
>>amount DOUBLE,
>>statistic integer
>> )
>>
>>
>
>Why don't you replace "statistic" with a timestamp formatted like
>yyyymmdd, perhaps add the time as well. If you add an index on it you
>can efficiently retrieve the rows for certain date periods without
>needing to update the table.
>
>Gert






Re: [sqlite] optimize table

2005-04-15 Thread Gert Rijs
msaka msaka wrote:
CREATE TABLE sale(
   id INTEGER NOT NULL UNIQUE PRIMARY KEY,
   bill_no integer,
   bill_item_no integer,
   item_name varchar(20),
   qty integer,
   amount DOUBLE,
   statistic integer
)

Why don't you replace "statistic" with a timestamp formatted like 
yyyymmdd, perhaps add the time as well. If you add an index on it you 
can efficiently retrieve the rows for certain date periods without 
needing to update the table.

Gert
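
Gert's suggestion can be sketched with Python's built-in sqlite3 module. 
The column name sold_at, the index name, and the sample rows are 
illustrative assumptions, not from the thread.

```python
import sqlite3

# Timestamp-column sketch: store a yyyymmdd integer instead of a
# processed flag, index it, and pull date ranges directly, with no
# per-row UPDATE needed after processing.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sale(
    id INTEGER PRIMARY KEY, amount DOUBLE, sold_at integer)""")
con.execute("CREATE INDEX idx_sold_at ON sale(sold_at)")
con.executemany("INSERT INTO sale(amount, sold_at) VALUES (?, ?)",
                [(1.0, 20050414), (2.0, 20050415), (3.0, 20050416)])

# The index makes this range scan cheap even on a large table.
total = con.execute(
    "SELECT SUM(amount) FROM sale WHERE sold_at BETWEEN ? AND ?",
    (20050415, 20050416)).fetchone()[0]
print(total)  # 5.0
```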