Re: [sqlite] Join on same table and performance.

2006-03-24 Thread Elcin Recebli
Hi.

You're joining the table with itself using 'id'. However, there's no index just 
on that field. I'm not sure how exactly SQLite utilises indices, but it might 
be unable to use index on (id,key,source) triple to optimise calculation of 
"m1.id = m2.id".

Does this sound sensible?

Cheers.

--- =?ISO-8859-1?Q?Tobias_Rundstr=F6m?= <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> I guess this subject is a bit worn out. But I am having scalabillity  
> problems with SQLite in XMMS2. We have dimensioned XMMS2 to handle  
> insanely huge media libraries, playlists and clients. Our goal is to  
> be able to run medialibs with 50.000 files without problem. Our  
> backend is SQLite3. When we get somewhere around 16 rows (10k  
> entries) we're starting to see problems with scalabillity of SQLite3.  
> I would like some views on how we could speed up the storage backend  
> and also comments on our structure.
> 
> Today we save all data in one table called Media. Each entry has a id  
> number and each id number can have a indefinitive number of  
> properties. To keep this flexible and clean we choose to add one row  
> per property. A property can be "artist", "album", "url" and more.  
> The schema is this:
> 
> create table Media (id integer, key, value, source integer);
> and the indexes:
> create unique index key_idx on Media (id,key,source);
> create index prop_idx on Media (key,value);
> create index source_idx on Media (key,source);
> create index key_source_val_idx on Media (key,source,value);
> 
> The most common query is something like this:
> 
> select value from Media where id=5 and key='url' and source=1;
> 
> This query remains very fast no matter how many entries I have in my  
> database also things like:
> 
> select key, value from Media where id=5 and source=1;
> 
> is still very fast.
> 
> But more advanced queries like "show me all albums and artists that  
> are not compilations" are very slow:
> 
> select distinct m1.value as artist, ifnull(m2.value,'[unknown]') as  
> album from Media m1 left join Media m2 on m1.id = m2.id and  
> m2.key='album' left join Media m3 on m1.id = m3.id and  
> m3.key='compilation' where m1.key='artist' and m3.value is null;
> 
> In fact, whenever I join with myself and try to extract a big number  
> of values it can take forever to get the result.
> 
> I have tried to increase the cache_size to somewhere around 32000 to  
> see if it made any difference, it didn't.
> idxchk tells me that the good indexes are in use.
> 
> Any comments, help or blame is welcome to try to solve this issue of  
> scalabillity. You can download a medialib here:
> http://debian.as/~skid/medialib.db.gz this contains almost 20  
> rows and 14000 songs and is a real user library.
> 
> Looking forward to getting your input.
> 
> Thanks
> Tobias
> 
> 
> --
> * Zoner PhotoStudio 8 - Your Photos perfect, shared, organised! 
> www.zoner.com/zps
> 
> 

__
Stops spam 100% for your email accounts or you get paid. http://www.cashette.com


Re: [sqlite] Join on same table and performance.

2006-03-24 Thread drh
Elcin Recebli <[EMAIL PROTECTED]> wrote:
> Hi.
> 
> You're joining the table with itself using 'id'. However, there's no index 
> just on that field. I'm not sure how exactly SQLite utilises indices, but it 
> might be unable to use index on (id,key,source) triple to optimise 
> calculation of "m1.id = m2.id".
> 

SQLite is able to use the prefix of an index.  So in
this case, the index on (id,key,source) would be used
to optimize m1.id=m2.id.
--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] Join on same table and performance.

2006-03-24 Thread Tobias Rundström


El 24-03-2006, a las 16:08, [EMAIL PROTECTED] escribió:


Elcin Recebli <[EMAIL PROTECTED]> wrote:

Hi.

You're joining the table with itself using 'id'. However, there's  
no index just on that field. I'm not sure how exactly SQLite  
utilises indices, but it might be unable to use index on  
(id,key,source) triple to optimise calculation of "m1.id = m2.id".




SQLite is able to use the prefix of an index.  So in
this case, the index on (id,key,source) would be used
to optimize m1.id=m2.id.


Hello,

Well it definitly did something. The query execution time was down by  
a factor of at least 100.


Thanks Elcin.

-- Tobias

Re: [sqlite] Join on same table and performance.

2006-03-24 Thread drh
=?ISO-8859-1?Q?Tobias_Rundstr=F6m?= <[EMAIL PROTECTED]> wrote:
> 
> create table Media (id integer, key, value, source integer);
> and the indexes:
> create unique index key_idx on Media (id,key,source);
> create index prop_idx on Media (key,value);
> create index source_idx on Media (key,source);
> create index key_source_val_idx on Media (key,source,value);

source_idx and key_source_val_idx are redundant.  Your queries
will run just as fast if you DROP source_idx and keep just
key_source_val_idx.  And your INSERTs, UPDATEs, and DELETEs will
be a little faster, since there is one fewer index to 
maintain.

This is just an aside - it is not the cause of your problems.

> 
> But more advanced queries like "show me all albums and artists that  
> are not compilations" are very slow:
> 
> select distinct m1.value as artist, ifnull(m2.value,'[unknown]') as  
> album from Media m1 left join Media m2 on m1.id = m2.id and  
> m2.key='album' left join Media m3 on m1.id = m3.id and  
> m3.key='compilation' where m1.key='artist' and m3.value is null;

I downloaded your database and the query above was indeed slow.
But then I ran "ANALYZE" so that SQLite can gather statistics
on the various indices, then reran the query.  This time, SQLite
was able to use the ANALYZE results to make better index choices 
and the query is quite speedy.

The results of ANALYZE are stored in a special table named
"sqlite_stat1".  So you only have to run it once and the result
will be used for all subsequent queries.

--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] Join on same table and performance.

2006-03-24 Thread Doug Currie
Friday, March 24, 2006, 2:33:36 PM, Tobias Rundström wrote:

> [...]
> The schema is this:

> create table Media (id integer, key, value, source integer);
> and the indexes:
> create unique index key_idx on Media (id,key,source);
> create index prop_idx on Media (key,value);
> create index source_idx on Media (key,source);
> create index key_source_val_idx on Media (key,source,value);

I wonder what effect

create table Media (id INTEGER PRIMARY KEY, key, value, source integer);

would have on your query time. This would use the already built-in
BTree index.

e

-- 
Doug Currie
Londonderry, NH



Re: [sqlite] Join on same table and performance.

2006-03-24 Thread Tobias Rundström

I downloaded your database and the query above was indeed slow.
But then I ran "ANALYZE" so that SQLite can gather statistics
on the various indices, then reran the query.  This time, SQLite
was able to use the ANALYZE results to make better index choices
and the query is quite speedy.

The results of ANALYZE are stored in a special table named
"sqlite_stat1".  So you only have to run it once and the result
will be used for all subsequent queries.


Thank, you.

This helps of course. How often should I run analyze? Trying to  
figure out how to make this programaticly for new installations. I  
guess there has to be a bit of data in the table before I can run  
analyze with the outcome that I want?


-- Tobias


Re: [sqlite] Join on same table and performance.

2006-03-24 Thread drh
=?ISO-8859-1?Q?Tobias_Rundstr=F6m?= <[EMAIL PROTECTED]> wrote:
> 
> create table Media (id integer, key, value, source integer);
> and the indexes:
> create unique index key_idx on Media (id,key,source);
> create index prop_idx on Media (key,value);
> create index source_idx on Media (key,source);
> create index key_source_val_idx on Media (key,source,value);
> 

Investigating further, I see that Media.source=1 for every
row in your database.  What is Media.source?  Does it ever
have a value other than 1?

Since source is always 1, there is really no point in indexing
it. For the sample data set you supplied, you could just as well
get by with the following two indices:

   CREATE UNIQUE INDEX key_idx ON media (id,key);
   CREATE INDEX prop_idx ON media (key,value);

If in other data sets media.source takes on more a more
diverse set of values, then perhaps some of the other indices
above would be useful - but not in the sample data you
supplied.

--
D. Richard Hipp   <[EMAIL PROTECTED]>



Re: [sqlite] Join on same table and performance.

2006-03-24 Thread drh
=?ISO-8859-1?Q?Tobias_Rundstr=F6m?= <[EMAIL PROTECTED]> wrote:
> > I downloaded your database and the query above was indeed slow.
> > But then I ran "ANALYZE" so that SQLite can gather statistics
> > on the various indices, then reran the query.  This time, SQLite
> > was able to use the ANALYZE results to make better index choices
> > and the query is quite speedy.
> >
> > The results of ANALYZE are stored in a special table named
> > "sqlite_stat1".  So you only have to run it once and the result
> > will be used for all subsequent queries.
> 
> Thank, you.
> 
> This helps of course. How often should I run analyze? Trying to  
> figure out how to make this programaticly for new installations. I  
> guess there has to be a bit of data in the table before I can run  
> analyze with the outcome that I want?
> 

The statistics on all of your databases are likely to be
very much alike.  So run ANALYZE once on a sample database
such as the one you posted earlier.  Save off the data
that the ANALYZE command writes into the "sqlite_stat1"
table.

When creating a new database, run ANALYZE before doing any
CREATE TABLEs.  The ANALYZE command will create an empty 
"sqlite_stat1" table.  Then use INSERTs to populate the 
sqlite_stat1 table with information saved from when you 
ran ANALYZE on the large sample database.

--
D. Richard Hipp   <[EMAIL PROTECTED]>