Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Lazarus 101
On Tue, Jun 28, 2011 at 8:18 PM, Jan Hudec  wrote:

> > name TEXT
> >
> > DATA
> > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> > data_type TEXT,
>
> If nothing else, you want to define integer identifiers for the data types
> and use integer here. That will save you some space (reading from flash is
> still performance bottleneck, especially if the flash is SD card) and some
> unnecessarily costly string comparisons. In code, you'd obviously use
> symbolic constants.
>

i made this change but did not see any performance improvements, but it
makes sense to keep it.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Simon Slavin

On 29 Jun 2011, at 2:26am, Lazarus 101 wrote:

> the select statement is
>   SELECT * from files left join data on files.file_id=data.file_id;

So you read all the records for the correct file_id, and deal with each one as 
you find it, ignoring those you don't want.  Hmm.  I don't see why your app 
isn't far faster with the index.  Can an SQLite expert explain it ?

By the way, when SQLite does create the index automatically it deletes it when 
the SELECT finishes.  So in your real application it'll keep recreating the 
index for every SELECT.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Lazarus 101
the select statement is
   SELECT * from files left join data on files.file_id=data.file_id;

to test the performance i'm only doing

long t1 = System.currentTimeMillis();

Cursor cursor = db.rawQuery(...);
while (cursor.moveToNext()) {
}
 android.util.Log.e(TAG, "loaded in: " + (System.currentTimeMillis() -t1));

and the results are
without index: 8143
with index:7039

this is for 1453 entries in resources and 7697 entries in data

the output for explain query plan (without the above index) is
0|0|TABLE files
1|1|TABLE data WITH INDEX sqlite_autoindex_data_1

so it looks like it creates an index automatically and that's why there is
no big improvement

On Wed, Jun 29, 2011 at 2:30 AM, Simon Slavin  wrote:

>
> On 28 Jun 2011, at 9:58pm, Lazarus 101 wrote:
>
> >> You should make an index on the columns 'file_id' and 'data_type' from
> the
> >> 'DATA' table.  This will allow it to be searched far more quickly.  Your
> >> command will be something like
> >>
> >> CREATE UNIQUE INDEX dfd ON data (file_id,data_type)
> >>
> >> Then do the above testing again.
> >
> > tried that and it didn't help much
>
> Something is wrong with your logic or programming.  Having this index
> should make a huge difference in the speed of lookup.  How are you finding
> the entries in the TABLE called 'data' ?
>
> Actually, as Jan posted, instead of the above index do this one:
>
> CREATE UNIQUE INDEX dfdv ON data (file_id,data_type, value)
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Simon Slavin

On 28 Jun 2011, at 9:58pm, Lazarus 101 wrote:

>> You should make an index on the columns 'file_id' and 'data_type' from the
>> 'DATA' table.  This will allow it to be searched far more quickly.  Your
>> command will be something like
>> 
>> CREATE UNIQUE INDEX dfd ON data (file_id,data_type)
>> 
>> Then do the above testing again.
> 
> tried that and it didn't help much

Something is wrong with your logic or programming.  Having this index should 
make a huge difference in the speed of lookup.  How are you finding the entries 
in the TABLE called 'data' ?

Actually, as Jan posted, instead of the above index do this one:

CREATE UNIQUE INDEX dfdv ON data (file_id,data_type, value)

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Lazarus 101
On Tue, Jun 28, 2011 at 8:18 PM, Jan Hudec  wrote:

> On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote:
> > Hi guys, i'm working on an Android app and using sqlite to store some
> data
> > and i need some help with a query.
> >
> > I have the following table structure:
> >
> > FILES
> > file_id INTEGER NOT NULL,
>    Do file_ids repeat? If not, it should be
> "integer
>  primary key". Than sqlite will alias the column to the "rowid"
>  column the table is internally ordered by for improved performance
>  looking up by file_id and saving some space (one fewer columns).
>

yes, that was already marked as primary key but it was at the end of the
create statement, that's why i forgot to mention it.


> > name TEXT
> >
> > DATA
> > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> > data_type TEXT,
>
> If nothing else, you want to define integer identifiers for the data types
> and use integer here. That will save you some space (reading from flash is
> still performance bottleneck, especially if the flash is SD card) and some
> unnecessarily costly string comparisons. In code, you'd obviously use
> symbolic constants.
>
> > value TEXT
>
> Obviously you need index on DATA(file_id, data_type, value)
>
> Yes, all three columns. The first two will appear in your query, so have to
> be first two having the result also included in the index saves time,
> because
> now everything can be obtained from the index and the table itself does not
> have to be fetched.
>

I will try this, thanks.


>
> > data_type can be one of:
> > (title, description, date_taken, latitude,longitude, duration,
> > album,track_nr,genre, artist, year)
>
> Why this "hypernormalized" structure. When the set is fixed and there can
> be
> only one of each for each file, a big table with one column for each
> attribute will do you much better service.
>

it's not fixed, for photos for example it's possible that we would want to
add some other exif values, also some mp3 tags can have more then one value
(artist, genre).


>  Just create FILES with
>file_id integer primary key,
>name text,
>title text,
>description text,
>date_taken text, /* or integer if you decide to store timestamps instead
> */
>latitude number,
>longitude number,
>duration number,
>album text,
>track_nr integer,
>genre text,
>artist text,
>year integer
>

The files table has some other fields, I only mentioned those that I want in
the result (there is also a path, favorite flag, parent_id, size,state etc.
10 columns in total), so I don't think a table with 20+ columns is a good
idea, also there is the problem of adding new data types that would require
altering the files table structure. And most of the time when I display the
files list i don't need these specific media fields.



>
> > it's possible that a music from files does not have any data associated
> with
> > it, in this case i want the name to be in the result set
> >
> > This has to run as fast as possible. A left join between these tables is
> too
> > slow, for 10.000 entries it takes around 15 seconds just to navigate
> through
> > the cursor, if I add a where clause selecting only one kind of data then
> it
> > reduces to less than 5 seconds which is acceptable.
>
> Last but not least, "explain query plan" is your friend. If you prefix your
> query with "explain query plan" and run it against the database (you can
> use
> the command-line shell or some management tool), sqlite will tell you what
> tables it would read, in which order, using which indices and how big it
> expects the result set to be.
>
> So create various indices and experiment with tweaking the query and look
> what explain query plan tells you. Reading by primary key is fastest,
> followed by covering index, noncovering index and temporary index or linear
> search are worst (except join by temporary b-tree is near-optimal when you
> are not filtering out anything).
>
> Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree
> joins, so it is much more critical to have good indices in older versions,
> because they did happily regress to quadratic or worse complexity and that
> would take ages to complete.
>
> Oh, and remember to remove the indices you end up not using to save space
> and time needed to keep them up to date.
>

On my Android device sqlite version is  3.7.2
Thanks a lot, i will try the "explain query plan"
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Lazarus 101
On Tue, Jun 28, 2011 at 5:48 PM, Simon Slavin  wrote:

>
> On 28 Jun 2011, at 3:33pm, Lazarus 101 wrote:
>
> > FILES
> > file_id INTEGER NOT NULL,
> > name TEXT
>
> I assume that SQLite has identified 'file_id' as its own 'rowid' column and
> made in INDEX for it.
>

it's also marked as primary key


> You should make an index on the columns 'file_id' and 'data_type' from the
> 'DATA' table.  This will allow it to be searched far more quickly.  Your
> command will be something like
>
> CREATE UNIQUE INDEX dfd ON data (file_id,data_type)
>
> Then do the above testing again.
>

tried that and it didn't help much


> If it's still not fast enough, one possibility would be to create a TABLE
> which actually reflects your data.  Presumably one that reflects your layout
>
> file_id | name | duration | genre | artist | description | album | track_nr
> | year
>
> You could use TRIGGERs to make this table change whenever your DATA table
> changes.  Or you could do it in software.  Or you could abandon your DATA
> table entirely.


thanks, I will try this. Do you think that setting a trigger will have a
performance impact on the insert statements? I receive the file list from
the network in chunks of 1000 files and save them in the db and this also
has to be fast (it's currently less then 500 millis per insert), or is it
better to create this table after the initial sync and set the trigger only
for future changes?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Jan Hudec
On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote:
> Hi guys, i'm working on an Android app and using sqlite to store some data
> and i need some help with a query.
> 
> I have the following table structure:
> 
> FILES
> file_id INTEGER NOT NULL,
   Do file_ids repeat? If not, it should be "integer
  primary key". Than sqlite will alias the column to the "rowid"
  column the table is internally ordered by for improved performance
  looking up by file_id and saving some space (one fewer columns).
> name TEXT
> 
> DATA
> file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> data_type TEXT,

If nothing else, you want to define integer identifiers for the data types
and use integer here. That will save you some space (reading from flash is
still performance bottleneck, especially if the flash is SD card) and some
unnecessarily costly string comparisons. In code, you'd obviously use
symbolic constants.

> value TEXT

Obviously you need index on DATA(file_id, data_type, value)

Yes, all three columns. The first two will appear in your query, so have to
be first two having the result also included in the index saves time, because
now everything can be obtained from the index and the table itself does not
have to be fetched.

> data_type can be one of:
> (title, description, date_taken, latitude,longitude, duration,
> album,track_nr,genre, artist, year)

Why this "hypernormalized" structure. When the set is fixed and there can be
only one of each for each file, a big table with one column for each
attribute will do you much better service. Just create FILES with

file_id integer primary key,
name text,
title text,
description text,
date_taken text, /* or integer if you decide to store timestamps instead */
latitude number,
longitude number,
duration number,
album text,
track_nr integer,
genre text,
artist text,
year integer

Or you can split up to generic attributes, photograph attributes (date_taken,
latitude, longitude) and song attributes (duration, album, track_nr, genre,
artist, year), but I don't think you would save space (the extra row id and
foreign key will take more than the nulls) and you definitely wouldn't save
time.

You certainly want index on the name column as I suppose that will be your
initial input.

You can combine it with separate table for data with multiple values per
file. E.g. tags:

file_id integer references files(file_id) on delete cascade,
tag text,
value text

If it's user-defined tags, you want text, but if it's just additional
multi-valued entries defined by the application, use integer identifiers to
save space and unnecessarily costly string comparisons.

> Now, I want to select all music details in the following format
> file_id | name | duration | genre | artist | description | album | track_nr
> | year

Of course with the less normalized structure, that's result of simple select
from single table.

> it's possible that a music from files does not have any data associated with
> it, in this case i want the name to be in the result set
> 
> This has to run as fast as possible. A left join between these tables is too
> slow, for 10.000 entries it takes around 15 seconds just to navigate through
> the cursor, if I add a where clause selecting only one kind of data then it
> reduces to less than 5 seconds which is acceptable.

Last but not least, "explain query plan" is your friend. If you prefix your
query with "explain query plan" and run it against the database (you can use
the command-line shell or some management tool), sqlite will tell you what
tables it would read, in which order, using which indices and how big it
expects the result set to be.

So create various indices and experiment with tweaking the query and look
what explain query plan tells you. Reading by primary key is fastest,
followed by covering index, noncovering index and temporary index or linear
search are worst (except join by temporary b-tree is near-optimal when you
are not filtering out anything).

Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree
joins, so it is much more critical to have good indices in older versions,
because they did happily regress to quadratic or worse complexity and that
would take ages to complete.

Oh, and remember to remove the indices you end up not using to save space
and time needed to keep them up to date.

-- 
 Jan 'Bulb' Hudec 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Simon Slavin

On 28 Jun 2011, at 3:33pm, Lazarus 101 wrote:

> FILES
> file_id INTEGER NOT NULL,
> name TEXT

I assume that SQLite has identified 'file_id' as its own 'rowid' column and 
made in INDEX for it.

> DATA
> file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> data_type TEXT,
> value TEXT
> 
> data_type can be one of:
> (title, description, date_taken, latitude,longitude, duration,
> album,track_nr,genre, artist, year)
> 
> as you've probably guessed this is used to store information about some
> media files.
> 
> Now, I want to select all music details in the following format
> file_id | name | duration | genre | artist | description | album | track_nr
> | year
> 
> it's possible that a music from files does not have any data associated with
> it, in this case i want the name to be in the result set
> 
> This has to run as fast as possible. A left join between these tables is too
> slow, for 10.000 entries it takes around 15 seconds just to navigate through
> the cursor

You should make an index on the columns 'file_id' and 'data_type' from the 
'DATA' table.  This will allow it to be searched far more quickly.  Your 
command will be something like

CREATE UNIQUE INDEX dfd ON data (file_id,data_type)

Then do the above testing again.

If it's still not fast enough, one possibility would be to create a TABLE which 
actually reflects your data.  Presumably one that reflects your layout

file_id | name | duration | genre | artist | description | album | track_nr | 
year

You could use TRIGGERs to make this table change whenever your DATA table 
changes.  Or you could do it in software.  Or you could abandon your DATA table 
entirely.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help needed to optimize a query

2011-06-28 Thread Pavel Ivanov
> This has to run as fast as possible. A left join between these tables is too
> slow, for 10.000 entries it takes around 15 seconds just to navigate through
> the cursor, if I add a where clause selecting only one kind of data then it
> reduces to less than 5 seconds which is acceptable.

What kind of selects did you do here?

Generally in my opinion if you want to get all data as one table with
one file per row you shouldn't do that using just SELECT statement.
You should get raw data with SELECT and then transform it into the
necessary table format in your application.


Pavel


On Tue, Jun 28, 2011 at 10:33 AM, Lazarus 101  wrote:
> Hi guys, i'm working on an Android app and using sqlite to store some data
> and i need some help with a query.
>
> I have the following table structure:
>
> FILES
> file_id INTEGER NOT NULL,
> name TEXT
>
> DATA
> file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> data_type TEXT,
> value TEXT
>
> data_type can be one of:
> (title, description, date_taken, latitude,longitude, duration,
> album,track_nr,genre, artist, year)
>
> as you've probably guessed this is used to store information about some
> media files.
>
> Now, I want to select all music details in the following format
> file_id | name | duration | genre | artist | description | album | track_nr
> | year
>
> it's possible that a music from files does not have any data associated with
> it, in this case i want the name to be in the result set
>
> This has to run as fast as possible. A left join between these tables is too
> slow, for 10.000 entries it takes around 15 seconds just to navigate through
> the cursor, if I add a where clause selecting only one kind of data then it
> reduces to less than 5 seconds which is acceptable.
>
> Thanks a lot for any input.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] help needed to optimize a query

2011-06-28 Thread Lazarus 101
Hi guys, i'm working on an Android app and using sqlite to store some data
and i need some help with a query.

I have the following table structure:

FILES
file_id INTEGER NOT NULL,
name TEXT

DATA
file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
data_type TEXT,
value TEXT

data_type can be one of:
(title, description, date_taken, latitude,longitude, duration,
album,track_nr,genre, artist, year)

as you've probably guessed this is used to store information about some
media files.

Now, I want to select all music details in the following format
file_id | name | duration | genre | artist | description | album | track_nr
| year

it's possible that a music from files does not have any data associated with
it, in this case i want the name to be in the result set

This has to run as fast as possible. A left join between these tables is too
slow, for 10.000 entries it takes around 15 seconds just to navigate through
the cursor, if I add a where clause selecting only one kind of data then it
reduces to less than 5 seconds which is acceptable.

Thanks a lot for any input.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users