Re: [sqlite] help needed to optimize a query
On Tue, Jun 28, 2011 at 8:18 PM, Jan Hudec wrote: > > name TEXT > > > > DATA > > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > > data_type TEXT, > > If nothing else, you want to define integer identifiers for the data types > and use integer here. That will save you some space (reading from flash is > still performance bottleneck, especially if the flash is SD card) and some > unnecessarily costly string comparisons. In code, you'd obviously use > symbolic constants. > i made this change but did not see any performance improvements, but it makes sense to keep it. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On 29 Jun 2011, at 2:26am, Lazarus 101 wrote: > the select statement is > SELECT * from files left join data on files.file_id=data.file_id; So you read all the records for the correct file_id, and deal with each one as you find it, ignoring those you don't want. Hmm. I don't see why your app isn't far faster with the index. Can an SQLite expert explain it ? By the way, when SQLite does create the index automatically it deletes it when the SELECT finishes. So in your real application it'll keep recreating the index for every SELECT. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
the select statement is SELECT * from files left join data on files.file_id=data.file_id; to test the performance i'm only doing long t1 = System.currentTimeMillis(); Cursor cursor = db.rawQuery(...); while (cursor.moveToNext()) { } android.util.Log.e(TAG, "loaded in: " + (System.currentTimeMillis() -t1)); and the results are without index: 8143 with index:7039 this is for 1453 entries in resources and 7697 entries in data the output for explain query plan (without the above index) is 0|0|TABLE files 1|1|TABLE data WITH INDEX sqlite_autoindex_data_1 so it looks like it creates an index automatically and that's why there is no big improvement On Wed, Jun 29, 2011 at 2:30 AM, Simon Slavin wrote: > > On 28 Jun 2011, at 9:58pm, Lazarus 101 wrote: > > >> You should make an index on the columns 'file_id' and 'data_type' from > the > >> 'DATA' table. This will allow it to be searched far more quickly. Your > >> command will be something like > >> > >> CREATE UNIQUE INDEX dfd ON data (file_id,data_type) > >> > >> Then do the above testing again. > > > > tried that and it didn't help much > > Something is wrong with your logic or programming. Having this index > should make a huge difference in the speed of lookup. How are you finding > the entries in the TABLE called 'data' ? > > Actually, as Jan posted, instead of the above index do this one: > > CREATE UNIQUE INDEX dfdv ON data (file_id,data_type, value) > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On 28 Jun 2011, at 9:58pm, Lazarus 101 wrote: >> You should make an index on the columns 'file_id' and 'data_type' from the >> 'DATA' table. This will allow it to be searched far more quickly. Your >> command will be something like >> >> CREATE UNIQUE INDEX dfd ON data (file_id,data_type) >> >> Then do the above testing again. > > tried that and it didn't help much Something is wrong with your logic or programming. Having this index should make a huge difference in the speed of lookup. How are you finding the entries in the TABLE called 'data' ? Actually, as Jan posted, instead of the above index do this one: CREATE UNIQUE INDEX dfdv ON data (file_id,data_type, value) Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On Tue, Jun 28, 2011 at 8:18 PM, Jan Hudec wrote: > On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote: > > Hi guys, i'm working on an Android app and using sqlite to store some > data > > and i need some help with a query. > > > > I have the following table structure: > > > > FILES > > file_id INTEGER NOT NULL, > Do file_ids repeat? If not, it should be > "integer > primary key". Than sqlite will alias the column to the "rowid" > column the table is internally ordered by for improved performance > looking up by file_id and saving some space (one fewer columns). > yes, that was already marked as primary key but it was at the end of the create statement, that's why i forgot to mention it. > > name TEXT > > > > DATA > > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > > data_type TEXT, > > If nothing else, you want to define integer identifiers for the data types > and use integer here. That will save you some space (reading from flash is > still performance bottleneck, especially if the flash is SD card) and some > unnecessarily costly string comparisons. In code, you'd obviously use > symbolic constants. > > > value TEXT > > Obviously you need index on DATA(file_id, data_type, value) > > Yes, all three columns. The first two will appear in your query, so have to > be first two having the result also included in the index saves time, > because > now everything can be obtained from the index and the table itself does not > have to be fetched. > I will try this, thanks. > > > data_type can be one of: > > (title, description, date_taken, latitude,longitude, duration, > > album,track_nr,genre, artist, year) > > Why this "hypernormalized" structure. When the set is fixed and there can > be > only one of each for each file, a big table with one column for each > attribute will do you much better service. > it's not fixed, for photos for example it's possible that we would want to add some other exif values, also some mp3 tags can have more then one value (artist, genre). > Just create FILES with >file_id integer primary key, >name text, >title text, >description text, >date_taken text, /* or integer if you decide to store timestamps instead > */ >latitude number, >longitude number, >duration number, >album text, >track_nr integer, >genre text, >artist text, >year integer > The files table has some other fields, I only mentioned those that I want in the result (there is also a path, favorite flag, parent_id, size,state etc. 10 columns in total), so I don't think a table with 20+ columns is a good idea, also there is the problem of adding new data types that would require altering the files table structure. And most of the time when I display the files list i don't need these specific media fields. > > > it's possible that a music from files does not have any data associated > with > > it, in this case i want the name to be in the result set > > > > This has to run as fast as possible. A left join between these tables is > too > > slow, for 10.000 entries it takes around 15 seconds just to navigate > through > > the cursor, if I add a where clause selecting only one kind of data then > it > > reduces to less than 5 seconds which is acceptable. > > Last but not least, "explain query plan" is your friend. If you prefix your > query with "explain query plan" and run it against the database (you can > use > the command-line shell or some management tool), sqlite will tell you what > tables it would read, in which order, using which indices and how big it > expects the result set to be. > > So create various indices and experiment with tweaking the query and look > what explain query plan tells you. Reading by primary key is fastest, > followed by covering index, noncovering index and temporary index or linear > search are worst (except join by temporary b-tree is near-optimal when you > are not filtering out anything). > > Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree > joins, so it is much more critical to have good indices in older versions, > because they did happily regress to quadratic or worse complexity and that > would take ages to complete. > > Oh, and remember to remove the indices you end up not using to save space > and time needed to keep them up to date. > On my Android device sqlite version is 3.7.2 Thanks a lot, i will try the "explain query plan" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On Tue, Jun 28, 2011 at 5:48 PM, Simon Slavin wrote: > > On 28 Jun 2011, at 3:33pm, Lazarus 101 wrote: > > > FILES > > file_id INTEGER NOT NULL, > > name TEXT > > I assume that SQLite has identified 'file_id' as its own 'rowid' column and > made in INDEX for it. > it's also marked as primary key > You should make an index on the columns 'file_id' and 'data_type' from the > 'DATA' table. This will allow it to be searched far more quickly. Your > command will be something like > > CREATE UNIQUE INDEX dfd ON data (file_id,data_type) > > Then do the above testing again. > tried that and it didn't help much > If it's still not fast enough, one possibility would be to create a TABLE > which actually reflects your data. Presumably one that reflects your layout > > file_id | name | duration | genre | artist | description | album | track_nr > | year > > You could use TRIGGERs to make this table change whenever your DATA table > changes. Or you could do it in software. Or you could abandon your DATA > table entirely. thanks, I will try this. Do you think that setting a trigger will have a performance impact on the insert statements? I receive the file list from the network in chunks of 1000 files and save them in the db and this also has to be fast (it's currently less then 500 millis per insert), or is it better to create this table after the initial sync and set the trigger only for future changes? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote: > Hi guys, i'm working on an Android app and using sqlite to store some data > and i need some help with a query. > > I have the following table structure: > > FILES > file_id INTEGER NOT NULL, Do file_ids repeat? If not, it should be "integer primary key". Than sqlite will alias the column to the "rowid" column the table is internally ordered by for improved performance looking up by file_id and saving some space (one fewer columns). > name TEXT > > DATA > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > data_type TEXT, If nothing else, you want to define integer identifiers for the data types and use integer here. That will save you some space (reading from flash is still performance bottleneck, especially if the flash is SD card) and some unnecessarily costly string comparisons. In code, you'd obviously use symbolic constants. > value TEXT Obviously you need index on DATA(file_id, data_type, value) Yes, all three columns. The first two will appear in your query, so have to be first two having the result also included in the index saves time, because now everything can be obtained from the index and the table itself does not have to be fetched. > data_type can be one of: > (title, description, date_taken, latitude,longitude, duration, > album,track_nr,genre, artist, year) Why this "hypernormalized" structure. When the set is fixed and there can be only one of each for each file, a big table with one column for each attribute will do you much better service. Just create FILES with file_id integer primary key, name text, title text, description text, date_taken text, /* or integer if you decide to store timestamps instead */ latitude number, longitude number, duration number, album text, track_nr integer, genre text, artist text, year integer Or you can split up to generic attributes, photograph attributes (date_taken, latitude, longitude) and song attributes (duration, album, track_nr, genre, artist, year), but I don't think you would save space (the extra row id and foreign key will take more than the nulls) and you definitely wouldn't save time. You certainly want index on the name column as I suppose that will be your initial input. You can combine it with separate table for data with multiple values per file. E.g. tags: file_id integer references files(file_id) on delete cascade, tag text, value text If it's user-defined tags, you want text, but if it's just additional multi-valued entries defined by the application, use integer identifiers to save space and unnecessarily costly string comparisons. > Now, I want to select all music details in the following format > file_id | name | duration | genre | artist | description | album | track_nr > | year Of course with the less normalized structure, that's result of simple select from single table. > it's possible that a music from files does not have any data associated with > it, in this case i want the name to be in the result set > > This has to run as fast as possible. A left join between these tables is too > slow, for 10.000 entries it takes around 15 seconds just to navigate through > the cursor, if I add a where clause selecting only one kind of data then it > reduces to less than 5 seconds which is acceptable. Last but not least, "explain query plan" is your friend. If you prefix your query with "explain query plan" and run it against the database (you can use the command-line shell or some management tool), sqlite will tell you what tables it would read, in which order, using which indices and how big it expects the result set to be. So create various indices and experiment with tweaking the query and look what explain query plan tells you. Reading by primary key is fastest, followed by covering index, noncovering index and temporary index or linear search are worst (except join by temporary b-tree is near-optimal when you are not filtering out anything). Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree joins, so it is much more critical to have good indices in older versions, because they did happily regress to quadratic or worse complexity and that would take ages to complete. Oh, and remember to remove the indices you end up not using to save space and time needed to keep them up to date. -- Jan 'Bulb' Hudec ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
On 28 Jun 2011, at 3:33pm, Lazarus 101 wrote: > FILES > file_id INTEGER NOT NULL, > name TEXT I assume that SQLite has identified 'file_id' as its own 'rowid' column and made in INDEX for it. > DATA > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > data_type TEXT, > value TEXT > > data_type can be one of: > (title, description, date_taken, latitude,longitude, duration, > album,track_nr,genre, artist, year) > > as you've probably guessed this is used to store information about some > media files. > > Now, I want to select all music details in the following format > file_id | name | duration | genre | artist | description | album | track_nr > | year > > it's possible that a music from files does not have any data associated with > it, in this case i want the name to be in the result set > > This has to run as fast as possible. A left join between these tables is too > slow, for 10.000 entries it takes around 15 seconds just to navigate through > the cursor You should make an index on the columns 'file_id' and 'data_type' from the 'DATA' table. This will allow it to be searched far more quickly. Your command will be something like CREATE UNIQUE INDEX dfd ON data (file_id,data_type) Then do the above testing again. If it's still not fast enough, one possibility would be to create a TABLE which actually reflects your data. Presumably one that reflects your layout file_id | name | duration | genre | artist | description | album | track_nr | year You could use TRIGGERs to make this table change whenever your DATA table changes. Or you could do it in software. Or you could abandon your DATA table entirely. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help needed to optimize a query
> This has to run as fast as possible. A left join between these tables is too > slow, for 10.000 entries it takes around 15 seconds just to navigate through > the cursor, if I add a where clause selecting only one kind of data then it > reduces to less than 5 seconds which is acceptable. What kind of selects did you do here? Generally in my opinion if you want to get all data as one table with one file per row you shouldn't do that using just SELECT statement. You should get raw data with SELECT and then transform it into the necessary table format in your application. Pavel On Tue, Jun 28, 2011 at 10:33 AM, Lazarus 101 wrote: > Hi guys, i'm working on an Android app and using sqlite to store some data > and i need some help with a query. > > I have the following table structure: > > FILES > file_id INTEGER NOT NULL, > name TEXT > > DATA > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > data_type TEXT, > value TEXT > > data_type can be one of: > (title, description, date_taken, latitude,longitude, duration, > album,track_nr,genre, artist, year) > > as you've probably guessed this is used to store information about some > media files. > > Now, I want to select all music details in the following format > file_id | name | duration | genre | artist | description | album | track_nr > | year > > it's possible that a music from files does not have any data associated with > it, in this case i want the name to be in the result set > > This has to run as fast as possible. A left join between these tables is too > slow, for 10.000 entries it takes around 15 seconds just to navigate through > the cursor, if I add a where clause selecting only one kind of data then it > reduces to less than 5 seconds which is acceptable. > > Thanks a lot for any input. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] help needed to optimize a query
Hi guys, i'm working on an Android app and using sqlite to store some data and i need some help with a query. I have the following table structure: FILES file_id INTEGER NOT NULL, name TEXT DATA file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, data_type TEXT, value TEXT data_type can be one of: (title, description, date_taken, latitude,longitude, duration, album,track_nr,genre, artist, year) as you've probably guessed this is used to store information about some media files. Now, I want to select all music details in the following format file_id | name | duration | genre | artist | description | album | track_nr | year it's possible that a music from files does not have any data associated with it, in this case i want the name to be in the result set This has to run as fast as possible. A left join between these tables is too slow, for 10.000 entries it takes around 15 seconds just to navigate through the cursor, if I add a where clause selecting only one kind of data then it reduces to less than 5 seconds which is acceptable. Thanks a lot for any input. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users