Re: [sqlite] EXT : Speed up count(distinct col)
Hi, thanks a lot for the helpful replies. Sorry to bother again, but there is still something that hasn't been answered. Simon Slavin-3 has addressed my question but not exactly what I was asking. Suppose I have two tables "movies" and "people" and other tables to relate both tables: "directors", "writers", etc... Simon Slavin-3 told me to create one single table to relate "movies" and "people": MoviePeople: id, movie, person, capacity But with that solution I can't perform the fast count(*) that I want. I would have to do: SELECT count(distinct person) FROM MoviePeople WHERE capacity = "director"; I want to know if there is any standard solution for normalizing this database so I can do count(*) to count all directors or writers... Thanks in advance for your help and patience. -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30892423.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
You have: CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id)); CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag)); You can solve your problem, using pure SQL. No need to resort to the application layer. Just execute the SQL transaction below. It takes care of everything, including normalizing, assigning integer IDs to unique tags, filtering our duplicates and auto indexes. I haven't tested on your tables specifically, but I do similar all the time. Any errors should be simple name errors. begin immediate; create temp table "Movies Old" as select * from Movies; create temp table "Tags Old" from Tags; drop table Movies; drop table Tags; create table Movies ( Movie_ID integer primary key not null , Title text collate no case not null ); create table Tags ( Tag_ID integer primary key not null , Name text not null unique collate nocase ); create table Movie_Tags ( ID integer primary key not null , Movie_ID integer not null references Movies (Movie_ID) on delete cascade , Tag_ID integer not null references Tags (Tag_ID) on delete cascade , unique (Movie_ID, Tag_ID) ); insert into Movies (Movie_ID, Title) select Movie_ID, Title from "Movies Old"; insert or ignore into Tags (Name) select distinct Tag from "Tags Old"; insert into Movie_Tags (Movie_ID, Tag_ID) select distinct Movie_ID, (select Tag_ID from Tags where Name = "Tags Old".Tag) from "Tags Old"; drop table "Movies Old"; drop table "Tags Old"; commit or rollback; Note that the last line "commit or rollback" is not an actual SQLite command. You will need to manually decide at that point whether to issue a commit or rollback. If there are any errors in previous lines, use rollback. Unfortunately, SQLite does not automatically rollback all commands in a transaction if one of the commands fails (specifically create and drop commands). Then you should be able to count tags very fast by: select count(*) from Tags; In order to make foreign keys work, you need to have SQLite version 3.6.19 or later, and use this pragma when you open a connection to SQLite (ie before any commands that require use of foreign keys): pragma foreign_keys = yes; A normalized database using integer keys is fast, small and elegant. Viewing integers by users is ugly. So for viewing the Movie-Tag combinations (and data entry), you might want to create a view to make it user friendly. It depends on your user interface. Reply here if you need that. Just something like this: create view "Movie Tags View" as select Movie_Tags.ID as ID , Movies.Title as "Movie" , Tags.Name as "Tag" from Movie_Tags join Movies on Movie_Tags.Movie_ID = Movies.Movie_ID join Tags on Movie_Tags.Tag_ID = Tags.Tag_ID ; Hope this helps, Tom BareFeetWare -- Comparison of SQLite GUI tools: http://www.barefeetware.com/sqlite/compare/?ml ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
On 7 Feb 2011, at 11:19pm, Yuzem wrote: > One more thing: > I have the tables "movies" and "people" > Those two tables are related by tables "directors", "writers", etc... > movies: id, title > people: id, name > > directors: movieId, peopleId > writers: movieId, peopleId > etc... > > How can I normalize that so I can count directors and writers using > count(*)? > Should I create another table for "directors" and another for "writers", > etc... with only the ids? > I will end up having: > movies > movies_directors < (directors people) This is very basic database design. You need to read up not on SQLite or even on SQL, but some books on how to arrange your data in databases. One solution would include these three tables: Movies: id, name, releaseDate, etc.. People: id, name, DOB, biog, etc.. MoviePeople: id, movie, person, capacity Fill your Movies TABLE. Fill your People TABLE with actors, directors, writers, etc.. Then fill the MoviePeople table with connections between the two: Clint Eastwood worked in one movie as actor, but he worked in another move as director, etc.. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
Ok, thanks all for your answers, I guest I will have to normalize the database and explore the foreign key feature. One more thing: I have the tables "movies" and "people" Those two tables are related by tables "directors", "writers", etc... movies: id, title people: id, name directors: movieId, peopleId writers: movieId, peopleId etc... How can I normalize that so I can count directors and writers using count(*)? Should I create another table for "directors" and another for "writers", etc... with only the ids? I will end up having: movies > movies_directors < (directors people) Thanks again! -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30868907.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
00 > 4 Goto 0 18 0 00 > 5 OpenRead 0 2 0 2 00 > 6 Rewind 0 13 0 00 > 7 Column 0 1 3 00 > 8 Found 1 12 3 1 00 > 9 MakeRecord 3 1 4 00 > 10 IdxInsert 1 4 0 00 > 11 AggStep 0 3 1 count(1) 01 > 12 Next 0 7 0 01 > 13 Close 0 0 0 00 > 14 AggFinal 1 1 0 count(1) 00 > 15 Copy 1 5 0 00 > 16 ResultRow 5 1 0 00 > 17 Halt 0 0 0 00 > 18 Transaction 0 0 0 00 > 19 VerifyCookie 0 2 0 00 > 20 TableLock 0 2 0 tags 00 > 21 Goto 0 5 0 00 > > Michael D. Black > Senior Scientist > NG Information Systems > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Yuzem [naujnit...@gmail.com] > Sent: Monday, February 07, 2011 2:13 PM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] EXT : Speed up count(distinct col) > > Black, Michael (IS) wrote: >> >> Test#1 >> create index tags_index on tags(tag); >> You should have an index for any fields you query on like this. >> > > Thanks Michael but I don't see any speed improvement: > create index test on tags(tag); > select count(distinct tag) from tags; > > This is much faster: > select count(*) from tags; > > Am I doing something wrong? > -- > View this message in context: > http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html > Sent from the SQLite mailing list archive at Nabble.com. > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
oto 0 5 000 Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Yuzem [naujnit...@gmail.com] Sent: Monday, February 07, 2011 2:13 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] EXT : Speed up count(distinct col) Black, Michael (IS) wrote: > > Test#1 > create index tags_index on tags(tag); > You should have an index for any fields you query on like this. > Thanks Michael but I don't see any speed improvement: create index test on tags(tag); select count(distinct tag) from tags; This is much faster: select count(*) from tags; Am I doing something wrong? -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
Yuzem wrote: > > > > Test#1 > > create index tags_index on tags(tag); > > You should have an index for any fields you query on like this. > > > > Thanks Michael but I don't see any speed improvement: > create index test on tags(tag); > select count(distinct tag) from tags; > > This is much faster: > select count(*) from tags; > > Am I doing something wrong? As Michael mentioned, you might be getting killed by string comparisons. I'm no SQL expert, so gurus are welcome to add to the following without insulting me. To beef up Michael's suggestion, try something like this: CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id)); (BTW, you should consider making movie_id "INTEGER PRIMARY KEY" and then remove "unique(movie_id)" -- as long as that otherwise makes sense for your architecture. For SQLite-specific reasons that will probably be faster.) CREATE TABLE tagNames(tagId INTEGER PRIMARY KEY, tagName TEXT UNIQUE); CREATE INDEX tagNamesIdxtagName on tagNames(tagName); CREATE TABLE tags(movie_id INTEGER, tagId INTEGER REFERENCES tagNames, unique(movie_id,tag)); CREATE INDEX tagsIdxTagId on tags(tagId); Then see how fast it is to ask SELECT COUNT(*) FROM tagString; Note: The index tagNamesIdxtagName is there because you'll probably want it to speed up insertions into 'tags'. Note: If foreign key checking is on, be careful about deletes on table "tagNames". Without an index on tags(tagId) a delete of a single row on tagNames implies a full table scan on tags (to make sure there are no rows referencing it). That bit me in the past. Eric -- Eric A. Smith The concept is interesting and well-formed, but in order to earn better than a 'C,' the idea must be feasible. -- A Yale University management professor in response to Fred Smith's paper proposing reliable overnight delivery service. (Smith went on to found Federal Express Corp.) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
Black, Michael (IS) wrote: > > Test#1 > create index tags_index on tags(tag); > You should have an index for any fields you query on like this. > Thanks Michael but I don't see any speed improvement: create index test on tags(tag); select count(distinct tag) from tags; This is much faster: select count(*) from tags; Am I doing something wrong? -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : Speed up count(distinct col)
Test#1 create index tags_index on tags(tag); You should have an index for any fields you query on like this. Test#2 Normalize your tags into a separate table so you then store the rowid in your tags table instead of the string. Your compares will be notably faster using integers rather than strings for more fancy queries. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Yuzem [naujnit...@gmail.com] Sent: Monday, February 07, 2011 9:26 AM To: sqlite-users@sqlite.org Subject: EXT :[sqlite] Speed up count(distinct col) Hi. I have this tables: CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id)); CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag)); I have many movies by tag and many tables like "tags" (keywords, countries, languages, genres, etc..) I can count different movies very fast with: SELECT count(*) FROM movies; But the other tables are much slower depending on the size of the table: SELECT count(distinct tag) FROM tags; My solution is to create an additional table "count_tags" and then every time a tag is added to table "tags" a trigger adds the tag to "count_tags", I need also another trigger to remove the tag, then I can do: SELECT count(*) FROM count_tags; This solution implies one additional table and two triggers by each table. I wanted to know if there is a simpler solution, maybe by using indexes. Thanks in advance. -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30864622.html Sent from the SQLite mailing list archive at Nabble.com. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users