Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-10 Thread Yuzem

Hi, thanks a lot for the helpful replies.
Sorry to bother again, but there is still something that hasn't been
answered.

Simon Slavin-3 has addressed my question but not exactly what I was asking.
Suppose I have two tables "movies" and "people" and other tables to relate
both tables: "directors", "writers", etc...

Simon Slavin-3 told me to create one single table to relate "movies" and
"people":
MoviePeople: id, movie, person, capacity

But with that solution I can't perform the fast count(*) that I want.
I would have to do:
SELECT count(distinct person) FROM MoviePeople WHERE capacity = "director";

I want to know if there is any standard solution for normalizing this
database so I can do count(*) to count all directors or writers...

Thanks in advance for your help and patience.
-- 
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30892423.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread BareFeetWare
You have:

CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));
CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag));

You can solve your problem, using pure SQL. No need to resort to the 
application layer. Just execute the SQL transaction below. It takes care of 
everything, including normalizing, assigning integer IDs to unique tags, 
filtering our duplicates and auto indexes. I haven't tested on your tables 
specifically, but I do similar all the time. Any errors should be simple name 
errors.

begin immediate;
create temp table "Movies Old" as select * from Movies;
create temp table "Tags Old" from Tags;
drop table Movies;
drop table Tags;
create table Movies
(   Movie_ID integer primary key not null
,   Title text collate no case not null
);
create table Tags
(   Tag_ID integer primary key not null
,   Name text not null unique collate nocase
);
create table Movie_Tags
(   ID integer primary key not null
,   Movie_ID integer not null
references Movies (Movie_ID) on delete cascade
,   Tag_ID integer not null
references Tags (Tag_ID) on delete cascade
,   unique (Movie_ID, Tag_ID)
);
insert into Movies (Movie_ID, Title) select Movie_ID, Title from "Movies Old";

insert or ignore into Tags (Name) select distinct Tag from "Tags Old";

insert into Movie_Tags (Movie_ID, Tag_ID)
select distinct Movie_ID, (select Tag_ID from Tags where Name = "Tags Old".Tag)
from "Tags Old";

drop table "Movies Old";
drop table "Tags Old";

commit or rollback;

Note that the last line "commit or rollback" is not an actual SQLite command. 
You will need to manually decide at that point whether to issue a commit or 
rollback. If there are any errors in previous lines, use rollback. 
Unfortunately, SQLite does not automatically rollback all commands in a 
transaction if one of the commands fails (specifically create and drop 
commands).

Then you should be able to count tags very fast by:

select count(*) from Tags;

In order to make foreign keys work, you need to have SQLite version 3.6.19 or 
later, and use this pragma when you open a connection to SQLite (ie before any 
commands that require use of foreign keys):

pragma foreign_keys = yes;

A normalized database using integer keys is fast, small and elegant. Viewing 
integers by users is ugly. So for viewing the Movie-Tag combinations (and data 
entry), you might want to create a view to make it user friendly. It depends on 
your user interface. Reply here if you need that. Just something like this:

create view "Movie Tags View"
as select
Movie_Tags.ID as ID
,   Movies.Title as "Movie"
,   Tags.Name as "Tag"
from Movie_Tags
join Movies on Movie_Tags.Movie_ID = Movies.Movie_ID
join Tags on Movie_Tags.Tag_ID = Tags.Tag_ID
;

Hope this helps,
Tom
BareFeetWare

--
Comparison of SQLite GUI tools:
http://www.barefeetware.com/sqlite/compare/?ml
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Simon Slavin

On 7 Feb 2011, at 11:19pm, Yuzem wrote:

> One more thing:
> I have the tables "movies" and "people"
> Those two tables are related by tables "directors", "writers", etc...
> movies: id, title
> people: id, name
> 
> directors: movieId, peopleId
> writers: movieId, peopleId
> etc...
> 
> How can I normalize that so I can count directors and writers using
> count(*)?
> Should I create another table for "directors" and another for "writers",
> etc... with only the ids?
> I will end up having:
> movies > movies_directors < (directors people)


This is very basic database design.  You need to read up not on SQLite or even 
on SQL, but some books on how to arrange your data in databases.

One solution would include these three tables:

Movies: id, name, releaseDate, etc..

People: id, name, DOB, biog, etc..

MoviePeople: id, movie, person, capacity

Fill your Movies TABLE.  Fill your People TABLE with actors, directors, 
writers, etc..

Then fill the MoviePeople table with connections between the two: Clint 
Eastwood worked in one movie as actor, but he worked in another move as 
director, etc..

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Yuzem

Ok, thanks all for your answers, I guest I will have to normalize the
database and explore the foreign key feature.

One more thing:
I have the tables "movies" and "people"
Those two tables are related by tables "directors", "writers", etc...
movies: id, title
people: id, name

directors: movieId, peopleId
writers: movieId, peopleId
etc...

How can I normalize that so I can count directors and writers using
count(*)?
Should I create another table for "directors" and another for "writers",
etc... with only the ids?
I will end up having:
movies > movies_directors < (directors people)

Thanks again!
-- 
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30868907.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Pavel Ivanov
 00
> 4     Goto           0     18    0                    00
> 5     OpenRead       0     2     0     2              00
> 6     Rewind         0     13    0                    00
> 7     Column         0     1     3                    00
> 8     Found          1     12    3     1              00
> 9     MakeRecord     3     1     4                    00
> 10    IdxInsert      1     4     0                    00
> 11    AggStep        0     3     1     count(1)       01
> 12    Next           0     7     0                    01
> 13    Close          0     0     0                    00
> 14    AggFinal       1     1     0     count(1)       00
> 15    Copy           1     5     0                    00
> 16    ResultRow      5     1     0                    00
> 17    Halt           0     0     0                    00
> 18    Transaction    0     0     0                    00
> 19    VerifyCookie   0     2     0                    00
> 20    TableLock      0     2     0     tags           00
> 21    Goto           0     5     0                    00
>
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
>
>
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Yuzem [naujnit...@gmail.com]
> Sent: Monday, February 07, 2011 2:13 PM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] EXT :  Speed up count(distinct col)
>
> Black, Michael (IS) wrote:
>>
>> Test#1
>> create index tags_index on tags(tag);
>> You should have an index for any fields you query on like this.
>>
>
> Thanks Michael but I don't see any speed improvement:
> create index test on tags(tag);
> select count(distinct tag) from tags;
>
> This is much faster:
> select count(*) from tags;
>
> Am I doing something wrong?
> --
> View this message in context: 
> http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Black, Michael (IS)
oto   0 5 000

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Yuzem [naujnit...@gmail.com]
Sent: Monday, February 07, 2011 2:13 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] EXT :  Speed up count(distinct col)

Black, Michael (IS) wrote:
>
> Test#1
> create index tags_index on tags(tag);
> You should have an index for any fields you query on like this.
>

Thanks Michael but I don't see any speed improvement:
create index test on tags(tag);
select count(distinct tag) from tags;

This is much faster:
select count(*) from tags;

Am I doing something wrong?
--
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Eric Smith
Yuzem wrote:

> > 
> > Test#1
> > create index tags_index on tags(tag);
> > You should have an index for any fields you query on like this.
> > 
> 
> Thanks Michael but I don't see any speed improvement:
> create index test on tags(tag);
> select count(distinct tag) from tags;
> 
> This is much faster:
> select count(*) from tags;
> 
> Am I doing something wrong?

As Michael mentioned, you might be getting killed by string comparisons.

I'm no SQL expert, so gurus are welcome to add to the following without
insulting me.

To beef up Michael's suggestion, try something like this:

CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));

(BTW, you should consider making movie_id "INTEGER PRIMARY KEY" and then
remove "unique(movie_id)" -- as long as that otherwise makes sense for 
your architecture.  For SQLite-specific reasons that will probably be
faster.)

CREATE TABLE tagNames(tagId INTEGER PRIMARY KEY, tagName TEXT UNIQUE);
CREATE INDEX tagNamesIdxtagName on tagNames(tagName);
CREATE TABLE tags(movie_id INTEGER,
  tagId INTEGER REFERENCES tagNames,
  unique(movie_id,tag));

CREATE INDEX tagsIdxTagId on tags(tagId);

Then see how fast it is to ask 
SELECT COUNT(*) FROM tagString;

Note: The index tagNamesIdxtagName is there because you'll probably want it 
to speed up insertions into 'tags'.

Note: If foreign key checking is on, be careful about deletes on table
"tagNames".  Without an index on tags(tagId) a delete of a single row on
tagNames implies a full table scan on tags (to make sure there are no
rows referencing it).  That bit me in the past.

Eric

--
Eric A. Smith

The concept is interesting and well-formed, but in order to earn 
better than a 'C,' the idea must be feasible.
-- A Yale University management professor in response to Fred Smith's paper
   proposing reliable overnight delivery service.
   (Smith went on to found Federal Express Corp.)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Yuzem



Black, Michael (IS) wrote:
> 
> Test#1
> create index tags_index on tags(tag);
> You should have an index for any fields you query on like this.
> 

Thanks Michael but I don't see any speed improvement:
create index test on tags(tag);
select count(distinct tag) from tags;

This is much faster:
select count(*) from tags;

Am I doing something wrong?
-- 
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30867275.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : Speed up count(distinct col)

2011-02-07 Thread Black, Michael (IS)
Test#1
create index tags_index on tags(tag);
You should have an index for any fields you query on like this.

Test#2
Normalize your tags into a separate table so you then store the rowid in your 
tags table instead of the string.  Your compares will be notably faster using 
integers rather than strings for more fancy queries.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Yuzem [naujnit...@gmail.com]
Sent: Monday, February 07, 2011 9:26 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite]  Speed up count(distinct col)

Hi.
I have this tables:
CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));
CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag));

I have many movies by tag and many tables like "tags" (keywords, countries,
languages, genres, etc..)
I can count different movies very fast with:
SELECT count(*) FROM movies;

But the other tables are much slower depending on the size of the table:
SELECT count(distinct tag) FROM tags;

My solution is to create an additional table "count_tags" and then every
time a tag is added to table "tags" a trigger adds the tag to "count_tags",
I need also another trigger to remove the tag, then I can do:
SELECT count(*) FROM count_tags;

This solution implies one additional table and two triggers by each table.

I wanted to know if there is a simpler solution, maybe by using indexes.
Thanks in advance.
--
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30864622.html
Sent from the SQLite mailing list archive at Nabble.com.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users