Re: [sqlite] a system for arbitrarily tagging rows in a table

2009-07-26 Thread P Kishor
On Sun, Jul 26, 2009 at 9:23 PM, Jay A. Kreibich wrote:
> On Sat, Jul 25, 2009 at 09:26:16AM -0700, Jim Showalter scratched on the wall:
>> You can have the tags in a separate table that has a foreign-key to
>> the table with the rows in in you want to tag,
>
>  That's essentially what the OP is doing, except they've built a
>  many-to-many relationship using a bridge table, rather than a
>  one-to-many using a simple foreign key.  This allows any tag to be
>  associated with any row without duplicating any of the tags, allowing
>  for better data normalization.

Indeed. My query is working quite well for now. I do realize that it
will not be efficient for very large datasets, but it is quite good
for my needs.

>
>> If you need to search for
>> multiple tags at a time, each requires a join:
>
>  Right, and that's the problem.  If you've got an application that
>  allows you to search for tags with an arbitrary number of tags, you
>  need a different query for each situation (one tag, two tags, etc.).
>  Also, adding two more JOINs (since you need to join through the
>  bridge table) for each additional search target means your performance
>  degrades very quickly when you've got four or five search tags.
>
>  This is why I suggested looking at Relational division.  Even if SQL
>  doesn't support it directly, there are other ways of doing it to make
>  the query more generic.  With division, you can do something like (in the
>  terms of the OP's post):
>
>  (foo JOIN foo_tag JOIN tag) / (search_tags) = foo with search_tags
>
>  This basically builds a table of all foo:tag combinations and then
>  divides out our search tags, resulting in a list of foo.f_id values
>  that have an association with all of the tags found in search_tags.
>  You can then JOIN the results of this expression back to the tags.

I gave relational division a gander. The text was a bit thick so my
eyes started glazing after a while, but it piqued my curiosity
sufficiently that I am going to go back to it and try and understand
it better.


>  The OP asked to have the search tags removed from the results, which
>  is also a bit of a trick to do in a generic way.

Yes, this is a "view" problem that is fairly easily solved in the
application layer. Essentially, one wants to see all the tags for the
result returned for a given tag... for example, for a given
nationality, "US," show me all the ethnicities, but, since I already
know that all the rows in the result are for nationality = "US," there
is no point in displaying "US" along with the rows. Much easier to do
this in the application logic.



>
>  Anyways... The relational division makes any query search -- from
>  1 to 1000 tags -- have the same structure.  As long as the division
>  performance is good, the query performance shouldn't horribly
>  degrade as you add more search tags.
>
>> select t
>> from tagged as t
>> join t.tags as tag1
>> join t.tags as tag2
>> where tag1 = some value
>>   and tag2 = some value.
>
>  This points in the direction of a one-to-many structure, only the
>  syntax is not making sense to me.  JOINs work on tables, not columns,
>  and WHERE expressions work on columns, not tables.
>
>  You're also only doing the first half of the problem, since you need
>  to join this back to the tags to get the result the OP was looking for
>  (including pulling back out the search targets).  It's a trickier
>  problem than it first looks, especially if you want to do it in one
>  SQL statement with a minimal number of sub-SELECTs.
>
>   -j
>
>> - Original Message -
>> From: "Jay A. Kreibich" 
>> To: ; "General Discussion of SQLite Database"
>> 
>> Sent: Friday, July 24, 2009 9:21 PM
>> Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table
>>
>>
>> > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the
>> > wall:
>> >> I am trying to develop a "tagging" system, whereby each row in a
>> >> table
>> >> can be tagged with arbitrary number of tags.
>> >
>> >  This smells of a Relational division problem.  If you're dealing
>> > with
>> >  tags you might want to have a look at that (Celko has a few good
>> >  articles on it).  Since SQL lacks a native Relational division
>> >  operator, chances are a solution in that direction is going to be
>> >  more complex -- at least for this problem.  But any time I've done
>> >  tags or attributes, sooner or later I find myself needing to do a
>> >  division.  They come in handy any time you say "my data is vertical
>> >  but I need it horizontal."  You might want to read up on them just
>> > to
>> >  have that knowledge available.
>> >
>> >
>> >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT);
>> >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT);
>> >> TABLE foo_tag (f_id INTEGER, t_id INTEGER);
>> >
>> >
>> >> I have the following solution. Could I do better or differently?
>> >
>> >  I'm not sure about 

Re: [sqlite] a system for arbitrarily tagging rows in a table

2009-07-26 Thread Jay A. Kreibich
On Sat, Jul 25, 2009 at 09:26:16AM -0700, Jim Showalter scratched on the wall:
> You can have the tags in a separate table that has a foreign-key to 
> the table with the rows in in you want to tag,

  That's essentially what the OP is doing, except they've built a
  many-to-many relationship using a bridge table, rather than a
  one-to-many using a simple foreign key.  This allows any tag to be
  associated with any row without duplicating any of the tags, allowing
  for better data normalization.

> If you need to search for 
> multiple tags at a time, each requires a join: 

  Right, and that's the problem.  If you've got an application that
  allows you to search for tags with an arbitrary number of tags, you
  need a different query for each situation (one tag, two tags, etc.).
  Also, adding two more JOINs (since you need to join through the
  bridge table) for each additional search target means your performance
  degrades very quickly when you've got four or five search tags.

  This is why I suggested looking at Relational division.  Even if SQL
  doesn't support it directly, there are other ways of doing it to make
  the query more generic.  With division, you can do something like (in the
  terms of the OP's post):

  (foo JOIN foo_tag JOIN tag) / (search_tags) = foo with search_tags

  This basically builds a table of all foo:tag combinations and then
  divides out our search tags, resulting in a list of foo.f_id values
  that have an association with all of the tags found in search_tags.
  You can then JOIN the results of this expression back to the tags.
  The OP asked to have the search tags removed from the results, which
  is also a bit of a trick to do in a generic way.

  Anyways... The relational division makes any query search -- from
  1 to 1000 tags -- have the same structure.  As long as the division
  performance is good, the query performance shouldn't horribly
  degrade as you add more search tags.

> select t
> from tagged as t 
> join t.tags as tag1
> join t.tags as tag2
> where tag1 = some value
>   and tag2 = some value.

  This points in the direction of a one-to-many structure, only the
  syntax is not making sense to me.  JOINs work on tables, not columns,
  and WHERE expressions work on columns, not tables.
  
  You're also only doing the first half of the problem, since you need
  to join this back to the tags to get the result the OP was looking for
  (including pulling back out the search targets).  It's a trickier
  problem than it first looks, especially if you want to do it in one
  SQL statement with a minimal number of sub-SELECTs.

   -j

> - Original Message - 
> From: "Jay A. Kreibich" 
> To: ; "General Discussion of SQLite Database" 
> 
> Sent: Friday, July 24, 2009 9:21 PM
> Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table
> 
> 
> > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the 
> > wall:
> >> I am trying to develop a "tagging" system, whereby each row in a 
> >> table
> >> can be tagged with arbitrary number of tags.
> >
> >  This smells of a Relational division problem.  If you're dealing 
> > with
> >  tags you might want to have a look at that (Celko has a few good
> >  articles on it).  Since SQL lacks a native Relational division
> >  operator, chances are a solution in that direction is going to be
> >  more complex -- at least for this problem.  But any time I've done
> >  tags or attributes, sooner or later I find myself needing to do a
> >  division.  They come in handy any time you say "my data is vertical
> >  but I need it horizontal."  You might want to read up on them just 
> > to
> >  have that knowledge available.
> >
> >
> >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT);
> >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT);
> >> TABLE foo_tag (f_id INTEGER, t_id INTEGER);
> >
> >
> >> I have the following solution. Could I do better or differently?
> >
> >  I'm not sure about "better", but here's different:
> >
> > sqlite> SELECT foo.f_name, tag.t_name
> >   ...> FROM tag AS target
> >   ...>   NATURAL JOIN foo_tag AS target_ft
> >   ...>   NATURAL JOIN foo
> >   ...>   NATURAL JOIN foo_tag
> >   ...>   NATURAL JOIN tag
> >   ...> WHERE target.t_name = 'bar'
> >   ...>   AND tag.t_id != target.t_id
> >   ...> ORDER BY foo.f_name, tag.t_name;
> >
> >  This basically folds your IN sub-select back into the main query.
> >  We join "foo" to the tag table in two directions... one to find
> >  the search target tag id and the other to produce the output.
> >
> >  "tag AS target" with the first WHERE clause should return one row.
> >  We join that through "foo_tag AS target_ft" to get a list of foo 
> > ids
> >  that have the search target tag.  We then build the normal output
> >  list by joining that back through the foo_tag bridge table to the
> >  tags, and throw out any rows with an output tag id that matches the
> >  

Re: [sqlite] a system for arbitrarily tagging rows in a table

2009-07-25 Thread Jim Showalter
You can have the tags in a separate table that has a foreign-key to 
the table with the rows in in you want to tag, and then do a select 
from that table to find the tagged rows. If you need to search for 
multiple tags at a time, each requires a join: select t from tagged as 
t join t.tags as tag1 join t.tags as tag2 where tag1 = some value and 
tag2 = some value.

- Original Message - 
From: "Jay A. Kreibich" 
To: ; "General Discussion of SQLite Database" 

Sent: Friday, July 24, 2009 9:21 PM
Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table


> On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the 
> wall:
>> I am trying to develop a "tagging" system, whereby each row in a 
>> table
>> can be tagged with arbitrary number of tags.
>
>  This smells of a Relational division problem.  If you're dealing 
> with
>  tags you might want to have a look at that (Celko has a few good
>  articles on it).  Since SQL lacks a native Relational division
>  operator, chances are a solution in that direction is going to be
>  more complex -- at least for this problem.  But any time I've done
>  tags or attributes, sooner or later I find myself needing to do a
>  division.  They come in handy any time you say "my data is vertical
>  but I need it horizontal."  You might want to read up on them just 
> to
>  have that knowledge available.
>
>
>> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT);
>> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT);
>> TABLE foo_tag (f_id INTEGER, t_id INTEGER);
>
>
>> I have the following solution. Could I do better or differently?
>
>  I'm not sure about "better", but here's different:
>
> sqlite> SELECT foo.f_name, tag.t_name
>   ...> FROM tag AS target
>   ...>   NATURAL JOIN foo_tag AS target_ft
>   ...>   NATURAL JOIN foo
>   ...>   NATURAL JOIN foo_tag
>   ...>   NATURAL JOIN tag
>   ...> WHERE target.t_name = 'bar'
>   ...>   AND tag.t_id != target.t_id
>   ...> ORDER BY foo.f_name, tag.t_name;
>
>  This basically folds your IN sub-select back into the main query.
>  We join "foo" to the tag table in two directions... one to find
>  the search target tag id and the other to produce the output.
>
>  "tag AS target" with the first WHERE clause should return one row.
>  We join that through "foo_tag AS target_ft" to get a list of foo 
> ids
>  that have the search target tag.  We then build the normal output
>  list by joining that back through the foo_tag bridge table to the
>  tags, and throw out any rows with an output tag id that matches the
>  search target tag id.
>
>  Simple!
>
>  Best of all, the target only appears in the query once, and your 
> name
>  convention means we can use NATURAL JOINs to keep things clean.  It
>  also totally falls apart if you need to search on more than one 
> tag.
>  That's where Relational division comes in.
>
>   -j (who had to draw a picture to get it right)
>
> -- 
> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>
> "Our opponent is an alien starship packed with atomic bombs.  We 
> have
> a protractor."   "I'll go home and see if I can scrounge up a ruler
> and a piece of string."  --from Anathem by Neal Stephenson
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] a system for arbitrarily tagging rows in a table

2009-07-24 Thread Jay A. Kreibich
On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the wall:
> I am trying to develop a "tagging" system, whereby each row in a table
> can be tagged with arbitrary number of tags.

  This smells of a Relational division problem.  If you're dealing with
  tags you might want to have a look at that (Celko has a few good
  articles on it).  Since SQL lacks a native Relational division
  operator, chances are a solution in that direction is going to be
  more complex -- at least for this problem.  But any time I've done
  tags or attributes, sooner or later I find myself needing to do a
  division.  They come in handy any time you say "my data is vertical
  but I need it horizontal."  You might want to read up on them just to
  have that knowledge available.


> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT);
> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT);
> TABLE foo_tag (f_id INTEGER, t_id INTEGER);


> I have the following solution. Could I do better or differently?

  I'm not sure about "better", but here's different:

sqlite> SELECT foo.f_name, tag.t_name 
   ...> FROM tag AS target 
   ...>   NATURAL JOIN foo_tag AS target_ft
   ...>   NATURAL JOIN foo
   ...>   NATURAL JOIN foo_tag
   ...>   NATURAL JOIN tag
   ...> WHERE target.t_name = 'bar' 
   ...>   AND tag.t_id != target.t_id
   ...> ORDER BY foo.f_name, tag.t_name;

  This basically folds your IN sub-select back into the main query.
  We join "foo" to the tag table in two directions... one to find
  the search target tag id and the other to produce the output.

  "tag AS target" with the first WHERE clause should return one row.
  We join that through "foo_tag AS target_ft" to get a list of foo ids
  that have the search target tag.  We then build the normal output
  list by joining that back through the foo_tag bridge table to the
  tags, and throw out any rows with an output tag id that matches the
  search target tag id.

  Simple!

  Best of all, the target only appears in the query once, and your name
  convention means we can use NATURAL JOINs to keep things clean.  It
  also totally falls apart if you need to search on more than one tag.
  That's where Relational division comes in.

   -j (who had to draw a picture to get it right)

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] a system for arbitrarily tagging rows in a table

2009-07-24 Thread Simon Slavin

On 25 Jul 2009, at 3:20am, P Kishor wrote:

> I am trying to develop a "tagging" system, whereby each row in a table
> can be tagged with arbitrary number of tags.

As an alternative for using a table for tags, consider using a long  
string instead.  The default value for this column would be the  
character (picking one at random) ','.  To add a tag you append the  
tag plus another ',' to it.  So a record tagged as both 'hot' and  
'current bug' would have this value in the 'tags' column:

,hot,current bug,

To search for all rows containing a specific tag, use the 'LIKE'  
operator to find all strings including ','||thisTag||','.  To show all  
the tags apart from that one, use a 'replace' expression to remove it  
from the string or use whatever your programming languages uses for  
'replace'.  To forget all uses of the tag use an UPDATE command with a  
'replace' expression.

This solution does not suit every application, since it's slower for  
searching big tables, but this way of implementing tags allows  
operations like the one you describe to be done as a single simple  
SELECT statement with no JOIN or SELECT arguments.  It does not  
require you to carefully modify a 'tags' table when you delete a  
record.  It removes the requirement for one table and one index, and  
thus makes the overall database smaller.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users