Re: [sqlite] a system for arbitrarily tagging rows in a table
On Sun, Jul 26, 2009 at 9:23 PM, Jay A. Kreibichwrote: > On Sat, Jul 25, 2009 at 09:26:16AM -0700, Jim Showalter scratched on the wall: >> You can have the tags in a separate table that has a foreign-key to >> the table with the rows in in you want to tag, > > That's essentially what the OP is doing, except they've built a > many-to-many relationship using a bridge table, rather than a > one-to-many using a simple foreign key. This allows any tag to be > associated with any row without duplicating any of the tags, allowing > for better data normalization. Indeed. My query is working quite well for now. I do realize that it will not be efficient for very large datasets, but it is quite good for my needs. > >> If you need to search for >> multiple tags at a time, each requires a join: > > Right, and that's the problem. If you've got an application that > allows you to search for tags with an arbitrary number of tags, you > need a different query for each situation (one tag, two tags, etc.). > Also, adding two more JOINs (since you need to join through the > bridge table) for each additional search target means your performance > degrades very quickly when you've got four or five search tags. > > This is why I suggested looking at Relational division. Even if SQL > doesn't support it directly, there are other ways of doing it to make > the query more generic. With division, you can do something like (in the > terms of the OP's post): > > (foo JOIN foo_tag JOIN tag) / (search_tags) = foo with search_tags > > This basically builds a table of all foo:tag combinations and then > divides out our search tags, resulting in a list of foo.f_id values > that have an association with all of the tags found in search_tags. > You can then JOIN the results of this expression back to the tags. I gave relational division a gander. The text was a bit thick so my eyes started glazing after a while, but it piqued my curiosity sufficiently that I am going to go back to it and try and understand it better. > The OP asked to have the search tags removed from the results, which > is also a bit of a trick to do in a generic way. Yes, this is a "view" problem that is fairly easily solved in the application layer. Essentially, one wants to see all the tags for the result returned for a given tag... for example, for a given nationality, "US," show me all the ethnicities, but, since I already know that all the rows in the result are for nationality = "US," there is no point in displaying "US" along with the rows. Much easier to do this in the application logic. > > Anyways... The relational division makes any query search -- from > 1 to 1000 tags -- have the same structure. As long as the division > performance is good, the query performance shouldn't horribly > degrade as you add more search tags. > >> select t >> from tagged as t >> join t.tags as tag1 >> join t.tags as tag2 >> where tag1 = some value >> and tag2 = some value. > > This points in the direction of a one-to-many structure, only the > syntax is not making sense to me. JOINs work on tables, not columns, > and WHERE expressions work on columns, not tables. > > You're also only doing the first half of the problem, since you need > to join this back to the tags to get the result the OP was looking for > (including pulling back out the search targets). It's a trickier > problem than it first looks, especially if you want to do it in one > SQL statement with a minimal number of sub-SELECTs. > > -j > >> - Original Message - >> From: "Jay A. Kreibich" >> To: ; "General Discussion of SQLite Database" >> >> Sent: Friday, July 24, 2009 9:21 PM >> Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table >> >> >> > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the >> > wall: >> >> I am trying to develop a "tagging" system, whereby each row in a >> >> table >> >> can be tagged with arbitrary number of tags. >> > >> > This smells of a Relational division problem. If you're dealing >> > with >> > tags you might want to have a look at that (Celko has a few good >> > articles on it). Since SQL lacks a native Relational division >> > operator, chances are a solution in that direction is going to be >> > more complex -- at least for this problem. But any time I've done >> > tags or attributes, sooner or later I find myself needing to do a >> > division. They come in handy any time you say "my data is vertical >> > but I need it horizontal." You might want to read up on them just >> > to >> > have that knowledge available. >> > >> > >> >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT); >> >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT); >> >> TABLE foo_tag (f_id INTEGER, t_id INTEGER); >> > >> > >> >> I have the following solution. Could I do better or differently? >> > >> > I'm not sure about
Re: [sqlite] a system for arbitrarily tagging rows in a table
On Sat, Jul 25, 2009 at 09:26:16AM -0700, Jim Showalter scratched on the wall: > You can have the tags in a separate table that has a foreign-key to > the table with the rows in in you want to tag, That's essentially what the OP is doing, except they've built a many-to-many relationship using a bridge table, rather than a one-to-many using a simple foreign key. This allows any tag to be associated with any row without duplicating any of the tags, allowing for better data normalization. > If you need to search for > multiple tags at a time, each requires a join: Right, and that's the problem. If you've got an application that allows you to search for tags with an arbitrary number of tags, you need a different query for each situation (one tag, two tags, etc.). Also, adding two more JOINs (since you need to join through the bridge table) for each additional search target means your performance degrades very quickly when you've got four or five search tags. This is why I suggested looking at Relational division. Even if SQL doesn't support it directly, there are other ways of doing it to make the query more generic. With division, you can do something like (in the terms of the OP's post): (foo JOIN foo_tag JOIN tag) / (search_tags) = foo with search_tags This basically builds a table of all foo:tag combinations and then divides out our search tags, resulting in a list of foo.f_id values that have an association with all of the tags found in search_tags. You can then JOIN the results of this expression back to the tags. The OP asked to have the search tags removed from the results, which is also a bit of a trick to do in a generic way. Anyways... The relational division makes any query search -- from 1 to 1000 tags -- have the same structure. As long as the division performance is good, the query performance shouldn't horribly degrade as you add more search tags. > select t > from tagged as t > join t.tags as tag1 > join t.tags as tag2 > where tag1 = some value > and tag2 = some value. This points in the direction of a one-to-many structure, only the syntax is not making sense to me. JOINs work on tables, not columns, and WHERE expressions work on columns, not tables. You're also only doing the first half of the problem, since you need to join this back to the tags to get the result the OP was looking for (including pulling back out the search targets). It's a trickier problem than it first looks, especially if you want to do it in one SQL statement with a minimal number of sub-SELECTs. -j > - Original Message - > From: "Jay A. Kreibich"> To: ; "General Discussion of SQLite Database" > > Sent: Friday, July 24, 2009 9:21 PM > Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table > > > > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the > > wall: > >> I am trying to develop a "tagging" system, whereby each row in a > >> table > >> can be tagged with arbitrary number of tags. > > > > This smells of a Relational division problem. If you're dealing > > with > > tags you might want to have a look at that (Celko has a few good > > articles on it). Since SQL lacks a native Relational division > > operator, chances are a solution in that direction is going to be > > more complex -- at least for this problem. But any time I've done > > tags or attributes, sooner or later I find myself needing to do a > > division. They come in handy any time you say "my data is vertical > > but I need it horizontal." You might want to read up on them just > > to > > have that knowledge available. > > > > > >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT); > >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT); > >> TABLE foo_tag (f_id INTEGER, t_id INTEGER); > > > > > >> I have the following solution. Could I do better or differently? > > > > I'm not sure about "better", but here's different: > > > > sqlite> SELECT foo.f_name, tag.t_name > > ...> FROM tag AS target > > ...> NATURAL JOIN foo_tag AS target_ft > > ...> NATURAL JOIN foo > > ...> NATURAL JOIN foo_tag > > ...> NATURAL JOIN tag > > ...> WHERE target.t_name = 'bar' > > ...> AND tag.t_id != target.t_id > > ...> ORDER BY foo.f_name, tag.t_name; > > > > This basically folds your IN sub-select back into the main query. > > We join "foo" to the tag table in two directions... one to find > > the search target tag id and the other to produce the output. > > > > "tag AS target" with the first WHERE clause should return one row. > > We join that through "foo_tag AS target_ft" to get a list of foo > > ids > > that have the search target tag. We then build the normal output > > list by joining that back through the foo_tag bridge table to the > > tags, and throw out any rows with an output tag id that matches the > >
Re: [sqlite] a system for arbitrarily tagging rows in a table
You can have the tags in a separate table that has a foreign-key to the table with the rows in in you want to tag, and then do a select from that table to find the tagged rows. If you need to search for multiple tags at a time, each requires a join: select t from tagged as t join t.tags as tag1 join t.tags as tag2 where tag1 = some value and tag2 = some value. - Original Message - From: "Jay A. Kreibich"To: ; "General Discussion of SQLite Database" Sent: Friday, July 24, 2009 9:21 PM Subject: Re: [sqlite] a system for arbitrarily tagging rows in a table > On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the > wall: >> I am trying to develop a "tagging" system, whereby each row in a >> table >> can be tagged with arbitrary number of tags. > > This smells of a Relational division problem. If you're dealing > with > tags you might want to have a look at that (Celko has a few good > articles on it). Since SQL lacks a native Relational division > operator, chances are a solution in that direction is going to be > more complex -- at least for this problem. But any time I've done > tags or attributes, sooner or later I find myself needing to do a > division. They come in handy any time you say "my data is vertical > but I need it horizontal." You might want to read up on them just > to > have that knowledge available. > > >> TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT); >> TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT); >> TABLE foo_tag (f_id INTEGER, t_id INTEGER); > > >> I have the following solution. Could I do better or differently? > > I'm not sure about "better", but here's different: > > sqlite> SELECT foo.f_name, tag.t_name > ...> FROM tag AS target > ...> NATURAL JOIN foo_tag AS target_ft > ...> NATURAL JOIN foo > ...> NATURAL JOIN foo_tag > ...> NATURAL JOIN tag > ...> WHERE target.t_name = 'bar' > ...> AND tag.t_id != target.t_id > ...> ORDER BY foo.f_name, tag.t_name; > > This basically folds your IN sub-select back into the main query. > We join "foo" to the tag table in two directions... one to find > the search target tag id and the other to produce the output. > > "tag AS target" with the first WHERE clause should return one row. > We join that through "foo_tag AS target_ft" to get a list of foo > ids > that have the search target tag. We then build the normal output > list by joining that back through the foo_tag bridge table to the > tags, and throw out any rows with an output tag id that matches the > search target tag id. > > Simple! > > Best of all, the target only appears in the query once, and your > name > convention means we can use NATURAL JOINs to keep things clean. It > also totally falls apart if you need to search on more than one > tag. > That's where Relational division comes in. > > -j (who had to draw a picture to get it right) > > -- > Jay A. Kreibich < J A Y @ K R E I B I.C H > > > "Our opponent is an alien starship packed with atomic bombs. We > have > a protractor." "I'll go home and see if I can scrounge up a ruler > and a piece of string." --from Anathem by Neal Stephenson > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] a system for arbitrarily tagging rows in a table
On Fri, Jul 24, 2009 at 09:20:29PM -0500, P Kishor scratched on the wall: > I am trying to develop a "tagging" system, whereby each row in a table > can be tagged with arbitrary number of tags. This smells of a Relational division problem. If you're dealing with tags you might want to have a look at that (Celko has a few good articles on it). Since SQL lacks a native Relational division operator, chances are a solution in that direction is going to be more complex -- at least for this problem. But any time I've done tags or attributes, sooner or later I find myself needing to do a division. They come in handy any time you say "my data is vertical but I need it horizontal." You might want to read up on them just to have that knowledge available. > TABLE foo (f_id INTEGER PRIMARY KEY, f_name TEXT); > TABLE tag (t_id INTEGER PRIMARY KEY, t_name TEXT); > TABLE foo_tag (f_id INTEGER, t_id INTEGER); > I have the following solution. Could I do better or differently? I'm not sure about "better", but here's different: sqlite> SELECT foo.f_name, tag.t_name ...> FROM tag AS target ...> NATURAL JOIN foo_tag AS target_ft ...> NATURAL JOIN foo ...> NATURAL JOIN foo_tag ...> NATURAL JOIN tag ...> WHERE target.t_name = 'bar' ...> AND tag.t_id != target.t_id ...> ORDER BY foo.f_name, tag.t_name; This basically folds your IN sub-select back into the main query. We join "foo" to the tag table in two directions... one to find the search target tag id and the other to produce the output. "tag AS target" with the first WHERE clause should return one row. We join that through "foo_tag AS target_ft" to get a list of foo ids that have the search target tag. We then build the normal output list by joining that back through the foo_tag bridge table to the tags, and throw out any rows with an output tag id that matches the search target tag id. Simple! Best of all, the target only appears in the query once, and your name convention means we can use NATURAL JOINs to keep things clean. It also totally falls apart if you need to search on more than one tag. That's where Relational division comes in. -j (who had to draw a picture to get it right) -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] a system for arbitrarily tagging rows in a table
On 25 Jul 2009, at 3:20am, P Kishor wrote: > I am trying to develop a "tagging" system, whereby each row in a table > can be tagged with arbitrary number of tags. As an alternative for using a table for tags, consider using a long string instead. The default value for this column would be the character (picking one at random) ','. To add a tag you append the tag plus another ',' to it. So a record tagged as both 'hot' and 'current bug' would have this value in the 'tags' column: ,hot,current bug, To search for all rows containing a specific tag, use the 'LIKE' operator to find all strings including ','||thisTag||','. To show all the tags apart from that one, use a 'replace' expression to remove it from the string or use whatever your programming languages uses for 'replace'. To forget all uses of the tag use an UPDATE command with a 'replace' expression. This solution does not suit every application, since it's slower for searching big tables, but this way of implementing tags allows operations like the one you describe to be done as a single simple SELECT statement with no JOIN or SELECT arguments. It does not require you to carefully modify a 'tags' table when you delete a record. It removes the requirement for one table and one index, and thus makes the overall database smaller. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users