Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer
On 9/1/06, Randall Lucas <[EMAIL PROTECTED]> wrote: On Fri, Sep 01, 2006 at 09:30:57AM -0400, Merlin Moncure wrote: > A key tenet of relational thinking is to reduce all information to its > functional dependencies, and to try and avoid as much as possible > keeping information state in the data in a declarative sense. > last_query_shown_tuples() is imo a violation in that sense. I used last_query_shown_tuples() as a quick example. It seems more likely that it would be implemented in fact as something like EXPLAIN, where it acts upon a given query. Like, EXPLAIN DEPENDENCIES SELECT * FROM ... It seems that the query planner *must* know which rows, from which tables, actually get used in producing the output for a given query. Given that this info is present somewhere in the depths, is there a way to get this information out to the app level? not exactly in the sense you are describing, but you can make the query such that it gives you the knowledge to get back to the original data and look it up, yes? merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer
On Fri, Sep 01, 2006 at 09:30:57AM -0400, Merlin Moncure wrote: > On 8/31/06, Randall Lucas <[EMAIL PROTECTED]> wrote: > >Now that I have this query, in order to make my case, I need to "sign > >off" on all of the individual data that went into it. I would like to > >do something like: > > > >select last_query_shown_tuples(); > > schema | table_name | pk_columns | pk_values > >+---++--- > > public | company |[id]| [2] > > public | officer |[id]| [3] > > public | insider_trade |[id]| [1] > > (3 rows) > > right. in sql, except for a few miscellaneous things that are session > based, information state is kept in the tables and if you want to keep > things relational all information should be have a primary key. Agreed. As noted in my example above, all of my tables have primary keys (serials for simplicity, although I represented the pk columns and values as arrays because of the possibility of other sorts of keys, like multicolumn). > A key tenet of relational thinking is to reduce all information to its > functional dependencies, and to try and avoid as much as possible > keeping information state in the data in a declarative sense. > last_query_shown_tuples() is imo a violation in that sense. I used last_query_shown_tuples() as a quick example. It seems more likely that it would be implemented in fact as something like EXPLAIN, where it acts upon a given query. Like, EXPLAIN DEPENDENCIES SELECT * FROM ... It seems that the query planner *must* know which rows, from which tables, actually get used in producing the output for a given query. Given that this info is present somewhere in the depths, is there a way to get this information out to the app level? Best, Randall -- Randall Lucas Tercent, Inc. DF93EAD1 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer
On 8/31/06, Randall Lucas <[EMAIL PROTECTED]> wrote: Now that I have this query, in order to make my case, I need to "sign off" on all of the individual data that went into it. I would like to do something like: select last_query_shown_tuples(); schema | table_name | pk_columns | pk_values +---++--- public | company |[id]| [2] public | officer |[id]| [3] public | insider_trade |[id]| [1] (3 rows) right. in sql, except for a few miscellaneous things that are session based, information state is kept in the tables and if you want to keep things relational all information should be have a primary key. Your last_query_shown_tuples function should be a simple select statement returning information written with an id based on some criteria. You could call this table 'investigation' and put in it the information necesasry to backtrack to the other tables. A key tenet of relational thinking is to reduce all information to its functional dependencies, and to try and avoid as much as possible keeping information state in the data in a declarative sense. last_query_shown_tuples() is imo a violation in that sense. so is currval() in fact, because given the same database defined by its data it could give different answers depending on things leading up to how the data was put there. therefore, it is a hack, albeit a very useful one. One more thing I would like to suggest is to not automatically use a serial column to define a primary key. While this is a good idea in many cases for various reasons, it can lead to lazy thinking and in extreme cases bad performance. a good way to approach sequences is to define the natural key and use the generated ID as an alternative candidate key if you want to keep things tight. merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer
On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote: > On 8/30/06, Randall Lucas <[EMAIL PROTECTED]> wrote: > > >An inverted way of thinking about the problem is the notion of getting a > >source document (say, a "company registration form") and parsing and > >storing it in a normalized format. How do you "tag" the origin of each > >and every row in every table that resulted from that source document? > > your form should have a code, either entered by the customer or by the > preparer who enters it into the database, which becomes the key that > identifies the registration document. Put that key into other tables. Yes -- I have flirted with this idea for the case of importing a set of facts from a single source. But where it breaks down is in being able to do the reverse -- ex post facto, to certify the results of a query as being accurate, and thereby attesting to the underlying facts. > be careful, you are flirting with EAV thinking. I think EAV designs > are terrible. (Though religion is not at issue here, I am pretty devoutly relational. I just want a good way to perform audits and other functions on my relations.) > > It seems to me that the elegant way to do this would be to get the > >entire graph of dependencies for not only all tables that reference the > >entity, but only those rows within those tables that refer specifically > >to my entity. > > > >The query engine should have a pretty good idea about which tables and > >which rows would actually be used in forming the responses to a given > >query. How can I get this information? Or am I off the deep end (-ency > >graph) with this one? > > I am not sure where you are going with this. Maybe you should mock up > some simple tables and repost your question. Imagine that I am the SEC. I look for cases of insider trading, and when I find a case, I have to go back and double-check all the pieces of evidence I am using against them. create table company ( id serial primary key, nametext not null ); create table officer ( id serial primary key, company_id int not null references company(id) on delete cascade, nametext not null, title text not null ); create table insider_trade ( id serial primary key, officer_id int not null references officer(id) on delete cascade, shares_traded numeric, share_price numeric, trade_date timestamptz ); insert into company (name) values ('goodco'); insert into company (name) values ('badco'); insert into officer (company_id, name, title) values (1, 'johnny b. good', 'ceo'); insert into officer (company_id, name, title) values (1, 'mother teresa', 'saint'); insert into officer (company_id, name, title) values (2, 'leroy brown', 'ceo'); insert into insider_trade (officer_id, shares_traded, share_price, trade_date) values (3, '5', '6.66', '2006-07-04 1:23 PM PST'); Now, my database looks like this: select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id; id | name | id | company_id | name | title | id | officer_id | shares_traded | share_price | trade_date +++++---+++---+-+ 1 | goodco | 1 | 1 | johnny b. good | ceo ||| | | 1 | goodco | 2 | 1 | mother teresa | saint ||| | | 2 | badco | 3 | 2 | leroy brown| ceo | 1 | 3 | 5 |6.66 | 2006-07-04 14:23:00-07 (3 rows) If I want to know whom to investigate, I might do a query like this: select * from company left join officer on company.id=officer.company_id left join insider_trade on officer.id=insider_trade.officer_id where insider_trade.id is not null; id | name | id | company_id |name | title | id | officer_id | shares_traded | share_price | trade_date +---+++-+---+++---+-+ 2 | badco | 3 | 2 | leroy brown | ceo | 1 | 3 | 5 |6.66 | 2006-07-04 14:23:00-07 (1 row) Now that I have this query, in order to make my case, I need to "sign off" on all of the individual data that went into it. I would like to do something like: select last_query_shown_tuples(); schema | table_name | pk_columns | pk_values +---++--- public | company |[id]| [2] public | officer |[id]| [3] public | insider_trade |[id]| [1] (3 rows) (I am aware that, at le
Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer
On 8/30/06, Randall Lucas <[EMAIL PROTECTED]> wrote: I'm storing facts about an entity (e.g., "company") in a normalized form with various repeating groups, link tables, etc. My workflow requires that after (or as part of) collecting these facts, I be able to "sign off" as having verified all of the facts that pertain to a given company. I understand this as meaning I need to sign off on each row that was used in answering the query "select * from company left join ..." An inverted way of thinking about the problem is the notion of getting a source document (say, a "company registration form") and parsing and storing it in a normalized format. How do you "tag" the origin of each and every row in every table that resulted from that source document? your form should have a code, either entered by the customer or by the preparer who enters it into the database, which becomes the key that identifies the registration document. Put that key into other tables. It is possible to do so by associating an extra column with each inserted or modified value (yuck). be careful, you are flirting with EAV thinking. I think EAV designs are terrible. > It seems to me that the elegant way to do this would be to get the entire graph of dependencies for not only all tables that reference the entity, but only those rows within those tables that refer specifically to my entity. The query engine should have a pretty good idea about which tables and which rows would actually be used in forming the responses to a given query. How can I get this information? Or am I off the deep end (-ency graph) with this one? I am not sure where you are going with this. Maybe you should mock up some simple tables and repost your question. merlin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly