Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer

2006-09-05 Thread Merlin Moncure

On 9/1/06, Randall Lucas [EMAIL PROTECTED] wrote:

On Fri, Sep 01, 2006 at 09:30:57AM -0400, Merlin Moncure wrote:
 A key tenet of relational thinking is to reduce all information to its
 functional dependencies, and to try and avoid as much as possible
 keeping information state in the data in a declarative sense.
 last_query_shown_tuples() is imo a violation in that sense.

I used last_query_shown_tuples() as a quick example.  It seems more
likely that it would be implemented in fact as something like EXPLAIN,
where it acts upon a given query.  Like,

EXPLAIN DEPENDENCIES SELECT * FROM ...

It seems that the query planner *must* know which rows, from which
tables, actually get used in producing the output for a given query.
Given that this info is present somewhere in the depths, is there a way
to get this information out to the app level?


not exactly in the sense you are describing, but you can make the
query such that it gives you the knowledge to get back to the original
data and look it up, yes?

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer

2006-09-01 Thread Merlin Moncure

On 8/31/06, Randall Lucas [EMAIL PROTECTED] wrote:

Now that I have this query, in order to make my case, I need to sign
off on all of the individual data that went into it.  I would like to
do something like:

select last_query_shown_tuples();
 schema |  table_name   | pk_columns | pk_values
+---++---
 public |   company |[id]|   [2]
 public |   officer |[id]|   [3]
 public | insider_trade |[id]|   [1]
 (3 rows)


right.  in sql, except for a few miscellaneous things that are session
based, information state is kept in the tables and if you want to keep
things relational all information should be have a primary key.  Your
last_query_shown_tuples function should be a simple select statement
returning information written with an id based on some criteria.  You
could call this table 'investigation' and put in it the information
necesasry to backtrack to the other tables.

A key tenet of relational thinking is to reduce all information to its
functional dependencies, and to try and avoid as much as possible
keeping information state in the data in a declarative sense.
last_query_shown_tuples() is imo a violation in that sense.  so is
currval() in fact, because given the same database defined by its data
it could give different answers depending on things leading up to how
the data was put there.  therefore, it is a hack, albeit a very useful
one.

One more thing I would like to suggest is to not automatically use a
serial column to define a primary key.  While this is a good idea in
many cases for various reasons, it can lead to lazy thinking and in
extreme cases bad performance.  a good way to approach sequences is to
define the natural key and use the generated ID as an alternative
candidate key if you want to keep things tight.

merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer

2006-09-01 Thread Randall Lucas
On Fri, Sep 01, 2006 at 09:30:57AM -0400, Merlin Moncure wrote:
 On 8/31/06, Randall Lucas [EMAIL PROTECTED] wrote:
 Now that I have this query, in order to make my case, I need to sign
 off on all of the individual data that went into it.  I would like to
 do something like:
 
 select last_query_shown_tuples();
  schema |  table_name   | pk_columns | pk_values
 +---++---
  public |   company |[id]|   [2]
  public |   officer |[id]|   [3]
  public | insider_trade |[id]|   [1]
  (3 rows)
 
 right.  in sql, except for a few miscellaneous things that are session
 based, information state is kept in the tables and if you want to keep
 things relational all information should be have a primary key.

Agreed.  As noted in my example above, all of my tables have primary
keys (serials for simplicity, although I represented the pk columns and
values as arrays because of the possibility of other sorts of keys,
like multicolumn).


 A key tenet of relational thinking is to reduce all information to its
 functional dependencies, and to try and avoid as much as possible
 keeping information state in the data in a declarative sense.
 last_query_shown_tuples() is imo a violation in that sense.

I used last_query_shown_tuples() as a quick example.  It seems more
likely that it would be implemented in fact as something like EXPLAIN,
where it acts upon a given query.  Like,

EXPLAIN DEPENDENCIES SELECT * FROM ...

It seems that the query planner *must* know which rows, from which
tables, actually get used in producing the output for a given query.
Given that this info is present somewhere in the depths, is there a way
to get this information out to the app level?

Best,

Randall

-- 
Randall Lucas   Tercent, Inc.   DF93EAD1

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer

2006-08-31 Thread Merlin Moncure

On 8/30/06, Randall Lucas [EMAIL PROTECTED] wrote:

I'm storing facts about an entity (e.g., company) in a normalized
form with various repeating groups, link tables, etc.  My workflow
requires that after (or as part of) collecting these facts, I be able
to sign off as having verified all of the facts that pertain to a
given company.  I understand this as meaning I need to sign off on each
row that was used in answering the query select * from company left
join ...



An inverted way of thinking about the problem is the notion of getting a
source document (say, a company registration form) and parsing and
storing it in a normalized format.  How do you tag the origin of each
and every row in every table that resulted from that source document?


your form should have a code, either entered by the customer or by the
preparer who enters it into the database, which becomes the key that
identifies the registration document.  Put that key into other tables.


It is possible to do so by associating an extra column with each
inserted or modified value (yuck).


be careful, you are flirting with EAV thinking.  I think EAV designs
are terrible.

 It seems to me that the elegant way to do this would be to get the

entire graph of dependencies for not only all tables that reference the
entity, but only those rows within those tables that refer specifically
to my entity.

The query engine should have a pretty good idea about which tables and
which rows would actually be used in forming the responses to a given
query.  How can I get this information?  Or am I off the deep end (-ency
graph) with this one?


I am not sure where you are going with this.  Maybe you should mock up
some simple tables and repost your question.

merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [GENERAL] Dependency graph of all tuples relied upon in a query answer

2006-08-31 Thread Randall Lucas
On Thu, Aug 31, 2006 at 09:17:54AM -0400, Merlin Moncure wrote:
 On 8/30/06, Randall Lucas [EMAIL PROTECTED] wrote:
 
 An inverted way of thinking about the problem is the notion of getting a
 source document (say, a company registration form) and parsing and
 storing it in a normalized format.  How do you tag the origin of each
 and every row in every table that resulted from that source document?
 
 your form should have a code, either entered by the customer or by the
 preparer who enters it into the database, which becomes the key that
 identifies the registration document.  Put that key into other tables.

Yes -- I have flirted with this idea for the case of importing a set of
facts from a single source.  But where it breaks down is in being able
to do the reverse -- ex post facto, to certify the results of a query
as being accurate, and thereby attesting to the underlying facts.

 be careful, you are flirting with EAV thinking.  I think EAV designs
 are terrible.

(Though religion is not at issue here, I am pretty devoutly relational.
I just want a good way to perform audits and other functions on my
relations.)

  It seems to me that the elegant way to do this would be to get the
 entire graph of dependencies for not only all tables that reference the
 entity, but only those rows within those tables that refer specifically
 to my entity.
 
 The query engine should have a pretty good idea about which tables and
 which rows would actually be used in forming the responses to a given
 query.  How can I get this information?  Or am I off the deep end (-ency
 graph) with this one?
 
 I am not sure where you are going with this.  Maybe you should mock up
 some simple tables and repost your question.

Imagine that I am the SEC.  I look for cases of insider trading, and
when I find a case, I have to go back and double-check all the pieces
of evidence I am using against them.

create table company (
id  serial primary key,
nametext not null
);

create table officer (
id  serial primary key,
company_id  int not null references company(id) on delete cascade,
nametext not null,
title   text not null
);

create table insider_trade (
id  serial primary key,
officer_id int not null references officer(id) on delete cascade,
shares_traded   numeric,
share_price numeric,
trade_date  timestamptz
);

insert into company (name) values ('goodco');
insert into company (name) values ('badco');
insert into officer (company_id, name, title) values (1, 'johnny b. good', 
'ceo');
insert into officer (company_id, name, title) values (1, 'mother teresa', 
'saint');
insert into officer (company_id, name, title) values (2, 'leroy brown', 
'ceo');
insert into insider_trade (officer_id, shares_traded, share_price, 
trade_date) values
(3, '5', '6.66', '2006-07-04 1:23 PM PST');

Now, my database looks like this:

select * from company left join officer on company.id=officer.company_id 
left join insider_trade on officer.id=insider_trade.officer_id;
 id |  name  | id | company_id |  name  | title | id | officer_id | 
shares_traded | share_price |   trade_date   

+++++---+++---+-+
  1 | goodco |  1 |  1 | johnny b. good | ceo   ||| 
  | | 
  1 | goodco |  2 |  1 | mother teresa  | saint ||| 
  | | 
  2 | badco  |  3 |  2 | leroy brown| ceo   |  1 |  3 | 
5 |6.66 | 2006-07-04 14:23:00-07
(3 rows)

If I want to know whom to investigate, I might do a query like this:

select * from company left join officer on company.id=officer.company_id 
left join insider_trade on officer.id=insider_trade.officer_id where 
insider_trade.id is not null;
 id | name  | id | company_id |name | title | id | officer_id | 
shares_traded | share_price |   trade_date   

+---+++-+---+++---+-+
  2 | badco |  3 |  2 | leroy brown | ceo   |  1 |  3 | 
5 |6.66 | 2006-07-04 14:23:00-07
(1 row)

Now that I have this query, in order to make my case, I need to sign
off on all of the individual data that went into it.  I would like to
do something like:

select last_query_shown_tuples();
 schema |  table_name   | pk_columns | pk_values 
+---++---
 public |   company |[id]|   [2] 
 public |   officer |[id]|   [3] 
 public | insider_trade |[id]|   [1] 
 (3 rows)

(I am aware that, at least theoretically, my query is just as