Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-16 Thread David Beck
 There is a hook post_parse_analyze_hook but I think it comes too
 late as it comes after the analyze step which is when Postgres looks
 up the schema information for every relation mentioned in the query.
 What you would need is a post_parse_hook which would work on the raw
 parse tree before the analyze step. That doesn't seem terribly
 controversial to add though there may be some technical details. The
 API would of course be completely unstable from major release to major
 release -- the parse tree gets knocked around quite a bit.

Good to know that the parse tree is not stable. My original proposal was exactly 
the one you describe. I called it post_raw_parse_hook and wanted to call it 
from exec_simple_query(), right after pg_parse_query().
Maybe that's not the best place or way to achieve this?

It was a few-line patch; I wonder if it has any chance of getting into Postgres.
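
To make it concrete, a minimal sketch of the idea (the exact signature is 
illustrative, not necessarily what my patch used):

    /* in a header, e.g. tcop/tcopprot.h: */
    typedef List *(*post_raw_parse_hook_type) (List *raw_parsetree_list,
                                               const char *query_string);
    extern PGDLLIMPORT post_raw_parse_hook_type post_raw_parse_hook;

    /* in exec_simple_query(), right after the existing call: */
    parsetree_list = pg_parse_query(query_string);

    if (post_raw_parse_hook)
        parsetree_list = (*post_raw_parse_hook) (parsetree_list,
                                                 query_string);

An extension would set post_raw_parse_hook from its _PG_init() and could hand 
back a rewritten raw parse tree.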

 And I have to wonder if you aren't going the long way around to do
 something that can be done more simply some other way. If you have
 150k objects I wonder if your objects aren't all very similar and
 could be handled by a single Postgres schema object. Either a single
 FDW object or a simple function.

I need to do a bit more research on that side. I was hoping to spend less time 
on understanding that legacy system...

 As a side note, you should evaluate carefully what lazily creating
 objects will buy you. Perhaps just creating 150k objects would be
 cheaper than maintaining this code. In particular since the user
 *might* access all 150k you still have to worry about the worst case
 anyway and it might be cheaper to just engineer for it in the first
 place.

I have a similar problem with the legacy system. The set of 150k objects may 
change between releases. My gut feeling is that it is easier to understand 
and keep track of an open source project like Postgres than of the legacy system.

Best regards and thanks for all insights and ideas, David





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-15 Thread David Beck
On 2014.02.15., at 0:46, Greg Stark st...@mit.edu wrote:

 On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?
 
 The documentation you linked to describes how to provide
 information_schema plugins but not why you would want to do such a
 thing. I'm not seeing why this would be useful. The information_schema
 schema is described by the standard so creating new views in it isn't
 needed very often and the schema for the existing views doesn't change
 very often. I can see why a plugin might want to add rows to the views
 but that doesn't seem to be what this feature is about.

Another reason I was thinking about a dynamic catalog and/or query rewrite is 
that the project I work on is a data integration platform. Right now it is in 
the feasibility study phase, and Postgres plus an extension looks to be the 
strongest option.

The legacy system we want to interface with has over 150k table-like objects. 
Our platform’s task is to provide a relational view on top of them.

I know it is unlikely that the users will use all 150k tables. I would expect 
maybe 10-100 to be used in practice, but I didn’t want to figure out which 100, 
nor to create all 150k catalog entries in advance.

I was also dreaming about the possibility of transferring the small enough 
objects to Postgres tables in the background, sparing the communication with 
the legacy system and letting Postgres do the joins on these locally.

The solution I was thinking about is this:

- when the query arrives, a smart rewriter would know 1) which tables are 
local, 2) which tables need new catalog entries, and 3) what can be joined on 
the other side
- the rewriter would potentially add SQL statements at the beginning of the 
query to create the missing FDW catalog entries if needed (see the example 
just after this list)
- the FDW would be handled by the same extension, so the two can easily talk 
to each other about the status of the objects; the rewriter would thus know 
whether the background transfer of a small table has completed and do the 
rewrite accordingly
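
(For example, with the tableA/tableB case from my first mail, the rewriter 
might prepend something like CREATE FOREIGN TABLE fdw_tableA_tableB (...) 
SERVER legacy_server, with hypothetical names and the column list built from 
the legacy metadata, before the rewritten SELECT.)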

I know these are pretty far from the functionality and traditional operation of 
an RDBMS… but if you look at FDW examples like doing a SELECT on a Google IMAP 
mailbox, it is not that far from Postgres.

Best regards, David





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-15 Thread Greg Stark
On Sat, Feb 15, 2014 at 2:06 PM, David Beck db...@starschema.net wrote:
 - when the query arrives, a smart rewriter would know 1) which tables are 
 local, 2) which tables need new catalog entries, and 3) what can be joined on 
 the other side
 - the rewriter would potentially add SQL statements at the beginning of the 
 query to create the missing FDW catalog entries if needed
 - the FDW would be handled by the same extension, so the two can easily talk 
 to each other about the status of the objects; the rewriter would thus know 
 whether the background transfer of a small table has completed and do the 
 rewrite accordingly

I think you have a pretty big impedance mismatch with Postgres, which
assumes the schema of the database is fairly static and known when
parsing begins. To do what you describe you pretty much need to write
your own SQL parser.

There is a hook post_parse_analyze_hook but I think it comes too
late as it comes after the analyze step which is when Postgres looks
up the schema information for every relation mentioned in the query.
What you would need is a post_parse_hook which would work on the raw
parse tree before the analyze step. That doesn't seem terribly
controversial to add though there may be some technical details. The
API would of course be completely unstable from major release to major
release -- the parse tree gets knocked around quite a bit.

But that only gets you so far. I think you would be able to get lazy
creation of schema objects fairly straightforwardly, i.e. any legacy
objects referenced in a query get corresponding FDW schema objects
created before analyze is done -- though I would expect a few gotchas,
possibly deadlocks, with concurrent queries creating the same objects.
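
In rough outline it might look like this (hypothetical names, ignoring the 
locking gotchas just mentioned and assuming the hook site allows catalog 
access at all):

    #include "postgres.h"
    #include "catalog/namespace.h"      /* RangeVarGetRelid */
    #include "nodes/nodeFuncs.h"        /* raw_expression_tree_walker */
    #include "nodes/parsenodes.h"

    /* hypothetical; builds and executes a CREATE FOREIGN TABLE from the
     * legacy system's metadata (not shown) */
    static void create_foreign_table_for(RangeVar *rv);

    /*
     * Walk a raw parse tree; for any referenced relation not yet in the
     * catalog, create a foreign table for it on the fly.  Sketch only.
     */
    static bool
    create_missing_rels_walker(Node *node, void *context)
    {
        if (node == NULL)
            return false;

        if (IsA(node, RangeVar))
        {
            RangeVar   *rv = (RangeVar *) node;

            /* missing_ok = true: probe the catalog without erroring out */
            if (!OidIsValid(RangeVarGetRelid(rv, NoLock, true)))
                create_foreign_table_for(rv);
            return false;
        }

        return raw_expression_tree_walker(node,
                                          create_missing_rels_walker,
                                          context);
    }

The hook would run the walker over each element of the raw parse tree list 
before handing it on to analyze.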

But I don't think that gets you the joins which I think would be quite a battle.

And I have to wonder if you aren't going the long way around to do
something that can be done more simply some other way. If you have
150k objects I wonder if your objects aren't all very similar and
could be handled by a single Postgres schema object. Either a single
FDW object or a simple function.

If they really are different schema objects perhaps a single function
that returns a more flexible data type like an hstore blob? That has
obvious disadvantages over an object that the planner understands
better but at least you would be in a well supported stable API (well,
for interesting definitions of stable given 9.4 will have a whole
new version of hstore).
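
(For instance, the extension could expose one set-returning function, say 
legacy_fetch(relname text, filter text) returning SETOF hstore; the name and 
signature are made up, but that is the flavor of API I mean.)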

As a side note, you should evaluate carefully what lazily creating
objects will buy you. Perhaps just creating 150k objects would be
cheaper than maintaining this code. In particular since the user
*might* access all 150k you still have to worry about the worst case
anyway and it might be cheaper to just engineer for it in the first
place.

-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Thanks for the reply. There are two things about which I think I’ve been 
misunderstood:

1. The point is to do the rewrite without, and before, any catalog access.
2. I do want to push the join to the source, and, equally important, to push 
the WHERE conditions there.

Best regards, David


On 2014.02.13., at 21:22, Tom Lane t...@sss.pgh.pa.us wrote:

 David Beck db...@starschema.net writes:
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
  “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
   regards, tom lane





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
Let me rephrase this:

Let’s remove my motivations and use cases from this conversation…

Why is it a bad idea to rewrite the query before it reaches transform/analyze 
(without ever accessing the catalog)?

If that flexibility is acceptable to you, where would be the best place to put 
it?

Thanks, David


On 2014.02.14., at 10:30, David Beck db...@starschema.net wrote:

 Thanks for the reply. There are two things about which I think I’ve been 
 misunderstood:
 
 1. The point is to do the rewrite without, and before, any catalog access.
 2. I do want to push the join to the source, and, equally important, to push 
 the WHERE conditions there.
 
 Best regards, David
 
 
 On 2014.02.13., at 21:22, Tom Lane t...@sss.pgh.pa.us wrote:
 
 David Beck db...@starschema.net writes:
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
 “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”
 
 to:
 
 “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
 
 TBH this sounds like a spectacularly bad idea, especially in the place and
 way you propose to do it.  You can't even do catalog access safely where
 you've put that hook, not to mention that there are many other places
 where queries can be submitted.  But more generally, an FDW should not
 operate in the way you're describing.
 
 We do lack support for pushing joins to the foreign server, and that needs
 to be addressed; but we need to do it in the planner, not by kluging the
 query somewhere upstream of that.
 
  regards, tom lane
 





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 2:28 PM, David Beck db...@starschema.net wrote:
 Why is it a bad idea to rewrite the query before it reaches transform/analyze 
 (without ever accessing the catalog)?

 If that flexibility is acceptable to you, where would be the best place to put 
 it?

Well, if there are two foreign tables and the planner could push the
join work down to the FDW, then the planner should be able to
accurately represent that plan and cost it without the user having
to create any catalog structures. That's what the planner does
for every other type of plan node.

What you're describing would still be useful for materialized views.
In that case the user is creating the materialized view and it is a
real thing in the catalogs that won't disappear on the planner. Even
then it would be ideal if the planner could decide to use the
materialized view late enough that it can actually determine if it's
superior rather than rewriting the query before it gets to that point.
That would be much more flexible for users too who might not write the
query in a way that exactly matches the materialized view.

-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread David Beck
I think I’m gonna need to dig into the planner to fully understand your points. 
Thank you for the insights. I was more into putting the knowledge of the legacy 
system into an extension and my codebase. Now I see that better use of the 
planner would help. Thank you.

What inspired me is the scriptable query rewrite in 
http://dev.mysql.com/downloads/mysql-proxy/ 
The hook I proposed would be a lot nicer in Postgres, because the raw parsing is 
already done at that point, while in mysql-proxy it has to be done manually.

Another point I liked in mysql is the possibility to write info schema plugins: 
http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
Like a virtual catalog. Is there anything similar in Postgres?

Thank you, David


On 2014.02.14., at 18:06, Greg Stark st...@mit.edu wrote:

 On Fri, Feb 14, 2014 at 2:28 PM, David Beck db...@starschema.net wrote:
 Why is it a bad idea to rewrite the query before it reaches transform/analyze 
 (without ever accessing the catalog)?
 
 If that flexibility is acceptable to you, where would be the best place to put 
 it?
 
 Well, if there are two foreign tables and the planner could push the
 join work down to the FDW, then the planner should be able to
 accurately represent that plan and cost it without the user having
 to create any catalog structures. That's what the planner does
 for every other type of plan node.
 
 What you're describing would still be useful for materialized views.
 In that case the user is creating the materialized view and it is a
 real thing in the catalogs that won't disappear on the planner. Even
 then it would be ideal if the planner could decide to use the
 materialized view late enough that it can actually determine if it's
 superior rather than rewriting the query before it gets to that point.
 That would be much more flexible for users too who might not write the
 query in a way that exactly matches the materialized view.
 
 -- 
 greg





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?

The documentation you linked to describes how to provide
information_schema plugins but not why you would want to do such a
thing. I'm not seeing why this would be useful. The information_schema
schema is described by the standard so creating new views in it isn't
needed very often and the schema for the existing views doesn't change
very often. I can see why a plugin might want to add rows to the views
but that doesn't seem to be what this feature is about.


-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 Another point I liked in mysql is the possibility to write info schema 
 plugins: 
 http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
 Like a virtual catalog. Is there anything similar in Postgres?

 The documentation you linked to describes how to provide
 information_schema plugins but not why you would want to do such a
 thing. I'm not seeing why this would be useful. The information_schema
 schema is described by the standard so creating new views in it isn't
 needed very often and the schema for the existing views doesn't change
 very often.

IIRC, mysql has a liberal view about what they can do to the
information_schema, so for them this isn't as insane as it sounds to us.

regards, tom lane




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-14 Thread Greg Stark
On Fri, Feb 14, 2014 at 9:16 PM, David Beck db...@starschema.net wrote:
 What inspired me is the scriptable query rewrite in 
 http://dev.mysql.com/downloads/mysql-proxy/
 The hook I proposed would be a lot nicer in Postgres, because the raw parsing 
 is already done at that point, while in mysql-proxy it has to be done manually.

There are a few similar things in the Postgres world such as pgbouncer
and plproxy but as you note they are limited in how powerful they can
be by the complexity of parsing and analyzing SQL.

I think Postgres tends to take a different tack regarding
extensibility than MySQL. Rather than having one system and then hooks
to allow external code to modify or replace it, in Postgres modules
are usually treated similarly to the internal code. This is a pretty
broad generalization and there are certainly exceptions. But for
example new data types, functions, even whole new index types are all
treated almost identically to the internal data types, functions, and
index types. The planner then considers them all on a more or less
equal basis.

Obviously there are limits. Btree indexes are kind of special because
they represent Postgres's basic concept of ordering. And we don't have
a pluggable recovery system, which makes any externally provided index
types non-crash-safe. And we don't have a generalized concept of table
storage -- the closest thing we have is FDWs, which is more like
MySQL's style of extensibility, where the extension is a special case.

But this is why our instinct is that if you want to be able to push
down joins the way to do it is to extend the FDW api so that the
planner can make those decisions just like it makes the decisions
about internal joins rather than have an external module that takes
over part of the planner's work.

-- 
greg




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread Kohei KaiGai
See the discussion of the Custom-Scan API.
https://commitfest.postgresql.org/action/patch_view?id=1282

I believe my third patch is what you really want to do...

 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.

 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.

I'd like to know why you want the pseudo table fdw_tableA_tableB to be in the 
catalog instead of tableA and tableB. In addition, the parser will raise an 
error if referenced columns (as part of tableA or tableB) are not in the 
catalog, because name lookup will fail.

Thanks,


2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
 Hello Hackers,

 I work on a foreign data wrapper for a legacy system. I generally find the 
 hook system a very useful and flexible way to extend Postgres.
 The post parse analyze hook almost fits what I need, but I have a few use 
 cases where I would need to tap into the parsed queries before any 
 catalog-based validation is done.
 Please find the attached trivial patch for this new hook.

 One of the use cases I have is this:

 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.

 The idea is to rewrite the SQL queries like this:

   SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987

 to:

   SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987


 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.

 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.

 Looking forward to hearing your thoughts, opinions, and comments.

 Best regards, David









-- 
KaiGai Kohei kai...@kaigai.gr.jp




Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread David Beck
Thanks for the link.

I want flexibility. Here is a situation: my hook knows the size of tableA and 
tableB on the legacy side. It should be able to decide whether to offload the 
join/filter onto the legacy side or not. At the same time it can start 
transferring the data to real Postgres tables in the background. When the data 
is here in the local DB, it can change the rewrite to use the local tables 
instead for new queries.

Answering your question:

I don’t know what you mean by pseudo table. The “fdw_tableA_tableB” is a real 
FDW table with a catalog entry and real catalog columns. tableA and tableB 
are not real tables in my setup. My understanding is that the 
raw_parse-transform-analyze triple first checks the catalog in the transform 
phase. That’s why I want the hook to be after raw_parse but before transform: 
the transform phase won’t see tableA, tableB, or their columns, so I don’t 
expect errors.

Thank you, David


Please find my answers inline.

On 2014.02.13., at 13:28, Kohei KaiGai kai...@kaigai.gr.jp wrote:

 See the discussion of the Custom-Scan API.
 https://commitfest.postgresql.org/action/patch_view?id=1282
 
 I believe my third patch is what you really want to do...
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 I'd like to know why you want the pseudo table fdw_tableA_tableB to be in 
 the catalog instead of tableA and tableB. In addition, the parser will 
 raise an error if referenced columns (as part of tableA or tableB) are not 
 in the catalog, because name lookup will fail.
 
 Thanks,
 
 
 2014-02-13 19:01 GMT+09:00 David Beck db...@starschema.net:
 Hello Hackers,
 
 I work on a foreign data wrapper for a legacy system. I generally find the 
 hook system a very useful and flexible way to extend Postgres.
 The post parse analyze hook almost fits what I need, but I have a few use 
 cases where I would need to tap into the parsed queries before any 
 catalog-based validation is done.
 Please find the attached trivial patch for this new hook.
 
 One of the use cases I have is this:
 
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.
 
 The idea is to rewrite the SQL queries like this:
 
  SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987
 
 to:
 
  SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987
 
 
 This rewritten query would be handled by the FDW table that I previously 
 added to the catalog.
 
 The reason I want this new hook is that I don't want tableA and tableB to be 
 in the catalog.
 
 Looking forward to hearing your thoughts, opinions, and comments.
 
 Best regards, David
 
 
 
 
 
 
 
 
 
 -- 
 KaiGai Kohei kai...@kaigai.gr.jp





Re: [HACKERS] New hook after raw parsing, before analyze

2014-02-13 Thread Tom Lane
David Beck db...@starschema.net writes:
 I have table-like data structures in the source system for the FDW I work on.
 These tables are sometimes too big, and the source system can filter and join 
 them only with limitations, so it is not optimal to transfer the data to 
 Postgres.
 At the same time I want the users to think in terms of the original tables.

 The idea is to rewrite the SQL queries like this:

   “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND 
 b.col2=987”

 to:

   “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”

TBH this sounds like a spectacularly bad idea, especially in the place and
way you propose to do it.  You can't even do catalog access safely where
you've put that hook, not to mention that there are many other places
where queries can be submitted.  But more generally, an FDW should not
operate in the way you're describing.

We do lack support for pushing joins to the foreign server, and that needs
to be addressed; but we need to do it in the planner, not by kluging the
query somewhere upstream of that.

regards, tom lane

