Re: [HACKERS] New hook after raw parsing, before analyze
> There is a hook "post_parse_analyze_hook" but I think it comes too
> late as it comes after the analyze step which is when Postgres looks
> up the schema information for every relation mentioned in the query.
> What you would need is a post_parse_hook which would work on the raw
> parse tree before the analyze step. That doesn't seem terribly
> controversial to add though there may be some technical details. The
> API would of course be completely unstable from major release to major
> release -- the parse tree gets knocked around quite a bit.

Good to know that the parse tree is not stable. My original proposal was exactly the one you describe. I called it post_raw_parse_hook and wanted to call it from exec_simple_query(), right after pg_parse_query(). Maybe that is not the best place or way to achieve this? It was a few-line patch; I wonder if it ever has a chance to get into Postgres.

> And I have to wonder if you aren't going the long way around to do
> something that can be done more simply some other way. If you have
> 150k objects I wonder if your objects aren't all very similar and
> could be handled by a single Postgres schema object. Either a single
> FDW object or a simple function.

I need to do a bit more research on that side. I was hoping to spend less time on understanding that legacy system...

> As a side note, you should evaluate carefully what lazily creating
> objects will buy you. Perhaps just creating 150k objects would be
> cheaper than maintaining this code. In particular since the user
> *might* access all 150k you still have to worry about the worst case
> anyway and it might be cheaper to just engineer for it in the first
> place.

I have a similar problem with the legacy system: the set of 150k objects may change between releases. My gut feeling is that it is easier to understand and keep track of an open source project like Postgres.
Best regards and thanks for all insights and ideas,
David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
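For reference, the hook David describes would follow the usual PostgreSQL extension-hook convention: a global function pointer that an extension assigns in its `_PG_init()`, checked at the call site. The sketch below is a self-contained simulation of that convention under simplified assumptions: `RawQuery` is a stand-in for the real raw parse tree list, and `exec_simple_query_sketch` only models the proposed call site, not the real function.

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the raw parse output of pg_parse_query() */
typedef struct RawQuery {
    char text[256];
} RawQuery;

/* Hook type and global pointer, following the usual Postgres hook convention */
typedef void (*post_raw_parse_hook_type)(RawQuery *query);
static post_raw_parse_hook_type post_raw_parse_hook = NULL;

/* Sketch of the proposed call site: right after raw parsing, before analyze */
static void exec_simple_query_sketch(RawQuery *query)
{
    /* ... pg_parse_query() would have produced *query here ... */
    if (post_raw_parse_hook != NULL)
        post_raw_parse_hook(query);   /* extension sees the raw tree */
    /* ... parse analysis (catalog lookups) would follow here ... */
}

/* Example extension hook: here it only counts invocations; a real one
 * would inspect or rewrite the raw parse tree. */
static int hook_calls = 0;
static void counting_hook(RawQuery *query)
{
    (void) query;
    hook_calls++;
}

/* Install the hook, as an extension's _PG_init() would */
static void install_hook(void)
{
    post_raw_parse_hook = counting_hook;
}
```

In the real patch the hook would of course receive the list of raw parse trees rather than a string, and, as Greg notes, that structure changes between major releases, so any such hook would have an unstable API.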
Re: [HACKERS] New hook after raw parsing, before analyze
On 15 Feb 2014, at 0:46, Greg Stark wrote:

> On Fri, Feb 14, 2014 at 9:16 PM, David Beck wrote:
>> Another point I liked in mysql is the possibility to write info schema
>> plugins:
>> http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html
>> Like a virtual catalog. Is there anything similar in Postgres?
>
> The documentation you linked to describes how to provide
> information_schema plugins but not why you would want to do such a
> thing. I'm not seeing why this would be useful. The information_schema
> schema is described by the standard so creating new views in it isn't
> needed very often and the schema for the existing views doesn't change
> very often. I can see why a plugin might want to add rows to the views
> but that doesn't seem to be what this feature is about.

Another reason I was thinking about a dynamic catalog and/or query rewrite is that the project I work on is a data integration platform. Right now it is in the feasibility study phase, and Postgres plus an extension looks to be the strongest option.

The legacy system we want to interface with has over 150k table-like objects. Our platform's task is to provide a relational view on top of them. I know that it is unlikely the users will use all 150k tables. I would expect maybe 10-100 are used in practice, but I didn't want to figure out which 100, nor did I want to create all 150k catalog entries in advance.

I was also dreaming about the possibility of transferring the small enough objects to Postgres tables in the background, to spare the communication with the legacy system and let Postgres do the joins on these.
The solution I was thinking about is this:

- when the query arrives, a smart rewriter would know 1) what tables are local, 2) what tables need new catalog entries, and 3) what can be joined on the other side
- the rewriter would potentially add SQL statements at the beginning of the query for creating the missing FDW catalog entries, if needed
- the FDW would be handled by the same extension, so they can easily talk to each other about the status of the objects; the rewriter would know if the background transfer of a small table is completed and would do the rewrite accordingly

I know these are pretty far from the functionality and traditional operation of an RDBMS… but if you look at FDW examples like doing a select on a Google Imap mailbox, it is not that far from Postgres.

Best regards,
David
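The classification in the first bullet can be sketched as follows. Everything here is hypothetical: the status names, the `LegacyTable` bookkeeping struct, and the flags are illustrative choices for the sketch, not part of any proposed API.

```c
/* Hypothetical status of a legacy object as seen by the extension */
typedef enum {
    TABLE_LOCAL,        /* already transferred to a real Postgres table */
    TABLE_HAS_FDW,      /* FDW catalog entry exists; join on the other side */
    TABLE_NEEDS_ENTRY   /* no catalog entry yet: prepend CREATE FOREIGN TABLE */
} TableStatus;

/* Per-object bookkeeping the extension would maintain */
typedef struct LegacyTable {
    const char *name;
    int transferred;        /* background copy to a local table finished? */
    int has_catalog_entry;  /* FDW catalog entry already created? */
} LegacyTable;

/* Decide how the rewriter should treat one referenced table */
static TableStatus classify_table(const LegacyTable *t)
{
    if (t->transferred)
        return TABLE_LOCAL;        /* rewrite to the local table name */
    if (t->has_catalog_entry)
        return TABLE_HAS_FDW;      /* leave to the FDW, push the join down */
    return TABLE_NEEDS_ENTRY;      /* emit DDL creating the FDW entry first */
}
```

The point of the enum is that the same query can be rewritten differently over time: once the background transfer of a table completes, its status flips from `TABLE_HAS_FDW` to `TABLE_LOCAL` and new queries get the local rewrite.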
Re: [HACKERS] New hook after raw parsing, before analyze
I think I’m gonna need to dig into the planner to fully understand your points. Thank you for the insights. I was more into putting the knowledge of the legacy system into an extension and my codebase. Now I see that making better use of the planner would help. Thank you.

What inspired me is the scriptable query rewrite in http://dev.mysql.com/downloads/mysql-proxy/

The hook I proposed would be a lot nicer in Postgres, because the raw parsing is already done at this point, while in mysql-proxy that has to be done manually.

Another point I liked in mysql is the possibility to write info schema plugins: http://dev.mysql.com/doc/refman/5.1/en/writing-information-schema-plugins.html

Like a virtual catalog. Is there anything similar in Postgres?

Thank you, David

On 14 Feb 2014, at 18:06, Greg Stark wrote:

> On Fri, Feb 14, 2014 at 2:28 PM, David Beck wrote:
>> Why is it a bad idea to rewrite the query before it reaches
>> transform/analyze (without ever accessing the catalog)?
>>
>> If that flexibility is acceptable to you, where would be the best place to
>> put it in?
>
> Well if there are two foreign tables and the planner could push the
> join work down to the fdw then the planner should be able to
> accurately represent that plan and cost it without having the user
> have to create any catalog structures. That's what the planner does
> for every other type of plan node.
>
> What you're describing would still be useful for materialized views.
> In that case the user is creating the materialized view and it is a
> real thing in the catalogs that won't disappear on the planner. Even
> then it would be ideal if the planner could decide to use the
> materialized view late enough that it can actually determine if it's
> superior rather than rewriting the query before it gets to that point.
> That would be much more flexible for users too who might not write the
> query in a way that exactly matches the materialized view.
> --
> greg
Re: [HACKERS] New hook after raw parsing, before analyze
Let me rephrase this: let’s remove my motivations and use cases from this conversation….

Why is it a bad idea to rewrite the query before it reaches transform/analyze (without ever accessing the catalog)?

If that flexibility is acceptable to you, where would be the best place to put it in?

Thanks, David

On 14 Feb 2014, at 10:30, David Beck wrote:

> Thanks for the reply. There are two things where I think I’ve been misunderstood:
>
> 1. the point is to do the rewrite without, and before, catalog access
> 2. I do want to push the join to the source, and it is equally important to push the
> where conditions there
>
> Best regards, David
>
>
> On 13 Feb 2014, at 21:22, Tom Lane wrote:
>
>> David Beck writes:
>>> I have table-like data structures in the source system for the FDW I work
>>> on.
>>> These tables are sometimes too big, and the source system is able to filter
>>> and join them with limitations, thus it is not optimal to transfer the data
>>> to Postgres.
>>> At the same time I want the users to think in terms of the original tables.
>>
>>> The idea is to rewrite SQL queries like this:
>>
>>> “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND
>>> b.col2=987”
>>
>>> to:
>>
>>> “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
>>
>> TBH this sounds like a spectacularly bad idea, especially in the place and
>> way you propose to do it. You can't even do catalog access safely where
>> you've put that hook, not to mention that there are many other places
>> where queries can be submitted. But more generally, an FDW should not
>> operate in the way you're describing.
>>
>> We do lack support for pushing joins to the foreign server, and that needs
>> to be addressed; but we need to do it in the planner, not by kluging the
>> query somewhere upstream of that.
>>
>> regards, tom lane
Re: [HACKERS] New hook after raw parsing, before analyze
Thanks for the reply. There are two things where I think I’ve been misunderstood:

1. the point is to do the rewrite without, and before, catalog access
2. I do want to push the join to the source, and it is equally important to push the where conditions there

Best regards, David

On 13 Feb 2014, at 21:22, Tom Lane wrote:

> David Beck writes:
>> I have table-like data structures in the source system for the FDW I work on.
>> These tables are sometimes too big, and the source system is able to filter
>> and join them with limitations, thus it is not optimal to transfer the data
>> to Postgres.
>> At the same time I want the users to think in terms of the original tables.
>
>> The idea is to rewrite SQL queries like this:
>
>> “SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND
>> b.col2=987”
>
>> to:
>
>> “SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”
>
> TBH this sounds like a spectacularly bad idea, especially in the place and
> way you propose to do it. You can't even do catalog access safely where
> you've put that hook, not to mention that there are many other places
> where queries can be submitted. But more generally, an FDW should not
> operate in the way you're describing.
>
> We do lack support for pushing joins to the foreign server, and that needs
> to be addressed; but we need to do it in the planner, not by kluging the
> query somewhere upstream of that.
>
> regards, tom lane
Re: [HACKERS] New hook after raw parsing, before analyze
Thanks for the link. I want flexibility. Here is a situation: my hook knows the size of tableA and tableB on the legacy side. It should be able to decide whether to offload the join/filter onto the legacy side or not. At the same time it can start transferring the data to real Postgres tables in the background. When the data is here in the local DB, it can change the rewrite to use the local tables instead for new queries.

Answering your question: I don’t know what you mean by pseudo table. The “fdw_tableA_tableB” is a real FDW table with a catalog entry, with real catalog columns. tableA and tableB are non-real tables in my setup.

My understanding is that the raw_parse-transform-analyze triple first checks the catalog in the transform phase. That’s why I want the hook to be after raw_parse but before transform, so the transform phase won’t see tableA, tableB and their columns; thus I don’t expect errors.

Thank you, David

Please find my answers inline.

On 13 Feb 2014, at 13:28, Kohei KaiGai wrote:

> See the discussion of the Custom-Scan API.
> https://commitfest.postgresql.org/action/patch_view?id=1282
>
> I believe my third patch is what you really want to do...
>
>> This rewritten query would be handled by the FDW table that I previously
>> added to the catalog.
>>
>> The reason I want this new hook is that I don't want tableA and tableB to be
>> in the catalog.
>>
> I'd like to see why you want the pseudo table "fdw_tableA_tableB" to
> be in the catalog instead of "tableA" and "tableB". In addition, the parser
> shall raise an error if referenced columns (as a part of "tableA" or
> "tableB") are not in-catalog, because of a name lookup error.
>
> Thanks,
>
>
> 2014-02-13 19:01 GMT+09:00 David Beck:
>> Hello Hackers,
>>
>> I work on a foreign data wrapper for a legacy system. I generally find the
>> hook system a very useful and flexible way to extend Postgres.
>> The post parse analyze hook almost fits what I need, but I have a few use
>> cases where I would need to tap right into the parsed queries, before any
>> catalog based validation is done.
>> Please find the attached trivial patch for this new hook.
>>
>> One of the use cases I have is this:
>>
>> I have table-like data structures in the source system for the FDW I work on.
>> These tables are sometimes too big, and the source system is able to filter
>> and join them with limitations, thus it is not optimal to transfer the data
>> to Postgres.
>> At the same time I want the users to think in terms of the original tables.
>>
>> The idea is to rewrite SQL queries like this:
>>
>> "SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND
>> b.col2=987"
>>
>> to:
>>
>> "SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987"
>>
>>
>> This rewritten query would be handled by the FDW table that I previously
>> added to the catalog.
>>
>> The reason I want this new hook is that I don't want tableA and tableB to be
>> in the catalog.
>>
>> Looking forward to hearing your thoughts, opinions, comments.
>>
>> Best regards, David
>
>
> --
> KaiGai Kohei
[HACKERS] New hook after raw parsing, before analyze
Hello Hackers,

I work on a foreign data wrapper for a legacy system. I generally find the hook system a very useful and flexible way to extend Postgres. The post parse analyze hook almost fits what I need, but I have a few use cases where I would need to tap right into the parsed queries, before any catalog based validation is done. Please find the attached trivial patch for this new hook.

One of the use cases I have is this:

I have table-like data structures in the source system for the FDW I work on. These tables are sometimes too big, and the source system is able to filter and join them with limitations, thus it is not optimal to transfer the data to Postgres. At the same time I want the users to think in terms of the original tables.

The idea is to rewrite SQL queries like this:

“SELECT * FROM tableA a, tableB b WHERE a.id=b.id AND a.col1=1234 AND b.col2=987”

to:

“SELECT * FROM fdw_tableA_tableB ab WHERE ab.col1=1234 AND ab.col2=987”

This rewritten query would be handled by the FDW table that I previously added to the catalog. The reason I want this new hook is that I don’t want tableA and tableB to be in the catalog.

Looking forward to hearing your thoughts, opinions, comments.

Best regards, David

post_raw_parse.diff
Description: Binary data
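The rewrite above can be sketched at the string level. This is purely illustrative: the proposed hook would match and rebuild raw parse trees, not strings, and the hard-coded table pair and filter values below come straight from the example query, not from any general matching logic.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative, string-level version of the proposed rewrite: a join of
 * the two known legacy tables becomes a scan of the combined FDW table,
 * with the filter conditions kept under the new alias.
 * Returns 1 and fills *out if the rewrite applied, 0 otherwise.
 */
static int rewrite_known_join(const char *sql, char *out, size_t outlen)
{
    /* only the documented example pair is handled in this sketch */
    if (strstr(sql, "tableA") == NULL || strstr(sql, "tableB") == NULL)
        return 0;                       /* not our pattern: leave untouched */

    snprintf(out, outlen,
             "SELECT * FROM fdw_tableA_tableB ab "
             "WHERE ab.col1=1234 AND ab.col2=987");
    return 1;                           /* rewritten */
}
```

Tom's objection applies exactly here: a rewrite at this level runs before any catalog access, so nothing validates that fdw_tableA_tableB actually covers the join, which is why he argues the pushdown belongs in the planner instead.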