Re: [HACKERS] exporting raw parser

2010-06-07 Thread Daniel Farina
On Wed, May 26, 2010 at 6:02 PM, Tatsuo Ishii is...@postgresql.org wrote:
 I'm thinking about exporting the raw parser and related modules as a C
 library. Though this will not be an immediate benefit of PostgreSQL
 itself, it will be a huge benefit for any PostgreSQL
 applications/middle ware those need to parse SQL statements.

In the past, I and people I have worked with have made strategic use
of UDFs running on a live server that return the outNode textual
representation of the parse tree, the semantically analyzed tree, and
(I think) the planned tree, and we found them highly useful for various
projects. The syntactic, semantic, and operational meaning of a query
were all useful to us.

Some of this code was linked with the server, and so reading the node
using Postgres' parser was easy. Otherwise, a small parser needed to be
written for external projects. Perhaps a slightly more ideal state of
affairs would be:

* These hooks to acquire the syntactic/semantic/planned trees would be
bundled for free
* When writing code not linked against the server, a more common
serialization format, a la JSON or whatnot, would be available
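
As a rough illustration of the second bullet, here is a minimal Python
sketch of how a client not linked against the server might consume a
JSON-serialized raw parse tree. The node tags (SelectStmt, ResTarget,
ColumnRef, RangeVar) borrow PostgreSQL's node names, but the JSON layout
and the node_types helper are assumptions made for this example; nothing
in this thread defines such a format.

```python
import json

# Hypothetical JSON rendering of the raw parse tree for:
#   SELECT name FROM users
# The layout is an assumption for illustration, not a PostgreSQL format.
tree_json = """
{"node": "SelectStmt",
 "targetList": [{"node": "ResTarget",
                 "val": {"node": "ColumnRef", "fields": ["name"]}}],
 "fromClause": [{"node": "RangeVar", "relname": "users"}]}
"""


def node_types(node):
    """Yield the type tag of every node in the tree, depth first."""
    if isinstance(node, dict):
        if "node" in node:
            yield node["node"]
        for value in node.values():
            yield from node_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from node_types(item)


tree = json.loads(tree_json)
types_seen = list(node_types(tree))
```

The point of the sketch is only that any language with a JSON library
could walk such a tree without a hand-written outNode parser.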

A more ambitious project that I don't think is in the scope of any
initial implementation would be to allow for cross referencing of
these compilation passes, similar to how GNU Bison allows you to
interrogate for the position of a lexeme when reporting errors. In my
experience, code written that mangles one layer (say, semantic, or
harder yet, plan) has a hard time producing the best error message
because getting from a node at the bottom to the right lexeme(s) at the
top is very cumbersome. One could imagine this being useful for other
purposes too, but that is how I felt it firsthand. Feels a lot harder,
though.
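
A minimal sketch of the cross-referencing idea, assuming each node in a
later pass keeps the character offset of the lexeme it came from. The
PlanNode class and the point_at helper are hypothetical names invented
for this example, not part of PostgreSQL.

```python
from dataclasses import dataclass


@dataclass
class PlanNode:
    """Hypothetical plan-level node that remembers where it came from."""
    kind: str
    location: int  # character offset of the originating lexeme in the query


def point_at(query: str, location: int, message: str) -> str:
    """Render an error message with a caret under the given offset."""
    # Find the boundaries of the line that contains the offset.
    line_start = query.rfind("\n", 0, location) + 1
    line_end = query.find("\n", location)
    if line_end == -1:
        line_end = len(query)
    line = query[line_start:line_end]
    caret = " " * (location - line_start) + "^"
    return f"ERROR: {message}\n{line}\n{caret}"


query = "SELECT nmae FROM users"
node = PlanNode(kind="ColumnScan", location=query.index("nmae"))
report = point_at(query, node.location, 'column "nmae" does not exist')
```

With the offset carried down through each pass, code working at the plan
level can report against the original text without walking back up.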

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] exporting raw parser

2010-06-07 Thread Dimitri Fontaine
Daniel Farina drfar...@acm.org writes:
 Some of this code was linked with the server, and so reading the node
 using Postgres' parser was easy. Otherwise, a small parser needed be
 written for external projects. Perhaps a slightly more ideal state of
 affairs would be:

 * These hooks to acquire the syntactic/semantic/planned trees would be
 bundled for free
 * When writing code not linked against the server, a more common
 serialization format, ala JSON or whatnot

Access to that data has also been discussed with respect to DDL
triggers. You want to be able to know exactly what is being
executed, and against what objects.

And you want to be able to abuse this information from either a C-coded
server function or a PL/pgSQL trigger. I guess the WIP JSON datatype
would help a lot even when working from within the server, as that does
not have to mean working in C.

Regards,
-- 
dim



Re: [HACKERS] exporting raw parser

2010-06-01 Thread Jan Wieck

On 5/26/2010 10:16 PM, Tatsuo Ishii wrote:

  As was already discussed, I don't believe that premise.  None of the
  applications you cite would be able to make use of the raw parser
  output, because it doesn't contain the semantic information they need.
  If what you actually meant was the analyzed parse tree, that *might*
  serve the need depending on just what is wanted (in particular,
  properties that could be affected by the expansion of views or
  inlineable functions could still not be determined reliably).
  But you can't have that without access to the current system catalog
  contents.

 No, what pgpool-II needs is a raw parse tree. When it needs info in the
 system catalog, it sends SELECT to PostgreSQL. So that would be no
 problem.


But doesn't it need that parse tree BEFORE it decides which node to
execute the query on?


The parser needs the system catalog in order to create a parse tree. 
Where would that stand-alone library version of the parser get the 
catalog information from? Don't you need to know which user-defined
function in the query is volatile?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin



[HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii
I'm thinking about exporting the raw parser and related modules as a C
library. Though this will not be of immediate benefit to PostgreSQL
itself, it will be a huge benefit for any PostgreSQL
applications/middleware that need to parse SQL statements.

For example, pgpool-II parses queries to know whether each one is a read
query or not. In other cases, it needs to know whether a SELECT statement
includes any time-dependent construct such as CURRENT_TIMESTAMP. These
are not trivial jobs since the SQL grammar is complex. For this purpose
pgpool-II copies the PostgreSQL parser code and uses it. Of course,
maintaining that copy is a pain since PostgreSQL's parser changes from
release to release.
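
To make the two checks concrete, here is a minimal Python sketch over a
dict-based stand-in for a raw parse tree. The tree shape, the "funcname"
key and the helper names are assumptions for this example; they mimic
PostgreSQL node tags but are not pgpool-II's actual code.

```python
# Function names treated as time-dependent in this sketch (an assumption).
TIME_DEPENDENT = {"now", "current_timestamp", "current_date", "current_time"}


def walk(node):
    """Yield every dict node in the tree, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def is_read_query(tree):
    """Treat a plain SELECT as a read and everything else as a write.

    Simplification: SELECT ... INTO, SELECT ... FOR UPDATE and functions
    with side effects would need extra checks.
    """
    return tree.get("node") == "SelectStmt"


def uses_current_time(tree):
    """Return True if any function call in the tree is time-dependent."""
    return any(
        n.get("node") == "FuncCall"
        and n.get("funcname", "").lower() in TIME_DEPENDENT
        for n in walk(tree)
    )


select_now = {
    "node": "SelectStmt",
    "targetList": [{"node": "FuncCall", "funcname": "now"}],
}
insert_row = {
    "node": "InsertStmt",
    "relation": {"node": "RangeVar", "relname": "t"},
}
```

Both checks need only the statement tag and the function names, which is
the kind of information a raw parse tree carries.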

I believe not only pgpool-II but also other connection pooling
middleware needs an SQL parser (pgbouncer?). Any tool which accepts SQL
statements as its input would also need an SQL parser (pgAdmin?). For
them an exported raw parser will be a huge benefit.

The implementation will not be very difficult since pgpool-II has
already done most of the necessary work for this:

- extract the raw parser part from the parser directory, which includes
  gram.y, scan.l and keywords.c

- extract the utility functions needed to handle the raw parse tree:
  nodes/nodes.c, makefuncs.c etc.

- create an exportable version of the memory manager

- create exportable exception handling routines (i.e. elog)

- wrap all of the above into a libXX*.so

I think this work is essentially a refactoring of the existing raw
parser, and will add neither performance degradation nor maintenance cost.

Comments?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] exporting raw parser

2010-05-26 Thread Josh Berkus

 I think those works are essentially a refactoring of existing raw
 parser, and will not add performance degradation nor maintenance cost.
 
 Comments?

You should call it libSQL; who knows, other DB projects might want it.
They seem to borrow our parser enough as it is.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 I'm thinking about exporting the raw parser and related modules as a C
 library. Though this will not be an immediate benefit of PostgreSQL
 itself, it will be a huge benefit for any PostgreSQL
 applications/middle ware those need to parse SQL statements.

As was already discussed, I don't believe that premise.  None of the
applications you cite would be able to make use of the raw parser
output, because it doesn't contain the semantic information they need.
If what you actually meant was the analyzed parse tree, that *might*
serve the need depending on just what is wanted (in particular,
properties that could be affected by the expansion of views or
inlineable functions could still not be determined reliably).
But you can't have that without access to the current system catalog
contents.

In any case there's the serious problem that we simply are not going
to promise that the parser output representation is stable.  We've
changed it many times in the past and will do so in the future.

 I think those works are essentially a refactoring of existing raw
 parser, and will not add performance degradation nor maintenance cost.

Quite aside from whether the result would be of any use or not, that
opinion is obviously wrong.  This would be at least as difficult to
maintain as ecpg ... which has been an enormous time sink.

regards, tom lane



Re: [HACKERS] exporting raw parser

2010-05-26 Thread Takahiro Itagaki

Tatsuo Ishii is...@postgresql.org wrote:

 I'm thinking about exporting the raw parser and related modules as a C
 library. Though this will not be an immediate benefit of PostgreSQL
 itself, it will be a huge benefit for any PostgreSQL
 applications/middle ware those need to parse SQL statements.

As I read your proposal, postgres.exe will link to libSQL.dll,
and pgpool.exe will also link to the DLL, right?

I think it is reasonable, but I'm not sure which parts of postgres
should be in the DLL. Obviously we should avoid code duplication
between the DLL and postgres.exe.

 - create an exportable version of memory manager
 - create an exportable exception handling routines(i.e. elog)

Are there any other issues? For example,
  - How to split headers for raw parser nodes?
  - In which module do we define the T_xxx enumerations and support
    functions (outfuncs, readfuncs, copyfuncs, and equalfuncs)?

The proposal will be acceptable only when all of the technical issues
are solved. The libSQL should also be available in stand-alone.
It should not be a collection of half-baked functions.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii
 As was already discussed, I don't believe that premise.  None of the
 applications you cite would be able to make use of the raw parser
 output, because it doesn't contain the semantic information they need.
 If what you actually meant was the analyzed parse tree, that *might*
 serve the need depending on just what is wanted (in particular,
 properties that could be affected by the expansion of views or
 inlineable functions could still not be determined reliably).
 But you can't have that without access to the current system catalog
 contents.

No, what pgpool-II needs is a raw parse tree. When it needs info from the
system catalog, it sends a SELECT to PostgreSQL. So that would be no
problem.
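
A minimal Python sketch of that two-step approach: take function names
from a raw tree, then ask the server about their volatility. The tree
shape and helper names are assumptions for this example, and the %s
placeholders assume a DB-API driver using the "format" paramstyle.
pg_proc.provolatile is a real catalog column ('i' immutable, 's' stable,
'v' volatile), but matching on proname alone ignores overloading and
schema qualification, so treat this as a simplification.

```python
def called_functions(tree):
    """Collect function names from FuncCall nodes in a dict-based tree."""
    names = set()
    stack = [tree]
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            if current.get("node") == "FuncCall":
                names.add(current["funcname"])
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
    return sorted(names)


def volatility_query(names):
    """Build a parameterized catalog query for the given function names."""
    placeholders = ", ".join(["%s"] * len(names))
    return (
        "SELECT proname, provolatile FROM pg_catalog.pg_proc "
        f"WHERE proname IN ({placeholders})"
    )


def has_volatile_call(names, catalog_rows):
    """Decide from (proname, provolatile) rows whether any call is volatile.

    Functions missing from the rows are treated as volatile, which is the
    conservative choice for routing decisions.
    """
    volatile_by_name = {}
    for name, flag in catalog_rows:
        volatile_by_name[name] = volatile_by_name.get(name, False) or flag == "v"
    return any(volatile_by_name.get(name, True) for name in names)


tree = {
    "node": "SelectStmt",
    "targetList": [
        {"node": "FuncCall", "funcname": "lower",
         "args": [{"node": "ColumnRef", "fields": ["name"]}]},
        {"node": "FuncCall", "funcname": "random", "args": []},
    ],
}
names = called_functions(tree)
sql = volatility_query(names)
# Canned rows standing in for what the server would return.
rows = [("lower", "i"), ("random", "v")]
```

The raw tree supplies the names; only the catalog can supply the
volatility, which is why the second step needs a live server.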

 In any case there's the serious problem that we simply are not going
 to promise that the parser output representation is stable.  We've
 changed it many times in the past and will do so in the future.

That's acceptable at least for pgpool-II. Basically what I need is
a) the SQL statement type, b) target tables, c) target columns (functions),
etc., which seem pretty stable across versions. Even if PostgreSQL
changes the parser's output representation, pgpool-II could ask for
the PostgreSQL version and could understand the different
representations. Pgpool-II has already done this for system
catalog changes.
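
A minimal Python sketch of version-based dispatch, assuming the client
has already obtained the server's version number (for example from SHOW
server_version_num). The two tree layouts, the 90000 threshold and the
helper names are all invented for illustration; they do not describe an
actual change between PostgreSQL releases.

```python
def target_table_old(stmt):
    """Accessor for a hypothetical older layout: table name on the statement."""
    return stmt["relname"]


def target_table_new(stmt):
    """Accessor for a hypothetical newer layout: table name nested in a node."""
    return stmt["relation"]["relname"]


# Newest layout first: (minimum server_version_num, accessor).
ACCESSORS = [
    (90000, target_table_new),
    (0, target_table_old),
]


def accessor_for(server_version_num):
    """Pick the accessor matching the server's parse tree layout."""
    for minimum, accessor in ACCESSORS:
        if server_version_num >= minimum:
            return accessor
    raise ValueError(f"unsupported server version: {server_version_num}")


old_stmt = {"node": "InsertStmt", "relname": "t"}
new_stmt = {"node": "InsertStmt", "relation": {"node": "RangeVar", "relname": "t"}}
```

The calling code asks for "the target table" and never touches the
layout directly, so a representation change means adding one table entry.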

Another good thing is that the parser provides nice APIs to process the
parse tree: raw_expression_tree_walker, outfuncs and macros. Those will
absorb the version differences.

 Quite aside from whether the result would be of any use or not, that
 opinion is obviously wrong.  This would be at least as difficult to
 maintain as ecpg ... which has been an enormous time sink.

From reading README.parser of ecpg, the maintenance problem with ecpg
seems to come from its need to modify the grammar. My proposal
does not require grammar changes. So I don't understand why you
think this would be as difficult as ecpg.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii
 I read your proposal says postgres.exe will link to libSQL.dll,
 and pgpool.exe will also link to the DLL, right?

Perhaps.

 I think it is reasonable, but I'm not sure what part of postgres
 should be in the DLL. Obviously we should avoid code duplication
 between the DLL and postgres.exe.

  - create an exportable version of memory manager
  - create an exportable exception handling routines(i.e. elog)
 
 Are there any other issues? For example,
   - How to split headers for raw parser nodes?
   - Which module do we define T_xxx enumerations and support functions?
 (outfuncs, readfuncs, copyfuncs, and equalfuncs)
 
 The proposal will be acceptable only when all of the technical issues
 are solved. The libSQL should also be available in stand-alone.
 It should not be a collection of half-baked functions.

What do you mean by "should also be available in stand-alone"? If you
want a more abstract API than libSQL, you could invent such a thing
based on it as much as you like. IMO anything needed to parse/operate on
the raw parse tree should be in libSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] exporting raw parser

2010-05-26 Thread Takahiro Itagaki

Tatsuo Ishii is...@sraoss.co.jp wrote:

  The proposal will be acceptable only when all of the technical issues
  are solved. The libSQL should also be available in stand-alone.
  It should not be a collection of half-baked functions.
 
 What do you mean by should also be available in stand-alone? If you
 want more abstract API than libSQL, you could invent such a thing
 based on it as much as you like. IMO anything need to parse/operate
 the raw parse tree should be in libSQL.

By "stand-alone" I mean that libSQL can be used from many modules
without duplicated code. For example, copy routines for raw
parse trees should be in the DLL rather than in postgres.exe.

Then, we need to consider products other than pgpool. Who will
use the DLL? If pgpool is the only user, we might not be allowed to
modify core code for just one use case. Research on users other than
pgpool is required to decide the interface routines for libSQL.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center





Re: [HACKERS] exporting raw parser

2010-05-26 Thread Tatsuo Ishii
 My stand-alone means libSQL can be used from many modules
 without duplicated codes. For example, copy routines for raw
 parse trees should be in the DLL rather than in postgres.exe.
 
 Then, we need to consider other products than pgpool. Who will
 use the dll? If pgpool is the only user, we might not allow to
 modify core codes only for one usecase. More research other than
 pgpool is required to decide the interface routines for libSQL.

If pgpool-II were the only user of the new API, I would not have made the
proposal in the first place. It would be a waste of time and I would rather
keep on borrowing the parser code. I thought there were several people
in the cluster meeting who needed the API as well. If somebody who
voted that way in the meeting is on the list, please express your
opinion on the API.

I'm not in the position of speaking for other products.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
