-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 25/04/2010 03:02, Tom Lane wrote:
> Robert Haas <[email protected]> writes:
>> On Sat, Apr 24, 2010 at 8:07 PM, Bruce Momjian <[email protected]> wrote:
>>> Sounds useful to me, though as a function like suggested in a later
>>> email.
>
>> If tool-builders think this is useful, I have no problem with making
>> it available. It should be suitably disclaimed: "We reserve the right
>> to rip out the entire flex/yacc-based lexer and parser at any time and
>> replace them with a hand-coded system written in Prolog that emits
>> tokenization information only in ASN.1-encoded pig latin. If massive
>> changes in the way this function works - or its complete disappearance
>> - are going to make you grumpy, don't call it."
>
> I'm a bit concerned with the vagueness of the goals here. We started
> with a request to dump out node trees, ie, post-parsing representation;
> but the example use case of syntax highlighting would find that
> representation quite useless. (Example: foo::bar and CAST(foo AS bar)
> yield the same parse tree.)
Well, the tokenizer idea was actually my reading of the following quote
from Michael Tharp:
« ... making the internal SQL parser available to clients via a
C-language SQL function. »
I thought Michael was trying to write a tokenizer based on the node tree
returned by raw_parser. As it seems Michael is not yet sure what he is
trying to do, I would prefer to refocus this thread a bit.
> A syntax highlighter might get some use
> out of the lexer-output token stream, but I'm afraid from the proposed
> output that people might be expecting more semantic information than
> the lexer can provide. The lexer doesn't, for example, have any clue
> that some keywords are commands and others aren't; nor any very clear
> understanding about the semantic difference between the tokens '='
> and ';'.
Exactly. A proper tokenizer function should be able to give some (simple)
information about the type of each token. That is what I tried to define
in this draft with the "type" field:
=> SELECT pgtokenize($script$
SELECT 1;
UPDATE test SET "a"=2;
$script$);
    type     | pos |  value   | line
-------------+-----+----------+------
 SQL_COMMAND |   1 | 'SELECT' |    1
 CONSTANT    |   8 | '1'      |    1
 DELIMITER   |   9 | ';'      |    1
 SQL_COMMAND |  11 | 'UPDATE' |    2
 IDENTIFIER  |  18 | 'test'   |    2
 SQL_KEYWORD |  23 | 'SET'    |    2
 IDENTIFIER  |  27 | '"a"'    |    2
 OPERATOR    |  30 | '='      |    2
 CONSTANT    |  31 | '2'      |    2
 DELIMITER   |  32 | ';'      |    2
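
To make the draft a bit more concrete, here is a minimal sketch in Python
of how such rows could be produced. It is not PostgreSQL's lexer and not
a proposed implementation: the regular expressions, the COMMANDS and
KEYWORDS sets and the function names are all assumptions made up for
illustration. I also assume that "pos" is a 1-based character offset and
that the newline right after the opening $script$ is not counted. Note
that the SQL_COMMAND / SQL_KEYWORD distinction comes from the extra
lookup tables, not from the lexing step itself.

```python
import re

# Hypothetical keyword sets, for illustration only. PostgreSQL's real
# keyword list is much larger and its lexer does not split it this way.
COMMANDS = {"SELECT", "UPDATE", "INSERT", "DELETE"}
KEYWORDS = {"SET", "FROM", "WHERE", "INTO", "VALUES"}

# Simplified token patterns (an assumption, far from the full SQL grammar).
TOKEN_RE = re.compile(r"""
    (?P<QUOTED>"(?:[^"]|"")*")          # double-quoted identifier
  | (?P<STRING>'(?:[^']|'')*')          # single-quoted string constant
  | (?P<NUMBER>\d+(?:\.\d+)?)           # numeric constant
  | (?P<WORD>[A-Za-z_][A-Za-z_0-9$]*)   # keyword or bare identifier
  | (?P<DELIM>[;,()])                   # delimiters
  | (?P<OP>[-+*/<>=~!@%^&|]+)           # run of operator characters
  | (?P<NEWLINE>\n)                     # tracked to maintain line numbers
  | (?P<SPACE>[ \t\r]+)                 # skipped
""", re.VERBOSE)

SIMPLE_TYPES = {
    "QUOTED": "IDENTIFIER",
    "STRING": "CONSTANT",
    "NUMBER": "CONSTANT",
    "DELIM": "DELIMITER",
    "OP": "OPERATOR",
}


def classify(kind, text):
    """Map a lexical class to the draft's "type" column."""
    if kind == "WORD":
        upper = text.upper()
        if upper in COMMANDS:
            return "SQL_COMMAND"
        if upper in KEYWORDS:
            return "SQL_KEYWORD"
        return "IDENTIFIER"
    return SIMPLE_TYPES[kind]


def tokenize(script):
    """Return a list of (type, pos, value, line) rows for the script."""
    rows = []
    line = 1
    offset = 0
    while offset < len(script):
        m = TOKEN_RE.match(script, offset)
        if m is None:
            raise ValueError(f"unexpected character at position {offset + 1}")
        kind, text = m.lastgroup, m.group()
        if kind == "NEWLINE":
            line += 1
        elif kind != "SPACE":
            rows.append((classify(kind, text), offset + 1, text, line))
            # Quoted tokens may span several lines.
            line += text.count("\n")
        offset = m.end()
    return rows


if __name__ == "__main__":
    for row in tokenize('SELECT 1;\nUPDATE test SET "a"=2;'):
        print(row)
```

A real function in core would of course have to get both the token
boundaries and the keyword categories from the backend itself rather
than from hand-written tables like these.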
>
> Also, if all you want is the lexer, it's not that hard to steal psql's
> version and adapt it to your purposes. The lexer doesn't change very
> fast, and it's not that big either.
Stealing the lexer from psql is possible... for a C application.
I don't know yet whether we could port it to other languages easily, or
whether a simple lexer would really cover the use cases here.
>
> Anyway, it certainly wouldn't be hard for an add-on module to provide a
> SRF that calls the lexer (or parser) and returns some sort of tabular
> representation of the results. I'm just not sure how useful it'll be
> in the real world.
Well, I would prefer not to tell users of pgAdmin or phpPgAdmin that
they depend on a contrib module.
Moreover, PostgreSQL already exposes a lot of information about its
internal mechanisms, configuration, DDL, etc. I think a proper tokenizer
function would be a natural new feature for core, if possible.
Having glanced here and there through the parser code, I am not yet sure
where I could get the required information and how to combine it to
produce something close to my draft.
But I prefer to discuss this first, before spending too much time on
code that might end up thrown away...
>
> regards, tom lane
- --
JGuillaume (ioguix) de Rorthais
http://www.dalibo.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkvXdxgACgkQxWGfaAgowiJujQCglXpCYpFttwHOkmkCd92zMxnv
r00An1sjmRrR6u61VjCtXputcNBevHsz
=ri3i
-----END PGP SIGNATURE-----