Re: Pretty printing SQL

Christopher Deckers Sun, 04 Mar 2012 13:38:08 -0800

Hi Lukas,

Disclaimer: I am just a pain when voicing opinions. This is only an
artefact to bring more heat to the debates in hope of finding the best (or
better) solutions :)


I chose not to re-use the logic used in the jOOQ console,


I have nothing against that statement.


as it is non-trivial to correctly parse SQL text for formatting without a
> sophisticated lexer.


I totally disagree. You don't need a sophisticated lexer to format Java,
XML, SQL and these kind of languages. I am not talking about PL-SQL or
T-SQL but normal SQL. In your case, you would also have the same issue with
free-style SQL anyway.


It is also hard to extend such a parser manually without breaking it.


I don't agree with that, because it is a simple manually coded state
machine with a few simplistic look ahead.
It's logic is very simple:
- Keep track of the indentation level.
- When there is an opening parenthesis, start a new line and increase the
indentation, with an exception: parentheses with a single word.
- When there is an SQL keyword, and the list is clearly defined and
extensible, go to next line but no indentation increase.
- When there is a quote, do not apply the above rules until a closing quote
is found.

Maybe I should add double quotes because you make extensive use of them,
but that is it.


For the console, this is less critical, as the SQL statement doesn't have
> to be executed.


This is not true: we format the code to work on it, debug it, run it. If it
were to break, it would not be much of interest.
I don't deny there could be bugs (though it works fine for us), but it is
deterministic, the rules are very simple to understand and it is easy to
add the SQL keywords generating a new line if we miss one.


Here are some sample statements, rendered with jOOQ's new pretty printing
> feature:
>

"pretty" is very subjective :)



> -- Nested selects in predicates
> --------------------------------------
> from "TEST"."T_BOOK"
> where "TEST"."T_BOOK"."ID" = any (select "array_table"."COLUMN_VALUE"
>                                   from (select 1 "COLUMN_VALUE"
>                                         from dual) "array_table")
>

As if I were to format Java code this way:

synchronized(myWonderfulLock) {if(someCondition) {
                                 dosomething();
                               }
}

I prefer to avoid horizontal scrolling, except for long lists of columns,
values. What matters to me are the keyword of SQL, not to have a new line
for arbitrary items and certain keywords on the same line. Also, the above
logic could make several block that are at the same indentation level to
not be aligned.

But as I said, this is very subjective. If you want to really support that
feature, you need to please anyone. That probably means a formatting
strategy that the user can implement with predefined strategies. Mine could
be plugged, or yours could be plugged :)


-- Nested selects in LIMIT .. OFFSET clauses
> --------------------------------------
> select * from (
>   select limit_104581173.*, rownum as rownum_104581173
>   from (
>     select
>       "TEST"."T_AUTHOR"."ID",
>       "TEST"."T_AUTHOR"."FIRST_NAME",
>       "TEST"."T_AUTHOR"."LAST_NAME",
>       "TEST"."T_AUTHOR"."DATE_OF_BIRTH",
>       "TEST"."T_AUTHOR"."YEAR_OF_BIRTH",
>       "TEST"."T_AUTHOR"."ADDRESS"
>     from "TEST"."T_AUTHOR"
>     order by "TEST"."T_AUTHOR"."ID" asc
>   ) limit_104581173
> )
> where rownum_104581173 > 0
> and rownum_104581173 <= (0 + 1)
>

This one is fine, except for the columns which I would prefer to see on the
same line. The fact that they are on multiple lines is probably interesting
here only because you prefix everything.
Side note: if pretty print is about easier reading, then using the schema
and removing the prefixes would be the first thing to do. In such case,
certain considerations could change (like all columns on same line).



> -- Nested JOIN clauses
> --------------------------------------
> select
>   "TEST"."T_BOOK"."TITLE",
>   "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME",
>   sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR")
> from "TEST"."T_BOOK"
> join ("TEST"."T_BOOK_TO_BOOK_STORE"
> join "MULTI_SCHEMA"."T_BOOK_SALE" on
> ("MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME" =
> "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_STORE_NAME" and
>
> "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_ID" =
> "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID")) on
> "TEST"."T_BOOK_TO_BOOK_STORE"."BOOK_ID" = "TEST"."T_BOOK"."ID"
> group by
>   "TEST"."T_BOOK"."ID",
>   "TEST"."T_BOOK"."TITLE",
>   "MULTI_SCHEMA"."T_BOOK_SALE"."BOOK_STORE_NAME"
> order by
>   "TEST"."T_BOOK"."ID" asc,
>   sum("MULTI_SCHEMA"."T_BOOK_SALE"."SOLD_FOR") desc
> --------------------------------------
>

I find this one unreadable. Some new lines are keywords and some are
columns, there are opening parentheses without indentation, the multiple
and statements of the on clause are not where I would expect them, etc.
As a user, there is no logic that tells me how this is supposed to help me
understand the query.



> This formatter could be re-used in jOOQ console, as the console has a
> reference to org.jooq.Query. For formatting, use
>
> String sql = new Factory(
>     connection,
>     dialect,
>     new Settings().withRenderFormatted(true)).render(query);
>

I only process raw SQL, not query parts. How does your formatter work with
plain SQL?

Cheers,
-Christopher

Re: Pretty printing SQL

Reply via email to