Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On sön, 2009-11-22 at 00:23 -0500, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: Attached is an updated patch with a couple of tweaks to ensure output is formatted and spaced correctly when border=0, which was off in the last patch. Applied wih minor editorialization. Notably, I renamed the backwards-compatible option from ascii-old to old-ascii, because the original submission failed to preserve the documented behavior that the options could be abbreviated to one letter. What is the plan behind keeping the old format? Are we going to remove it after one release if no one complains, or are we seriously expecting that someone has code that actually parses this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut pete...@gmx.net writes: What is the plan behind keeping the old format? Are we going to remove it after one release if no one complains, or are we seriously expecting that someone has code that actually parses this? Plan? Do we need a plan? The extra support consists of about two lines of code and a small table, so there wouldn't be much harm in leaving it there forever. On the other hand it seems pretty likely that no one would care after a release or two. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: Attached is an updated patch with a couple of tweaks to ensure output is formatted and spaced correctly when border=0, which was off in the last patch. Applied wih minor editorialization. Notably, I renamed the backwards-compatible option from ascii-old to old-ascii, because the original submission failed to preserve the documented behavior that the options could be abbreviated to one letter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Nov 15, 2009 at 12:50:14AM +, Roger Leigh wrote: On Sat, Nov 14, 2009 at 01:31:29PM -0500, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: The side effect from this change is that some of the testsuite expected data will need updating due to the extra pad spaces No, we are *not* doing that. Somebody made a change to the print.c logic last year that started adding harmless white space to the last column, and it was a complete disaster for tracking whether anything important had changed in regression test output. Please undo that part of your patch. No problem, done as requested. I've attached an updated patch that takes care to exactly match the trailing whitespace the existing psql outputs. This fixes most of the changes between observed and expected test results. Attached is an updated patch with a couple of tweaks to ensure output is formatted and spaced correctly when border=0, which was off in the last patch. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7f03802..4b3fe71 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1765,18 +1765,40 @@ lo_import 152801 listitem para Sets the border line drawing style to one - of literalascii/literal or literalunicode/literal. - Unique abbreviations are allowed. (That would mean one - letter is enough.) + of literalascii/literal, literalascii-old/literal + or literalunicode/literal. Unique abbreviations are + allowed. (That would mean one letter is enough.) /para para - quoteASCII/quote uses plain acronymASCII/acronym characters. + quoteASCII/quote uses plain acronymASCII/acronym + characters. Newlines in data are shown using + a literal+/literal symbol in the right-hand margin, + while wrapped data uses a literal./literal symbol in the + right-hand margin of a wrapped line, and in the left-hand + margin of the following continuation line. /para para + quoteASCII-old/quote uses plain acronymASCII/acronym + characters, using the formatting style used + for productnamePostgreSQL/productname 8.4 and earlier. + Newlines in data are shown using a literal:/literal + symbol in place of the left-hand column separator, while + wrapped data uses a literal;/literal symbol. Newlines + in column headings are indicated by a literal+/literal + symbol in the left-hand margin of additional lines. + /para + + para quoteUnicode/quote uses Unicode box-drawing characters. - /para + Newlines in data are shown using a carriage return symbol + (literal#8629;/literal) in the right-hand margin. + Wrapped data uses an ellipsis symbol + (literal#8230;/literal) in the right-hand margin of a + wrapped line, and in the left-hand margin of the following + continuation line. + /para para When the selected output format is one that draws lines or boxes diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 190a8d3..544a677 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1795,11 +1795,13 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) ; else if (pg_strncasecmp(ascii, value, vallen) == 0) popt-topt.line_style = pg_asciiformat; + else if (pg_strncasecmp(ascii-old, value, vallen) == 0) + popt-topt.line_style = pg_asciiformat_old; else if (pg_strncasecmp(unicode, value, vallen) == 0) popt-topt.line_style = pg_utf8format; else { - psql_error(\\pset: allowed line styles are ascii, unicode\n); + psql_error(\\pset: allowed line styles are ascii, ascii-old, unicode\n); return false; } diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 026e043..5d1c8d4 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -45,9 +45,9 @@ static char *grouping; static char *thousands_sep; /* Line style control structures */ -const printTextFormat pg_asciiformat = +const printTextFormat pg_asciiformat_old = { - ascii, + ascii-old, { { -, +, +, + }, { -, +, +, + }, @@ -56,7 +56,36 @@ const printTextFormat pg_asciiformat = }, :, ;, - + , + +, + , + , + , + , + , + false +}; + +/* Line style control structures */ +const printTextFormat pg_asciiformat = +{ + ascii, + { + { -, +, +, + }, + { -, +, +, + }, + { -, +, +, + }, + { , |, |, | } + }, + |, + |, + |, + , + +, + , + +, + ., + ., + true }; const printTextFormat pg_utf8format = @@ -72,12 +101,23 @@ const
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Nov 09, 2009 at 05:40:54PM -0500, Bruce Momjian wrote: Tom Lane wrote: Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. Sorry for the delayed reply: The line continuation characters were chosen in 8.4 for clarity --- if you have found something clearer for 8.5, let's make the improvement. I think clarity is more important in this area than consistency with the previous psql output format. The attached patch is proposed for the upcoming commitfest, and hopefully takes into account the comments made in this thread. To summarise the changes: - it makes the handling of newlines and wrapped lines consistent between column header and data lines. - it includes additional logic such that both the old and new styles are representable using the format structures, so we can retain backward compatibility if so desired (it's easy to remove if not). - an ascii-old linestyle is added which is identical to the old style for those who need guaranteed reproducibility of output, but this is not the default. - The Unicode format uses ↵ in the right-hand margin to indicate newlines. Wrapped lines use … in the right-hand margin before, and left-hand margin after, a break (so you can visually follow the wrap). - The ASCII format is the same but uses + and . in place of carriage return and ellipsis symbols. - All the above is documented in the SGML documentation, including the old style, which I always found confusing. For comparison, I've included a transcript of all three linestyles with a test script (also attached). Any changes to the symbols used and/or their placement are trivially made by just altering the format in print.c. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7f03802..4600303 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1765,18 +1765,40 @@ lo_import 152801 listitem para Sets the border line drawing style to one - of literalascii/literal or literalunicode/literal. - Unique abbreviations are allowed. (That would mean one - letter is enough.) + of literalascii/literal, literalascii-old/literal + or literalunicode/literal. Unique abbreviations are + allowed. (That would mean one letter is enough.) /para para - quoteASCII/quote uses plain acronymASCII/acronym characters. + quoteASCII/quote uses plain acronymASCII/acronym + characters. Newlines in data are shown using + a literal+/literal symbol in the right-hand margin. + Wrapped data uses a literal./literal symbol in the + right-hand margin of a wrapped line, and in the left-hand + margin of the following continuation line. /para para + quoteASCII-old/quote uses plain acronymASCII/acronym + characters, using the formatting style used + for productnamePostgreSQL/productname 8.4 and earlier. + Newlines in data are shown using a literal:/literal + symbol in place of the left-hand column separator, while + wrapped data uses a literal;/literal symbol. Newlines + in column headings are indicated by a literal+/literal + symbol in the left-hand margin of additional lines. + /para + + para quoteUnicode/quote uses Unicode box-drawing characters. - /para + Newlines in data are shown using a carriage return symbol + (literal#8629;/literal) in the right-hand margin. + Wrapped data uses an ellipsis symbol + (literal#8230;/literal) in the right-hand margin of a + wrapped line, and in the left-hand margin of the following + continuation line. + /para para When the selected output format is one that draws lines or boxes diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 190a8d3..544a677 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1795,11 +1795,13 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) ; else if (pg_strncasecmp(ascii,
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Nov 14, 2009 at 05:40:24PM +, Roger Leigh wrote: On Mon, Nov 09, 2009 at 05:40:54PM -0500, Bruce Momjian wrote: Tom Lane wrote: Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. Sorry for the delayed reply: The line continuation characters were chosen in 8.4 for clarity --- if you have found something clearer for 8.5, let's make the improvement. I think clarity is more important in this area than consistency with the previous psql output format. The attached patch is proposed for the upcoming commitfest, and hopefully takes into account the comments made in this thread. One note: because it's not possible to know in advance (without making things much more complex) whether or not a line will wrap or continue, the code currently makes sure we fully pad output up to the right margin of the table (finalspaces). This is in some cases unnecessary, but is needed when border=1 or else the continuation/wrap symbols don't end up in the margin and will look like they are part of the data. The side effect from this change is that some of the testsuite expected data will need updating due to the extra pad spaces now added to the output (triggers, dependency, tsearch, foreign_data, prepare, with). If the actual output format is OK with people then I'll update the patch to include the test data updates as well. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: The side effect from this change is that some of the testsuite expected data will need updating due to the extra pad spaces No, we are *not* doing that. Somebody made a change to the print.c logic last year that started adding harmless white space to the last column, and it was a complete disaster for tracking whether anything important had changed in regression test output. Please undo that part of your patch. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Nov 14, 2009 at 01:31:29PM -0500, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: The side effect from this change is that some of the testsuite expected data will need updating due to the extra pad spaces No, we are *not* doing that. Somebody made a change to the print.c logic last year that started adding harmless white space to the last column, and it was a complete disaster for tracking whether anything important had changed in regression test output. Please undo that part of your patch. No problem, done as requested. I've attached an updated patch that takes care to exactly match the trailing whitespace the existing psql outputs. This fixes most of the changes between observed and expected test results. Due to the fact that this patch does alter the output for newlines and wrapped lines (being its intent), the patch does alter expected testcase output for these specific cases. Because the old wrap/newline code put : and ; in place of | between columns, this meant that it never worked for the first column of data, which included single column result sets. This necessitated some changes to the expected results to reflect this change (which now makes the output uniform for all columns, irrespective of position). Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 7f03802..4b3fe71 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1765,18 +1765,40 @@ lo_import 152801 listitem para Sets the border line drawing style to one - of literalascii/literal or literalunicode/literal. - Unique abbreviations are allowed. (That would mean one - letter is enough.) + of literalascii/literal, literalascii-old/literal + or literalunicode/literal. Unique abbreviations are + allowed. (That would mean one letter is enough.) /para para - quoteASCII/quote uses plain acronymASCII/acronym characters. + quoteASCII/quote uses plain acronymASCII/acronym + characters. Newlines in data are shown using + a literal+/literal symbol in the right-hand margin, + while wrapped data uses a literal./literal symbol in the + right-hand margin of a wrapped line, and in the left-hand + margin of the following continuation line. /para para + quoteASCII-old/quote uses plain acronymASCII/acronym + characters, using the formatting style used + for productnamePostgreSQL/productname 8.4 and earlier. + Newlines in data are shown using a literal:/literal + symbol in place of the left-hand column separator, while + wrapped data uses a literal;/literal symbol. Newlines + in column headings are indicated by a literal+/literal + symbol in the left-hand margin of additional lines. + /para + + para quoteUnicode/quote uses Unicode box-drawing characters. - /para + Newlines in data are shown using a carriage return symbol + (literal#8629;/literal) in the right-hand margin. + Wrapped data uses an ellipsis symbol + (literal#8230;/literal) in the right-hand margin of a + wrapped line, and in the left-hand margin of the following + continuation line. + /para para When the selected output format is one that draws lines or boxes diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index 190a8d3..544a677 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1795,11 +1795,13 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) ; else if (pg_strncasecmp(ascii, value, vallen) == 0) popt-topt.line_style = pg_asciiformat; + else if (pg_strncasecmp(ascii-old, value, vallen) == 0) + popt-topt.line_style = pg_asciiformat_old; else if (pg_strncasecmp(unicode, value, vallen) == 0) popt-topt.line_style = pg_utf8format; else { - psql_error(\\pset: allowed line styles are ascii, unicode\n); + psql_error(\\pset: allowed line styles are ascii, ascii-old, unicode\n); return false; } diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 026e043..d382458 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -45,9 +45,9 @@ static char *grouping; static char *thousands_sep; /* Line style control structures */ -const printTextFormat pg_asciiformat = +const printTextFormat pg_asciiformat_old = { - ascii, + ascii-old, { { -, +, +, + }, { -, +, +, + }, @@ -56,7 +56,36 @@ const printTextFormat pg_asciiformat = }, :, ;, - + , + +, + , + , + , + , + , + false +}; + +/* Line
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Tom Lane wrote: Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. Sorry for the delayed reply: The line continuation characters were chosen in 8.4 for clarity --- if you have found something clearer for 8.5, let's make the improvement. I think clarity is more important in this area than consistency with the previous psql output format. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Andrew Dunstan wrote: You can set the field separator to ',' but you can't do a \pset format csv and get CSV with correct quoting, escaping etc AFAICS. It'll still break on line wrapping if wrapping is enabled, and with newlines in the data. If that would be a useful addition, I can add it. It's done by the backend, not by psql, but it has psql support - see \copy I agree with Roger that we should really have a CSV option in \pset format. COPY as CVS is certainly very useful, but it's a different use case. (You can't \copy a \-command for example). -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote: Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. The attached updated patch makes sure that the ASCII display remains the same (bar trailing whitespace used for padding). I'm tempted to add an additional ascii format such as ascii-clean which cleans up the inconsistencies in the formatting as for the unicode format, while the existing ascii format would remain the default for backwards compatibility. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 026e043..3f61293 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -56,7 +56,14 @@ const printTextFormat pg_asciiformat = }, :, ;, - + , + +, + , + , + , + , + , + false }; const printTextFormat pg_utf8format = @@ -72,12 +79,23 @@ const printTextFormat pg_utf8format = /* N/A, │, │, │ */ { , \342\224\202, \342\224\202, \342\224\202 } }, - /* ╎ */ - \342\225\216, - /* ┊ */ - \342\224\212, - /* ╷ */ - \342\225\267 + /* │ */ + \342\224\202, + /* │ */ + \342\224\202, + /* │ */ + \342\224\202, + , + /* ↵ */ + \342\206\265, + , + /* ↵ */ + \342\206\265, + /* … */ + \342\200\246, + /* … */ + \342\200\246, + true }; @@ -479,6 +497,8 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int output_columns = 0; /* Width of interactive console */ bool is_pager = false; + printTextLineWrap *wrap; + if (cancel_pressed) return; @@ -499,6 +519,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) format_buf = pg_local_calloc(col_count, sizeof(*format_buf)); header_done = pg_local_calloc(col_count, sizeof(*header_done)); bytes_output = pg_local_calloc(col_count, sizeof(*bytes_output)); + wrap = pg_local_calloc(col_count, sizeof(*wrap)); } else { @@ -513,6 +534,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) format_buf = NULL; header_done = NULL; bytes_output = NULL; + wrap = NULL; } /* scan all column headers, find maximum width and max max_nl_lines */ @@ -575,7 +597,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) /* adjust the total display width based on border style */ if (opt_border == 0) - width_total = col_count - 1; + width_total = col_count; else if (opt_border == 1) width_total = col_count * 3 - 1; else @@ -770,16 +792,16 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) - fprintf(fout, %s%c, dformat-leftvrule, - curr_nl_line ? '+' : ' '); -else if (opt_border == 1) - fputc(curr_nl_line ? '+' : ' ', fout); + fputs(dformat-leftvrule, fout); for (i = 0; i cont-ncolumns; i++) { struct lineptr *this_line = col_lineptrs[i] + curr_nl_line; unsigned int nbspace; + if (opt_border != 0 || (format-cont_right_border == false i 0)) + fputs(curr_nl_line ? format-header_cont_left : , fout); + if (!header_done[i]) { nbspace = width_wrap[i] - this_line-width; @@ -796,21 +818,17 @@ print_aligned_text(const printTableContent *cont, FILE *fout) } else fprintf(fout, %*s, width_wrap[i], ); - if (i col_count - 1) - { - if (opt_border == 0) - fputc(curr_nl_line ? '+' : ' ', fout); - else - fprintf(fout, %s%c, dformat-midvrule, - curr_nl_line ? '+' : ' '); - } + + if (opt_border != 0 || format-cont_right_border == true) + fputs(!header_done[i] ? format-header_cont_right : , fout); + + if (opt_border != 0 i col_count - 1) + fputs(dformat-midvrule, fout); } curr_nl_line++; if (opt_border == 2) - fprintf(fout, %s, dformat-rightvrule); -else if (opt_border == 1) - fputc(' ', fout); + fputs(dformat-rightvrule, fout); fputc('\n', fout); } @@ -861,19 +879,28 @@ print_aligned_text(const printTableContent *cont, FILE *fout) /* left border */ if (opt_border == 2) -fprintf(fout, %s , dformat-leftvrule); - else if (opt_border == 1) -fputc(' ', fout); +fputs(dformat-leftvrule, fout); /* for each column */ for (j = 0; j col_count; j++) {
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 26, 2009 at 11:43 PM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. But as long as we're changing the format... It would at at least be good to test the behaviour What exactly are you referring to here? run something like this: $ psql stark= \o /tmp/s stark= select generate_series(1,10),generate_series(1,5); $ gnumeric /tmp/s $ ooffice /tmp/s $ kspread /tmp/s With the 8.4 formatting gnumeric automatically guesses that | is the separator and formats the speadsheet quite reasonably. Open Office gets confused and opens the word processor, but if you do insert sheet from file and manually deselect the space and semicolon delimiters and put | as an other delimiter then it looks like it should work. I don't have kspread handy. Does gnumeric still autorecognize the new formats? Do the newline indicators in 8.4 mess up gnumeric? Are the new ones better or worse? This hasn't been a top priority in the past and the ReST discussion seemed to end up concluding that we shouldn't bother if we can't make it perfect. I'm not sure I agree with that, but in any case I think as long as we're changing the format we may as well check to see what the status is. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Oct 31, 2009 at 05:11:10AM -0700, Greg Stark wrote: On Mon, Oct 26, 2009 at 11:43 PM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. But as long as we're changing the format... It would at at least be good to test the behaviour What exactly are you referring to here? run something like this: $ psql stark= \o /tmp/s stark= select generate_series(1,10),generate_series(1,5); $ gnumeric /tmp/s $ ooffice /tmp/s $ kspread /tmp/s With the 8.4 formatting gnumeric automatically guesses that | is the separator and formats the speadsheet quite reasonably. Open Office gets confused and opens the word processor, but if you do insert sheet from file and manually deselect the space and semicolon delimiters and put | as an other delimiter then it looks like it should work. I don't have kspread handy. Does gnumeric still autorecognize the new formats? Do the newline indicators in 8.4 mess up gnumeric? Are the new ones better or worse? This hasn't been a top priority in the past and the ReST discussion seemed to end up concluding that we shouldn't bother if we can't make it perfect. I'm not sure I agree with that, but in any case I think as long as we're changing the format we may as well check to see what the status is. Surely if people want a machine-readable output format, they should either 1) use libpq or one of its bindings, or 2) use a dedicated machine-readable output format such as CSV, which is /designed/ for spreadsheet import. The standard psql output formats (aligned, unaligned) are for human-readable output and the others (latex, html, troff-ms) are marked up for the respective tools. None of these are really useful for other programs to parse. Wouldn't it be much simpler all around to add a csv output format in addition to the above for this purpose? Spreadsheets can read it in with no trouble at all. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh wrote: Wouldn't it be much simpler all around to add a csv output format in addition to the above for this purpose? Spreadsheets can read it in with no trouble at all. We've had CSV output since version 8.0. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Oct 31, 2009 at 12:25:22PM -0400, Andrew Dunstan wrote: Roger Leigh wrote: Wouldn't it be much simpler all around to add a csv output format in addition to the above for this purpose? Spreadsheets can read it in with no trouble at all. We've had CSV output since version 8.0. Really? The only references I see are in tab-complete.c relating to COPY. You can set the field separator to ',' but you can't do a \pset format csv and get CSV with correct quoting, escaping etc AFAICS. It'll still break on line wrapping if wrapping is enabled, and with newlines in the data. If that would be a useful addition, I can add it. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh wrote: On Sat, Oct 31, 2009 at 12:25:22PM -0400, Andrew Dunstan wrote: Roger Leigh wrote: Wouldn't it be much simpler all around to add a csv output format in addition to the above for this purpose? Spreadsheets can read it in with no trouble at all. We've had CSV output since version 8.0. Really? The only references I see are in tab-complete.c relating to COPY. You can set the field separator to ',' but you can't do a \pset format csv and get CSV with correct quoting, escaping etc AFAICS. It'll still break on line wrapping if wrapping is enabled, and with newlines in the data. If that would be a useful addition, I can add it. It's done by the backend, not by psql, but it has psql support - see \copy cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, 2009-10-26 at 10:12 -0700, Greg Stark wrote: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. But as long as we're changing the format... It would at at least be good to test the behaviour What exactly are you referring to here? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 26, 2009 at 11:33:40PM +, Roger Leigh wrote: On Mon, Oct 26, 2009 at 07:19:24PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote: Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. I checked (using strace) gnumeric (via libgda and gnome-database-properties) openoffice (oobase) Even if that were the entire universe of programs we cared about, whether their internal ODBC logic goes through psql isn't really the point here. What I'm worried about is somebody piping the text output of psql into another program. On a related note, there's something odd with the pager code. Hm, what platform are you testing that on? Debian GNU/Linux (unstable) linux2.6.30 eglibc 2.10.1 libreadline6 6.0.5 libncurses5 5.7 gcc 4.3.4 This is the trace of the broken write: 16206 write(1, Name \342\224\202 Owner \342\224..., 102) = 102 16206 write(1, \342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342 \224..., 256) = 256 16206 write(1, \224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\n, 18) = 18 Further tracing showed this to be a bug in the util-linux version of more which had a static 256 byte line buffer. The above was a red herring--it's writing to a pipe. I've sent a patch to fix this by increasing the buffer size. -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On sön, 2009-10-25 at 23:48 +, Roger Leigh wrote: Just for reference, this is what the output looks like (abridged) using the attached patch. Should display fine if your mail client handles UTF-8 messages correctly: rleigh=# \l List of databases Name │ Owner │ Encoding │ Collation │Ctype│ Access privileges ─┼──┼──┼─┼─┼─── merkelpb│ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ postgres│ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ projectb│ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ rleigh │ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ […] template0 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres template1 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres […] (17 rows) That's pretty much what I had in mind. Cool. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
2009/10/25 Roger Leigh rle...@codelibre.net: rleigh=# \l List of databases Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges ─┼──┼──┼─┼─┼─── merkelpb │ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ postgres │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ projectb │ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ rleigh │ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ […] template0 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres template1 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres […] (17 rows) While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. But as long as we're changing the format... It would at at least be good to test the behaviour -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote: Greg Stark gsst...@mit.edu writes: While i agree this looks nicer I wonder what it does to things like excel/gnumeric/ooffice auto-recognizing table layouts and importing files. I'm not sure our old format was so great for this so maybe this is actually an improvement I'm asking for. Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. On the other hand, we changed that just a release or so ago (to put in the multiline output in the first place) and I didn't hear complaints about it that time. I checked (using strace) gnumeric (via libgda and gnome-database-properties) openoffice (oobase) Both spreadsheets require a connection to be set up first for them to use as a handle, so I did that and traced from there. Neither made any use of psql; they both appear to use libpq via their respective database abstraction libs--no forking of any children observed. Excel is a bit tougher, I bought my first copy last week for other reasons, but I lack both windows expertise and debugging tools to trace things, and I also dual boot my computer with the postgres install on the Linux partition, making connecting to the database rather hard! I think someone else is better suited to check this one! On a related note, there's something odd with the pager code. The output of \l with the pager off: rleigh=# \l List of databases Name │ Owner │ Encoding │ Collation │Ctype│ Access privileges ─┼──┼──┼─┼─┼─── [...] (header line is 91 characters, 273 bytes) And with the pager on: rleigh=# \l List of databases Name │ Owner │ Encoding │ Collation │Ctype│ Access privileges ─┼──┼──┼─┼─┼─ ��─ [...] (longest header line 85 characters, 255 bytes, 256 bytes inc. LF, remainder on second line) Note that the pager wasn't required and so wasn't actually invoked, but the output was corrupted. A newline was inserted almost at the end of the line and the continuation lacks a leading \342 which (since these UTF-8 codes are all three-byte) leads to two bytes which are invalid UTF-8. Since this spurious newline got inserted exactly on a 256 byte boundary, I was wondering if there was some buffer either internal to psql or in the termios/pty layer that was getting flushed. It also lost the first byte of the second line (possibly swapped for the \n). Another wierdness: it only happens if the terminal width is 85 columns wide, otherwise it just wraps around as one would expect! AFAICT there are no 255/256 length buffers in the code, and the code doing the printing is just doing stdio to fout which is either stdout or a pipe! Because of this, I can't see how the spurious \n appears in the middle of a simple loop. If border=2, you'll see this for all top mid and bottom ruled lines. I do see strace showing some termios fiddling, could that be at fault or is that just readline ncurses initialisation? Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote: Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. I checked (using strace) gnumeric (via libgda and gnome-database-properties) openoffice (oobase) Even if that were the entire universe of programs we cared about, whether their internal ODBC logic goes through psql isn't really the point here. What I'm worried about is somebody piping the text output of psql into another program. On a related note, there's something odd with the pager code. Hm, what platform are you testing that on? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 26, 2009 at 07:19:24PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Mon, Oct 26, 2009 at 01:33:19PM -0400, Tom Lane wrote: Yeah. We can do what we like with the UTF8 format but I'm considerably more worried about the aspect of making random changes to the plain-ASCII output. I checked (using strace) gnumeric (via libgda and gnome-database-properties) openoffice (oobase) Even if that were the entire universe of programs we cared about, whether their internal ODBC logic goes through psql isn't really the point here. What I'm worried about is somebody piping the text output of psql into another program. On a related note, there's something odd with the pager code. Hm, what platform are you testing that on? Debian GNU/Linux (unstable) linux2.6.30 eglibc 2.10.1 libreadline6 6.0.5 libncurses5 5.7 gcc 4.3.4 This is the trace of the broken write: 16206 write(1, Name \342\224\202 Owner \342\224..., 102) = 102 16206 write(1, \342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\342 \224..., 256) = 256 16206 write(1, \224\200\342\224\200\342\224\200\342\224\200\342\224\200\342\224\200\n, 18) = 18 I'll attach the whole thing for reference. What's clear is that the first write was *exactly* 256 bytes, which is what was requested, presumably by libc stdio buffering (which shouldn't by itself be a problem). Since we use 3-byte UTF-8 and 256/3 is 85 + 1 remainder, this is where the wierd 85 char forced newline comes from. Since it only happens when the terminal window is 85 chars, that's where I'm assuming some odd termios influence comes from (though it might just be the source of the window size and be completely innocent). The fact that libc did the two separate writes kind of rules out termios mangling the output post-write(). Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. 16203 execve(./psql, [./psql], [/* 43 vars */]) = 0 16203 brk(0)= 0x1dbb000 16203 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff0b8697000 16203 access(/etc/ld.so.nohwcap, F_OK) = -1 ENOENT (No such file or directory) 16203 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff0b8695000 16203 access(/etc/ld.so.preload, R_OK) = -1 ENOENT (No such file or directory) 16203 open(/usr/local/pgsql/lib/tls/x86_64/libpq.so.5, O_RDONLY) = -1 ENOENT (No such file or directory) 16203 stat(/usr/local/pgsql/lib/tls/x86_64, 0x7fffeefa1f10) = -1 ENOENT (No such file or directory) 16203 open(/usr/local/pgsql/lib/tls/libpq.so.5, O_RDONLY) = -1 ENOENT (No such file or directory) 16203 stat(/usr/local/pgsql/lib/tls, 0x7fffeefa1f10) = -1 ENOENT (No such file or directory) 16203 open(/usr/local/pgsql/lib/x86_64/libpq.so.5, O_RDONLY) = -1 ENOENT (No such file or directory) 16203 stat(/usr/local/pgsql/lib/x86_64, 0x7fffeefa1f10) = -1 ENOENT (No such file or directory) 16203 open(/usr/local/pgsql/lib/libpq.so.5, O_RDONLY) = -1 ENOENT (No such file or directory) 16203 stat(/usr/local/pgsql/lib, 0x7fffeefa1f10) = -1 ENOENT (No such file or directory) 16203 open(/etc/ld.so.cache, O_RDONLY) = 3 16203 fstat(3, {st_mode=S_IFREG|0644, st_size=115844, ...}) = 0 16203 mmap(NULL, 115844, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff0b8678000 16203 close(3) = 0 16203 access(/etc/ld.so.nohwcap, F_OK) = -1 ENOENT (No such file or directory) 16203 open(/usr/lib/libpq.so.5, O_RDONLY) = 3 16203 read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\0\214\0\0\0\0\0\0..., 832) = 832 16203 fstat(3, {st_mode=S_IFREG|0644, st_size=161848, ...}) = 0 16203 mmap(NULL, 2257336, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff0b845 16203 mprotect(0x7ff0b8475000, 2097152, PROT_NONE) = 0 16203 mmap(0x7ff0b8675000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7ff0b8675000 16203 close(3) = 0 16203 access(/etc/ld.so.nohwcap, F_OK) = -1 ENOENT (No such file or directory) 16203 open(/lib/libreadline.so.5, O_RDONLY) = 3 16203 read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0pMa\3157\0\0\0..., 832) = 832 16203 fstat(3, {st_mode=S_IFREG|0644, st_size=260632, ...}) = 0 16203 mmap(0x37cd60, 2358528, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x37cd60 16203 mprotect(0x37cd638000, 2093056, PROT_NONE) = 0 16203 mmap(0x37cd837000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x37000) = 0x37cd837000 16203 mmap(0x37cd83f000, 3328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x37cd83f000 16203 close(3) = 0 16203 access(/etc/ld.so.nohwcap,
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Oct 24, 2009 at 06:23:24PM +0100, Roger Leigh wrote: On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote: I like the new Unicode tables, but the marking of continuation lines looks pretty horrible: List of databases Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges ───┼───┼──┼───┼───┼─── pl_regression │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ postgres │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ template0 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter template1 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter (4 rows) Just for reference, this is what the output looks like (abridged) using the attached patch. Should display fine if your mail client handles UTF-8 messages correctly: rleigh=# \l List of databases Name │ Owner │ Encoding │ Collation │Ctype│ Access privileges ─┼──┼──┼─┼─┼─── merkelpb│ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ postgres│ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ projectb│ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ rleigh │ rleigh │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ […] template0 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres template1 │ postgres │ UTF8 │ en_GB.UTF-8 │ en_GB.UTF-8 │ =c/postgres ↵ │ │ │ │ │ postgres=CTc/postgres […] (17 rows) This looks more like a rendering mistake than something sensible, and also it doesn't actually help the viewer to tell what lines are continued, which was the original purpose. I've worked on a solution to this, and the preliminary patch for this is attached. Note there are additional comments in the patch text. I've attached an updated version of the patch. This now - adds one column extra padding in wrapped mode if border=0 to allow correct display of wrap marks on the rightmost column. Removed special cases needed to suppress marks on the rightmost column. - updated unicode symbols to use more commonly-available symbol (ellipsis) rather than hooked arrows. Also makes wrapping more visually distinct cf. newlines. The CR and ellipsis symbols are commonly available in several character sets other than unicode, and are available in many common fonts, including all the Windows monospaced terminal fonts. Symbol availability isn't an issue on GNU/Linux. - updated ASCII symbols to use similar-looking symbols to the Unicode ones. - added some more comments to code - removed redundant enum values I've also attached a trivial test script to run through psql to test the wrapping, as well as the output I get for psql from 8.4.1 and the current mainline (both Unicode and ASCII) for comparison. Hopefully I'm not the only one that finds the proposed change clearer and more logical than the existing display! There's just one tiny display glitch I can see, and that's if you have mixed wrapping and newlines, you miss the lefthand wrap mark if the line is the last wrapped line and it ends in a newline. It might not be possible to pick that up if we can't discover that from the line state when printing out the line. Might possibly require storing the wrap state so we know what happened on the previous line, though if there's a slightly cleverer approach to detecting if we're already wrapped that would be better. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. From 80a956c0d24e53a1b44619774b099364a585cdd0 Mon Sep 17 00:00:00 2001 From: Roger Leigh rle...@debian.org Date: Sat, 24 Oct 2009 18:11:01 +0100 Subject: [PATCH] psql: Clean up line wrapping for Unicode display MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously, there were two different methods for displaying wrapping: 1) Columns headers used a + symbol in the left-hand border when the column heading contained newlines. 2) Data lines used either a : or ; symbol to represent wrapped and new lines, and to represent empty lines. These symbols replaced the | immediately to the left of the column. This patch unifies both of these, by making (1)-like formatting the behaviour for all new lines in both column headers and data. It indicates wrapping by using . symbols in the border for
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Oct 25, 2009 at 11:48:27PM +, Roger Leigh wrote: There's just one tiny display glitch I can see, and that's if you have mixed wrapping and newlines, you miss the lefthand wrap mark if the line is the last wrapped line and it ends in a newline. It might not be possible to pick that up if we can't discover that from the line state when printing out the line. Might possibly require storing the wrap state so we know what happened on the previous line, though if there's a slightly cleverer approach to detecting if we're already wrapped that would be better. This appears simpler and more robust, so I've gone with this approach, and attached a new patch which fixes it. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 026e043..36183f2 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -54,9 +54,10 @@ const printTextFormat pg_asciiformat = { -, +, +, + }, { , |, |, | } }, - :, - ;, - + , + +, + ., + . }; const printTextFormat pg_utf8format = @@ -72,12 +73,13 @@ const printTextFormat pg_utf8format = /* N/A, │, │, │ */ { , \342\224\202, \342\224\202, \342\224\202 } }, - /* ╎ */ - \342\225\216, - /* ┊ */ - \342\224\212, - /* ╷ */ - \342\225\267 + , + /* ↵ */ + \342\206\265, + /* … */ + \342\200\246, + /* … */ + \342\200\246 }; @@ -479,6 +481,8 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int output_columns = 0; /* Width of interactive console */ bool is_pager = false; + printTextLineWrap *wrap; + if (cancel_pressed) return; @@ -499,6 +503,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) format_buf = pg_local_calloc(col_count, sizeof(*format_buf)); header_done = pg_local_calloc(col_count, sizeof(*header_done)); bytes_output = pg_local_calloc(col_count, sizeof(*bytes_output)); + wrap = pg_local_calloc(col_count, sizeof(*wrap)); } else { @@ -513,6 +518,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) format_buf = NULL; header_done = NULL; bytes_output = NULL; + wrap = NULL; } /* scan all column headers, find maximum width and max max_nl_lines */ @@ -575,7 +581,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) /* adjust the total display width based on border style */ if (opt_border == 0) - width_total = col_count - 1; + width_total = col_count; else if (opt_border == 1) width_total = col_count * 3 - 1; else @@ -770,16 +776,16 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) - fprintf(fout, %s%c, dformat-leftvrule, - curr_nl_line ? '+' : ' '); -else if (opt_border == 1) - fputc(curr_nl_line ? '+' : ' ', fout); + fputs(dformat-leftvrule, fout); for (i = 0; i cont-ncolumns; i++) { struct lineptr *this_line = col_lineptrs[i] + curr_nl_line; unsigned int nbspace; + if (opt_border != 0) + fputs(curr_nl_line ? format-cont_left : , fout); + if (!header_done[i]) { nbspace = width_wrap[i] - this_line-width; @@ -796,21 +802,16 @@ print_aligned_text(const printTableContent *cont, FILE *fout) } else fprintf(fout, %*s, width_wrap[i], ); - if (i col_count - 1) - { - if (opt_border == 0) - fputc(curr_nl_line ? '+' : ' ', fout); - else - fprintf(fout, %s%c, dformat-midvrule, - curr_nl_line ? '+' : ' '); - } + + fputs(!header_done[i] ? format-cont_right : , fout); + + if (opt_border != 0 i col_count - 1) + fputs(dformat-midvrule, fout); } curr_nl_line++; if (opt_border == 2) - fprintf(fout, %s, dformat-rightvrule); -else if (opt_border == 1) - fputc(' ', fout); + fputs(dformat-rightvrule, fout); fputc('\n', fout); } @@ -861,19 +862,28 @@ print_aligned_text(const printTableContent *cont, FILE *fout) /* left border */ if (opt_border == 2) -fprintf(fout, %s , dformat-leftvrule); - else if (opt_border == 1) -fputc(' ', fout); +fputs(dformat-leftvrule, fout); /* for each column */ for (j = 0; j col_count; j++) { /* We have a valid array element, so index it */ struct lineptr *this_line = col_lineptrs[j][curr_nl_line[j]]; -int bytes_to_output; -int chars_to_output = width_wrap[j]; +int bytes_to_output; +int chars_to_output = width_wrap[j]; bool finalspaces = (opt_border == 2 || j col_count - 1); +/* Print left-hand wrap or newline mark */ +if (opt_border != 0) +{ + if (wrap[j] == PRINT_LINE_WRAP_WRAP) + fputs(format-wrap_left, fout); + else if (wrap[j] == PRINT_LINE_WRAP_CONT) +
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote: I like the new Unicode tables, but the marking of continuation lines looks pretty horrible: List of databases Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges ───┼───┼──┼───┼───┼─── pl_regression │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ postgres │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ template0 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter template1 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter (4 rows) This looks more like a rendering mistake than something sensible, and also it doesn't actually help the viewer to tell what lines are continued, which was the original purpose. I've worked on a solution to this, and the preliminary patch for this is attached. Note there are additional comments in the patch text. This patch does break backward compatibility with older psql versions, by using symbols in the column border to mark newlines and wrapped lines rather than putting ':' and ';' and ' ' symbols in the vertical table lines. This makes things somewhat more consistent and readable but at the expense of not perfectly preserving output. The ASCII rules are a little more compatible than the Unicode rules, but both do break things a little. If the data lines do not contain either newlines or wrapped text, then the output will remain unchanged. Any feedback would be appreciated. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. From 245c6cabe7d74afef976149fd675bd99d711cade Mon Sep 17 00:00:00 2001 From: Roger Leigh rle...@debian.org Date: Sat, 24 Oct 2009 18:11:01 +0100 Subject: [PATCH] psql: Clean up line wrapping for Unicode display MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously, there were two different methods for displaying wrapping: 1) Columns headers used a + symbol in the left-hand border when the column heading contained newlines. 2) Data lies used either a : or ; symbol to represent wrapped and new lines, and to represent empty lines. These symbols replaced the | immediately to the left of the column. This patch unifies both of these, by making (1) the behaviour for all new lines in both column headers and data. It indicates wrapping by using - symbols in the border for continuations. When using Unicode, these are replaced by ↵ (carriage return) for newlines and ↩ and ↪ (curly arrows) for continuations. In the special case of a border size of zero, only the right-hand border symbols are drawn, since there is no space for any other output. The right-most column will not display any symbols for backward compatibility, but this can be adjusted (I think it would make the display somewhat more consistent). --- src/bin/psql/print.c | 114 - src/bin/psql/print.h | 17 ++- 2 files changed, 79 insertions(+), 52 deletions(-) diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 026e043..f20eab1 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -54,9 +54,10 @@ const printTextFormat pg_asciiformat = { -, +, +, + }, { , |, |, | } }, - :, - ;, - + +, + , + -, + - }; const printTextFormat pg_utf8format = @@ -72,12 +73,13 @@ const printTextFormat pg_utf8format = /* N/A, │, │, │ */ { , \342\224\202, \342\224\202, \342\224\202 } }, - /* ╎ */ - \342\225\216, - /* ┊ */ - \342\224\212, - /* ╷ */ - \342\225\267 + , + /* ↵ */ + \342\206\265, + /* ↪ */ + \342\206\252, + /* ↩ */ + \342\206\251 }; @@ -770,16 +772,16 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) - fprintf(fout, %s%c, dformat-leftvrule, - curr_nl_line ? '+' : ' '); -else if (opt_border == 1) - fputc(curr_nl_line ? '+' : ' ', fout); + fputs(dformat-leftvrule, fout); for (i = 0; i cont-ncolumns; i++) { struct lineptr *this_line = col_lineptrs[i] + curr_nl_line; unsigned int nbspace; + if (opt_border != 0) + fputs(curr_nl_line ? format-cont_left : , fout); + if (!header_done[i]) { nbspace = width_wrap[i] - this_line-width; @@ -796,21 +798,16 @@ print_aligned_text(const printTableContent *cont, FILE *fout) } else fprintf(fout, %*s, width_wrap[i], ); - if (i col_count - 1) - { - if (opt_border == 0) - fputc(curr_nl_line ? '+' : ' ', fout); - else - fprintf(fout, %s%c, dformat-midvrule, - curr_nl_line ? '+' : '
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
(HTML mail to preserve formatting; let's see if it works.) I like the new Unicode tables, but the marking of continuation lines looks pretty horrible: List of databases Name │ Owner │ Encoding │ Collation │ Ctype │ Access privileges ───┼───┼──┼───┼───┼─── pl_regression │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ postgres │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ template0 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter template1 │ peter │ LATIN2 │ cs_CZ │ cs_CZ │ =c/peter ╷ ╷ ╷ ╷ ╎ peter=CTc/peter (4 rows) This looks more like a rendering mistake than something sensible, and also it doesn't actually help the viewer to tell what lines are continued, which was the original purpose. On a side note, I don't really understand why the access privileges need to be broken up over multiple lines. There is plenty of space left on the line. Note that the above is close to a default setup, so that is what many people will see by default from now on.
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Fri, Oct 16, 2009 at 01:38:15PM +0300, Peter Eisentraut wrote: (HTML mail to preserve formatting; let's see if it works.) I like the new Unicode tables, but the marking of continuation lines looks pretty horrible: Yes, I'm not so keen myself. The ASCII characters used are '|', ':' and ' ' for normal, wrapped and newlines. Here, we are using vertical lines with two or three dashes to replace the latter two. As mentioned earlier in the thread, this patch will be followed up with a futher patch to unify the folding/newline behaviour between column header and data lines, which will replace these with normal vertical lines in all cases, with the addition of characters in the margin such as as a CR symbol or ellipsis to indicate newlines or wrapping. To preserve backward compatibility, the ASCII output will remain unchanged. I would have done this in the patch as applied, but was asked to hold off doing that until the initial work was committed. Note that the above is close to a default setup, so that is what many people will see by default from now on. I'll try to get the above proposed change done in the next week or so, in time for the next CommitFest, if you agree with the general idea. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Oct 13, 2009 at 05:08:20PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: The attached updated patch renames all user-visible uses of utf8 to unicode. It also updates the documentation regarding locale to psql client character set encoding so the docs now match the code exactly. Applied with light editorialization. The main non-cosmetic change I made was to postpone selection of default line_style until runtime (see get_line_style() in the committed patch). The original coding required knowledge of the line_style default rule not only in three different places in psql, but in every other place using print.c, such as createlang/droplang -l (which dumped core with the patch as submitted). I changed it so that leaving line_style NULL implies the default encoding-driven behavior, so that we don't need to touch any of the callers. Thanks. I agree with your improvement in get_line_style(), it's cleaner this way; I didn't realise print.c was used by the other programs, sorry about that. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: The attached updated patch renames all user-visible uses of utf8 to unicode. It also updates the documentation regarding locale to psql client character set encoding so the docs now match the code exactly. Applied with light editorialization. The main non-cosmetic change I made was to postpone selection of default line_style until runtime (see get_line_style() in the committed patch). The original coding required knowledge of the line_style default rule not only in three different places in psql, but in every other place using print.c, such as createlang/droplang -l (which dumped core with the patch as submitted). I changed it so that leaving line_style NULL implies the default encoding-driven behavior, so that we don't need to touch any of the callers. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Fri, Oct 09, 2009 at 04:35:46PM -0500, Kevin Grittner wrote: Peter Eisentraut pete...@gmx.net wrote: I think the setting ought be called linestyle unicode (instead of utf8), since the same setting would presumably work in case we ever implement UTF-16 support on the client side. Yeah, anytime one gets sloppy with the distinction between a character set and a character encoding scheme, one tends to regret it, sooner or later. Here's we're talking about which glyphs to show -- that's based on a character set. The attached updated patch renames all user-visible uses of utf8 to unicode. It also updates the documentation regarding locale to psql client character set encoding so the docs now match the code exactly. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..c3b2c28 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1690,6 +1690,53 @@ lo_import 152801 /varlistentry varlistentry + termliterallinestyle/literal/term + listitem + para + Sets the line drawing style of text table output to one + of literalascii/literal or literalunicode/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalunicode/literal will be + selected if used by the psql client character set encoding, + otherwise literalascii/literal will be used by default. + /para + + para + Query result tables are displayed as text for some output + formats (literalunaligned/literal, + literalaligned/literal, and literalwrapped/literal + formats). The tables are drawn using characters of the psql + client character set encoding. By default, + the acronymASCII/acronym character set will be used, + which will display correctly for all encodings. However, if + the user is using an encoding based upon the Universal + Character Set (quoteUnicode/quote), for + example acronymUTF8/acronym, the Unicode box drawing + characters will be used in place of ASCII punctuation in + order to display more readable tables. + /para + + para + This option is useful for overriding the default line style, + for example to force the use of + only acronymASCII/acronym characters when extended + character sets such as Unicode are inappropriate. This + might be the case if preserving output compatibility with + older psql versions is important (prior to 8.5.0). + /para + + para + quoteUnicode/quote use Unicode box drawing characters. + /para + + para + quoteASCII/quote use plain acronymASCII/acronym characters. + /para + + /listitem + /varlistentry + + varlistentry termliteralcolumns/literal/term listitem para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..6f34e68 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -46,6 +46,7 @@ #include input.h #include large_obj.h #include mainloop.h +#include mbprint.h #include print.h #include psqlscan.h #include settings.h @@ -596,6 +597,14 @@ exec_command(const char *cmd, /* save encoding info into psql internal data */ pset.encoding = PQclientEncoding(pset.db); pset.popt.topt.encoding = pset.encoding; +if (!pset.popt.topt.line_style_set) +{ + if (pset.encoding == get_utf8_id()) + pset.popt.topt.line_style = utf8format; + else + pset.popt.topt.line_style = asciiformat; +} + SetVariable(pset.vars, ENCODING, pg_encoding_to_char(pset.encoding)); } @@ -1432,6 +1441,13 @@ SyncVariables(void) /* get stuff from connection */ pset.encoding = PQclientEncoding(pset.db); pset.popt.topt.encoding = pset.encoding; + if (!pset.popt.topt.line_style_set) + { + if (pset.encoding == get_utf8_id()) + pset.popt.topt.line_style = utf8format; + else + pset.popt.topt.line_style = asciiformat; + } pset.sversion = PQserverVersion(pset.db); SetVariable(pset.vars, DBNAME, PQdb(pset.db)); @@ -1772,6 +1788,27 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) printf(_(Output format is %s.\n), _align2string(popt-topt.format)); } + /* set table line style */ + else if (strcmp(param, linestyle) == 0) + { + if (!value) + ; + else if (pg_strncasecmp(ascii, value, vallen) == 0) + popt-topt.line_style = asciiformat; + else if (pg_strncasecmp(unicode, value, vallen) == 0) + popt-topt.line_style = utf8format; + else + { + psql_error(\\pset: allowed
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, 2009-10-06 at 19:35 +0100, Roger Leigh wrote: This patch included a bit of code not intended for inclusion (setting of client encoding based on locale), which the attached (and hopefully final!) revision of the patch excludes. Well, the documentation still claims that this is dependent on the locale. This should be updated to match the code. I think the setting ought be called linestyle unicode (instead of utf8), since the same setting would presumably work in case we ever implement UTF-16 support on the client side. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut pete...@gmx.net wrote: I think the setting ought be called linestyle unicode (instead of utf8), since the same setting would presumably work in case we ever implement UTF-16 support on the client side. Yeah, anytime one gets sloppy with the distinction between a character set and a character encoding scheme, one tends to regret it, sooner or later. Here's we're talking about which glyphs to show -- that's based on a character set. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tuesday 06 October 2009 11:35:03 Roger Leigh wrote: On Tue, Oct 06, 2009 at 10:44:27AM +0100, Roger Leigh wrote: On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote: Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. OK, that makes sense to a certain extent. However, the characters used to draw the table lines are not really that related to the client encoding for data sent from the database (IMHO). Huh? The data *in* the table is going to be in the client_encoding, and psql contains no mechanisms that would translate it to something else. Surrounding it with decoration in a different encoding is just a recipe for breakage. Ah, I was under the mistaken assumption that this was iconv()ed or otherwise translated for correct display. In that case, I'll leave the patch as is (using the client encoding for table lines). I've attached an updated copy of the patch (it just removes the now unneeded langinfo.h header). This patch included a bit of code not intended for inclusion (setting of client encoding based on locale), which the attached (and hopefully final!) revision of the patch excludes. Regards, Roger I looked at psql-utf8-table-9.patch. It applies, builds and installs. `gmake check` passes and is not affected by values of LANG or LC_ALL in the environment. HTML and man documentation builds and looks good. The patch doesn't introduce new lint. The psql utility displays UTF8 line art when the client encoding is UTF8, and ASCII line art is displayed otherwise. This can be overridden with `\pset linestyle ASCII` or `\pset linestyle UTF8`. The psql line art is no longer affected by the values of LANG or LC_ALL in the environment. As far as I can tell, everything looks reasonable. Thanks, --bts -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote: Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. OK, that makes sense to a certain extent. However, the characters used to draw the table lines are not really that related to the client encoding for data sent from the database (IMHO). Huh? The data *in* the table is going to be in the client_encoding, and psql contains no mechanisms that would translate it to something else. Surrounding it with decoration in a different encoding is just a recipe for breakage. Ah, I was under the mistaken assumption that this was iconv()ed or otherwise translated for correct display. In that case, I'll leave the patch as is (using the client encoding for table lines). I've attached an updated copy of the patch (it just removes the now unneeded langinfo.h header). Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..cd1c137 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1690,6 +1690,54 @@ lo_import 152801 /varlistentry varlistentry + termliterallinestyle/literal/term + listitem + para + Sets the line drawing style of text table output to one + of literalascii/literal, or literalutf8/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalutf8/literal will be selected + by default if supported by your locale, + otherwise literalascii/literal will be used. + /para + + para + Query result tables are displayed as text for some output + formats (literalunaligned/literal, + literalaligned/literal, and literalwrapped/literal + formats). The tables are drawn using characters of the + user's locale character set. By + default, acronymASCII/acronym characters will be used, + which will display correctly in all locales. However, if + the user is using a locale with a acronymUTF-8/acronym + character set, the default will be to + use acronymUTF-8/acronym box drawing characters in place + of ASCII punctuation to display more readable tables. + /para + + para + This option is useful for overriding the default line + style, for example to force the use of + only acronymASCII/acronym characters when extended + character sets such as acronymUTF-8/acronym are + inappropriate. This might be the case if preserving output + compatibility with older psql versions is important (prior + to 8.5.0). + /para + + para + quoteUTF8/quote use Unicode acronymUTF-8/acronym box + drawing characters. + /para + + para + quoteASCII/quote use plain acronymASCII/acronym characters. + /para + + /listitem + /varlistentry + + varlistentry termliteralcolumns/literal/term listitem para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..be2b60b 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -33,6 +33,11 @@ #include openssl/ssl.h #endif +#include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif + #include portability/instr_time.h #include libpq-fe.h @@ -46,6 +51,7 @@ #include input.h #include large_obj.h #include mainloop.h +#include mbprint.h #include print.h #include psqlscan.h #include settings.h @@ -596,6 +602,14 @@ exec_command(const char *cmd, /* save encoding info into psql internal data */ pset.encoding = PQclientEncoding(pset.db); pset.popt.topt.encoding = pset.encoding; +if (!pset.popt.topt.line_style_set) +{ + if (pset.encoding == get_utf8_id()) + pset.popt.topt.line_style = utf8format; + else + pset.popt.topt.line_style = asciiformat; +} + SetVariable(pset.vars, ENCODING, pg_encoding_to_char(pset.encoding)); } @@ -1430,8 +1444,19 @@ void SyncVariables(void) { /* get stuff from connection */ +#if (defined(HAVE_LANGINFO_H) defined(CODESET)) + if (PQsetClientEncoding(pset.db,
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Oct 06, 2009 at 10:44:27AM +0100, Roger Leigh wrote: On Mon, Oct 05, 2009 at 04:32:08PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote: Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. OK, that makes sense to a certain extent. However, the characters used to draw the table lines are not really that related to the client encoding for data sent from the database (IMHO). Huh? The data *in* the table is going to be in the client_encoding, and psql contains no mechanisms that would translate it to something else. Surrounding it with decoration in a different encoding is just a recipe for breakage. Ah, I was under the mistaken assumption that this was iconv()ed or otherwise translated for correct display. In that case, I'll leave the patch as is (using the client encoding for table lines). I've attached an updated copy of the patch (it just removes the now unneeded langinfo.h header). This patch included a bit of code not intended for inclusion (setting of client encoding based on locale), which the attached (and hopefully final!) revision of the patch excludes. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..cd1c137 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1690,6 +1690,54 @@ lo_import 152801 /varlistentry varlistentry + termliterallinestyle/literal/term + listitem + para + Sets the line drawing style of text table output to one + of literalascii/literal, or literalutf8/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalutf8/literal will be selected + by default if supported by your locale, + otherwise literalascii/literal will be used. + /para + + para + Query result tables are displayed as text for some output + formats (literalunaligned/literal, + literalaligned/literal, and literalwrapped/literal + formats). The tables are drawn using characters of the + user's locale character set. By + default, acronymASCII/acronym characters will be used, + which will display correctly in all locales. However, if + the user is using a locale with a acronymUTF-8/acronym + character set, the default will be to + use acronymUTF-8/acronym box drawing characters in place + of ASCII punctuation to display more readable tables. + /para + + para + This option is useful for overriding the default line + style, for example to force the use of + only acronymASCII/acronym characters when extended + character sets such as acronymUTF-8/acronym are + inappropriate. This might be the case if preserving output + compatibility with older psql versions is important (prior + to 8.5.0). + /para + + para + quoteUTF8/quote use Unicode acronymUTF-8/acronym box + drawing characters. + /para + + para + quoteASCII/quote use plain acronymASCII/acronym characters. + /para + + /listitem + /varlistentry + + varlistentry termliteralcolumns/literal/term listitem para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..223f11c 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -46,6 +46,7 @@ #include input.h #include large_obj.h #include mainloop.h +#include mbprint.h #include print.h #include psqlscan.h #include settings.h @@ -596,6 +597,14 @@ exec_command(const char *cmd, /* save encoding info into psql internal data */ pset.encoding = PQclientEncoding(pset.db); pset.popt.topt.encoding = pset.encoding; +if (!pset.popt.topt.line_style_set) +{ + if (pset.encoding == get_utf8_id()) + pset.popt.topt.line_style = utf8format; + else + pset.popt.topt.line_style = asciiformat; +} + SetVariable(pset.vars, ENCODING, pg_encoding_to_char(pset.encoding)); } @@ -1432,6 +1441,13 @@ SyncVariables(void) /* get stuff from connection */
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote: I have a comment on this bit: @@ -125,6 +128,17 @@ main(int argc, char *argv[]) /* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */ pset.popt.topt.format = PRINT_ALIGNED; + + /* Default table style to plain ASCII */ + pset.popt.topt.table_style = asciiformat; +#if (defined(HAVE_LANGINFO_H) defined(CODESET)) + /* If a UTF-8 locale is available, switch to UTF-8 box drawing characters */ + if (pg_strcasecmp(nl_langinfo(CODESET), UTF-8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), utf8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), CP65001) == 0) + pset.popt.topt.table_style = utf8format; +#endif + pset.popt.topt.border = 1; pset.popt.topt.pager = 1; pset.popt.topt.start_table = true; Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. OK, that makes sense to a certain extent. However, the characters used to draw the table lines are not really that related to the client encoding for data sent from the database (IMHO). I think that (as you said) making the client encoding the same as the locale character set the same in the future would clear up this discrepancy though. Using the client encoding, there's no guarantee the client locale/terminal can handle UTF-8 when the client encoding is UTF-8. I have attached an updated patch which implements your suggested behaviour. It also renames the option to linestyle rather than tablestyle which I think represents its purpose a bit more clearly. Thanks, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..cd1c137 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1690,6 +1690,54 @@ lo_import 152801 /varlistentry varlistentry + termliterallinestyle/literal/term + listitem + para + Sets the line drawing style of text table output to one + of literalascii/literal, or literalutf8/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalutf8/literal will be selected + by default if supported by your locale, + otherwise literalascii/literal will be used. + /para + + para + Query result tables are displayed as text for some output + formats (literalunaligned/literal, + literalaligned/literal, and literalwrapped/literal + formats). The tables are drawn using characters of the + user's locale character set. By + default, acronymASCII/acronym characters will be used, + which will display correctly in all locales. However, if + the user is using a locale with a acronymUTF-8/acronym + character set, the default will be to + use acronymUTF-8/acronym box drawing characters in place + of ASCII punctuation to display more readable tables. + /para + + para + This option is useful for overriding the default line + style, for example to force the use of + only acronymASCII/acronym characters when extended + character sets such as acronymUTF-8/acronym are + inappropriate. This might be the case if preserving output + compatibility with older psql versions is important (prior + to 8.5.0). + /para + + para + quoteUTF8/quote use Unicode acronymUTF-8/acronym box + drawing characters. + /para + + para + quoteASCII/quote use plain acronymASCII/acronym characters. + /para + + /listitem + /varlistentry + + varlistentry termliteralcolumns/literal/term listitem para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..223f11c 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -46,6 +46,7 @@ #include input.h #include large_obj.h #include mainloop.h +#include mbprint.h #include print.h #include psqlscan.h #include settings.h @@ -596,6 +597,14 @@ exec_command(const char *cmd, /* save encoding info into psql internal data */ pset.encoding = PQclientEncoding(pset.db);
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: On Sun, Oct 04, 2009 at 11:22:27PM +0300, Peter Eisentraut wrote: Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. OK, that makes sense to a certain extent. However, the characters used to draw the table lines are not really that related to the client encoding for data sent from the database (IMHO). Huh? The data *in* the table is going to be in the client_encoding, and psql contains no mechanisms that would translate it to something else. Surrounding it with decoration in a different encoding is just a recipe for breakage. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Fri, Oct 02, 2009 at 05:34:16PM -0700, Brad T. Sliger wrote: On Friday 02 October 2009 04:21:35 Roger Leigh wrote: I have attached a patch which implements the feature as a pset variable. This also slightly simplifies some of the patch since the table style is passed to functions directly in printTableContent rather than separately. The psql option '-P tablestyle=ascii' is passed to psql in pg_regress_main.c which means the testsuite doesn't fail any more. The option is documented in the psql docs, and is also tab-completed. Users can just put '\pset tablestyle ascii' in their .psqlrc if they want the old format in a UTF-8 locale. I looked at psql-utf8-table-5.patch. Many thanks for taking the time to do this. I've attached a followup patch which addresses your point below: Lint(1) says there is an extra trailing ',' in src/bin/psql/print.h. in 'typedef enum printTextRule'. The addition to src/bin/psql/command.c could use a comment, like adjacent code. Fixed. 'ASCII' and 'UTF8' may need acronym/acronym tags in doc/src/sgml/ref/psql-ref.sgml, like adjacent code. I'm not sure someone who hasn't seen this patch in action would immediately know what it does from the documentation. `gmake html` works without the patch, but fails with the patch: Also fixed. I also added some additional explanation of the option which hopefully makes its purpose more obvious. The acronym tag isn't used for the itemised option list names, but is used in the descriptive text; I can also add it there if appropriate. It's likely that tablestyle could well be named better. format is already used, but if there's a more intuitive name that fits better, I'm happy to change it. openjade:ref/psql-ref.sgml:1692:15:E: document type does not allow element TERM here; assuming missing VARLISTENTRY start-tag Also fixed. After the patch, `\pset format wrapped` produces '\pset: unknown option: format'. I saw this in interactive psql and from .psqlrc. I think this can be fixed by changing the addition to src/bin/psql/command.c from an 'if' clause to an 'else if' clause. Oops, yes. Sorry about that hiccup. I've also fixed this. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..6f1bdc7 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1690,6 +1690,54 @@ lo_import 152801 /varlistentry varlistentry + termliteraltablestyle/literal/term + listitem + para + Sets the style of text table output to one + of literalascii/literal, or literalutf8/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalutf8/literal will be selected + by default if supported by your locale, + otherwise literalascii/literal will be used. + /para + + para + Query result tables are displayed as text for some output + formats (literalunaligned/literal, + literalaligned/literal, and literalwrapped/literal + formats). The tables are drawn using characters of the + user's locale character set. By + default, acronymASCII/acronym characters will be used, + which will display correctly in all locales. However, if + the user is using a locale with a acronymUTF-8/acronym + character set, the default will be to + use acronymUTF-8/acronym box drawing characters in place + of ASCII punctuation to display more readable tables. + /para + + para + This option is useful for overriding the default table + style, for example to force the use of + only acronymASCII/acronym characters when extended + character sets such as acronymUTF-8/acronym are + inappropriate. This might be the case if preserving output + compatibility with older psql versions is important (prior + to 8.5.0). + /para + + para + quoteUTF8/quote use Unicode acronymUTF-8/acronym box + drawing characters. + /para + + para + quoteASCII/quote use plain acronymASCII/acronym characters. + /para + + /listitem + /varlistentry + + varlistentry termliteralcolumns/literal/term listitem para diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..31631cf 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1772,6 +1772,25 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) printf(_(Output format is %s.\n),
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
I have a comment on this bit: @@ -125,6 +128,17 @@ main(int argc, char *argv[]) /* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */ pset.popt.topt.format = PRINT_ALIGNED; + + /* Default table style to plain ASCII */ + pset.popt.topt.table_style = asciiformat; +#if (defined(HAVE_LANGINFO_H) defined(CODESET)) + /* If a UTF-8 locale is available, switch to UTF-8 box drawing characters */ + if (pg_strcasecmp(nl_langinfo(CODESET), UTF-8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), utf8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), CP65001) == 0) + pset.popt.topt.table_style = utf8format; +#endif + pset.popt.topt.border = 1; pset.popt.topt.pager = 1; pset.popt.topt.start_table = true; Elsewhere in the psql code, notably in mbprint.c, we make the decision on whether to apply certain Unicode-aware processing based on whether the client encoding is UTF8. The same should be done here. There is a patch somewhere in the pipeline that would automatically set the psql client encoding to whatever the locale says, but until that is done, the client encoding should be the sole setting that rules what kind of character set processing is done on the client side. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, Sep 30, 2009 at 06:50:46PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. The attached patch implements this feature. It adds a --no-pretty-formatting/-G option to psql (naming isn't my fort�, so feel free to change it!). This is also documented in the SGML docs, and help text. Lastly, this option is used when invoking psql by pg_regress, which results in a working testsuite in a UTF-8 locale. It would be a good idea to tie this to a psql magic variable (like ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc. I'm not actually sure that we need a dedicated command line switch for it, since you could use -v varname instead. I have attached a patch which implements the feature as a pset variable. This also slightly simplifies some of the patch since the table style is passed to functions directly in printTableContent rather than separately. The psql option '-P tablestyle=ascii' is passed to psql in pg_regress_main.c which means the testsuite doesn't fail any more. The option is documented in the psql docs, and is also tab-completed. Users can just put '\pset tablestyle ascii' in their .psqlrc if they want the old format in a UTF-8 locale. To follow up on the comments about the problems of defaulting to UTF-8. There are just two potential problems with defaulting, both of which are problems with the user's mis-configuration of their system and (IMHO) not something that postgresql needs to care about. 1) The user lacks a font containing the line-drawing characters. It's very rare for a fixed-width terminal font to not contain these characters, and the patch as provided sticks to the most basic range from the VT100 set which are most commonly provided. 2) The user's terminal emulator is not configured to accept UTF-8 input. If you're using a UTF-8 locale, then this is necessary to display non-ASCII characters, and is done automatically by almost every terminal emulator out there (on Linux, they default to using nl_langinfo(CODESET) unless configured otherwise, which is a very simple change if required). On any current GNU/Linux system, you have to go out of your way to break the defaults. The patch currently switches to UTF-8 automatically /when available/. IMO this is the correct behaviour since it will work for all but the handful of users who misconfigured their system, and provides an immediate benefit. We spent years making UTF-8 work out of the box on Linux and Unix systems, and it seems a trifle unfair to penalise all users for the sake of the few who just didn't set up their terminal emulator correctly (their setup is already broken, since non-ASCII characters returned by queries are /already/ going to be displayed incorrectly). Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..ad297f8 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -1689,6 +1689,28 @@ lo_import 152801 /listitem /varlistentry + termliteraltablestyle/literal/term + listitem + para + Sets the style of text table output to one + of literalascii/literal, or literalutf8/literal. + Unique abbreviations are allowed. (That would mean one + letter is enough.) literalutf8/literal will be selected + by default if supported by your locale, + otherwise literalascii/literal will be used. + /para + + para + quoteUTF8/quote use Unicode box drawing characters. + /para + + para + quoteASCII/quote use plain ASCII characters. + /para + + /listitem + /varlistentry + varlistentry termliteralcolumns/literal/term listitem diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c index da57eb4..e9b6fe0 100644 --- a/src/bin/psql/command.c +++ b/src/bin/psql/command.c @@ -1772,6 +1772,24 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet) printf(_(Output format is %s.\n), _align2string(popt-topt.format)); } + if (strcmp(param, tablestyle) == 0) + { + if (!value) + ; + else if (pg_strncasecmp(ascii, value, vallen) == 0) + popt-topt.table_style = asciiformat; + else if (pg_strncasecmp(utf8, value, vallen) == 0) + popt-topt.table_style = utf8format; + else + { + psql_error(\\pset: allowed table styles are ascii, utf8\n); + return
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Friday 02 October 2009 04:21:35 Roger Leigh wrote: On Wed, Sep 30, 2009 at 06:50:46PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. The attached patch implements this feature. It adds a --no-pretty-formatting/-G option to psql (naming isn't my fort�, so feel free to change it!). This is also documented in the SGML docs, and help text. Lastly, this option is used when invoking psql by pg_regress, which results in a working testsuite in a UTF-8 locale. It would be a good idea to tie this to a psql magic variable (like ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc. I'm not actually sure that we need a dedicated command line switch for it, since you could use -v varname instead. I have attached a patch which implements the feature as a pset variable. This also slightly simplifies some of the patch since the table style is passed to functions directly in printTableContent rather than separately. The psql option '-P tablestyle=ascii' is passed to psql in pg_regress_main.c which means the testsuite doesn't fail any more. The option is documented in the psql docs, and is also tab-completed. Users can just put '\pset tablestyle ascii' in their .psqlrc if they want the old format in a UTF-8 locale. To follow up on the comments about the problems of defaulting to UTF-8. There are just two potential problems with defaulting, both of which are problems with the user's mis-configuration of their system and (IMHO) not something that postgresql needs to care about. 1) The user lacks a font containing the line-drawing characters. It's very rare for a fixed-width terminal font to not contain these characters, and the patch as provided sticks to the most basic range from the VT100 set which are most commonly provided. 2) The user's terminal emulator is not configured to accept UTF-8 input. If you're using a UTF-8 locale, then this is necessary to display non-ASCII characters, and is done automatically by almost every terminal emulator out there (on Linux, they default to using nl_langinfo(CODESET) unless configured otherwise, which is a very simple change if required). On any current GNU/Linux system, you have to go out of your way to break the defaults. The patch currently switches to UTF-8 automatically /when available/. IMO this is the correct behaviour since it will work for all but the handful of users who misconfigured their system, and provides an immediate benefit. We spent years making UTF-8 work out of the box on Linux and Unix systems, and it seems a trifle unfair to penalise all users for the sake of the few who just didn't set up their terminal emulator correctly (their setup is already broken, since non-ASCII characters returned by queries are /already/ going to be displayed incorrectly). Regards, Roger I looked at psql-utf8-table-5.patch. Lint(1) says there is an extra trailing ',' in src/bin/psql/print.h. in 'typedef enum printTextRule'. The addition to src/bin/psql/command.c could use a comment, like adjacent code. 'ASCII' and 'UTF8' may need acronym/acronym tags in doc/src/sgml/ref/psql-ref.sgml, like adjacent code. I'm not sure someone who hasn't seen this patch in action would immediately know what it does from the documentation. `gmake html` works without the patch, but fails with the patch: openjade:ref/psql-ref.sgml:1692:15:E: document type does not allow element TERM here; assuming missing VARLISTENTRY start-tag After the patch, `\pset format wrapped` produces '\pset: unknown option: format'. I saw this in interactive psql and from .psqlrc. I think this can be fixed by changing the addition to src/bin/psql/command.c from an 'if' clause to an 'else if' clause. Otherwise, the patch applied, built and installed. The `gmake check` tests all passed with LANG and/or LC_ALL set. The various tablestyle options seem to work. The default behavior with respect to various LANG and LC_ALL values seems reasonable and can be overridden. Thanks, --bts -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Sep 29, 2009 at 04:28:57PM -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote: The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. We should do the LANG or LC_CTYPE thing only on the client, unconditionally. The --no-locale/--locale options should primarily determine what the temporary server uses. Well, that seems fairly reasonable, but it's going to require some refactoring of pg_regress. The initialize_environment function determines what happens in both the client and the temp server. Two possible approaches to fix the tests are as follows: diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index f2f9603..74cdaa2 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -711,8 +711,7 @@ initialize_environment(void) * is actually called.) */ unsetenv(LANGUAGE); - unsetenv(LC_ALL); - putenv(LC_MESSAGES=C); + putenv(LC_ALL=C); /* * Set multibyte as requested Here we just force the locale to C. This does have the disadvantage that --no-locale is made redundant, and any tests which are dependent upon locale (if any?) will be run in the C locale. diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index f2f9603..65fb49a 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -712,6 +712,7 @@ initialize_environment(void) */ unsetenv(LANGUAGE); unsetenv(LC_ALL); + putenv(LC_CTYPE=C); putenv(LC_MESSAGES=C); /* Here we set LC_CTYPE to C in addition to LC_MESSAGES (and for much the same reasons). However, if you test on non-C locales to check for issues with other locale codesets, those tests are all going to be forced to use ASCII. Is this a problem in practice? From the POV of my patch, it's working as designed: if the locale codeset is UTF-8 it's outputting UTF-8. But, due to the way the test machinery is looking at the output, this is breaking the tests. I'm not sure what I can do with my patch to make it behave differently that is both compatible with its intended use and not break the tests-- this is really just breaking an assumption in the testing code that the test output will always be ASCII. Forcing the LC_CTYPE to C will force ASCII output and work around this problem with the tests. Another approach would be to let psql know it's being run in a test environment with a PG_TEST or some other environment variable which we can check for and use to turn off UTF-8 output if set. Would that be better? Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh wrote: Here we just force the locale to C. This does have the disadvantage that --no-locale is made redundant, and any tests which are dependent upon locale (if any?) will be run in the C locale. That is not a solution. We have not that long ago gone to some lengths to provide for buildfarm testing in various locales. We're not going to undo that. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Andrew Dunstan and...@dunslane.net writes: Roger Leigh wrote: Here we just force the locale to C. This does have the disadvantage that --no-locale is made redundant, and any tests which are dependent upon locale (if any?) will be run in the C locale. That is not a solution. Right. I think you may have missed the point of what Peter was saying: it's okay to force the locale to C on the *client* side of the tests. The trick is to not change the environment that the temp server is started with. This will take some redesign inside pg_regress, I think. Probably initialize_environment needs to be split into two functions, one that's run before firing off the temp server and one that runs after. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Sep 29, 2009 at 04:32:49PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: C locale means POSIX behavior and nothing but. Indeed it does. However, making LC_CTYPE be UTF-8 rather than ASCII is both possible and still strictly conforming to the letter of the standard. There would be some collation and other restrictions (digit and other character classes would be contrained to the ASCII characters compared with other UTF-8 locales). However, any existing programs using ASCII would continue to function without any changes to their behaviour. The only observable change will be that nl_langinfo(CODESET) will return UTF-8, and it will be valid for programs to use UTF-8 encoded text in formatted print functions, etc.. I really, really don't believe that that meets either the letter or the spirit of the C standard, at least not if you are intending to silently substitute LC_CTYPE=UTF8 when the program has specified C/POSIX locale. (If this is just a matter of what the default LANG environment is, of course you can do anything.) We have spent some time reading the relevant standards documents (C, POSIX, SUSv2, SUSv3) and haven't found anything yet that would preclude this. While they all specify minimal requirements for what the C locale character set must provide (and POSIX/SUS are the most strict, specifying ASCII outright for each 0-127 codepoint), these are the minimal requirements for the locale, and implementation-specific extensions to ASCII are allowed, which would therefore permit UTF-8. Note that LC_CTYPE=C is not required to specify ASCII in any standard (though POSIX/SUS require that it must contain ASCII as a subset of the whole set). The language in SUSv2 in fact explicitly states that this is allowed. In fact, I've seen documentation that some UNIX systems such as HPUX already do have a UTF-8 C locale as an option. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Andrew Dunstan wrote: Roger Leigh wrote: Here we just force the locale to C. This does have the disadvantage that --no-locale is made redundant, and any tests which are dependent upon locale (if any?) will be run in the C locale. That is not a solution. We have not that long ago gone to some lengths to provide for buildfarm testing in various locales. We're not going to undo that. Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: The language in SUSv2 in fact explicitly states that this is allowed. In fact, I've seen documentation that some UNIX systems such as HPUX already do have a UTF-8 C locale as an option. I don't argue with the concept of a C.UTF8 locale --- in fact I think it sounds pretty useful. What I think is 100% broken is trying to make C locale work that way. C locale is supposed to be the traditional locale-ignorant, characters-are-bytes behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Andrew Dunstan and...@dunslane.net writes: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Yeah, that's not a bad idea. There are likely to be other client programs that won't want this behavioral change either. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Yeah, that's not a bad idea. There are likely to be other client programs that won't want this behavioral change either. I'm surprised there isn't one already. I would think that any new format would default to off. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Robert Haas wrote: On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Yeah, that's not a bad idea. There are likely to be other client programs that won't want this behavioral change either. I'm surprised there isn't one already. I would think that any new format would default to off. I haven't looked, but if there is that's so much less work to do ;-) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Well, it might not be a bad idea, but adding a feature just to satisfy the test suite instead of fixing the test suite doesn't feel satisfying. Is there another use case? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, 2009-09-30 at 12:06 -0400, Robert Haas wrote: On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Yeah, that's not a bad idea. There are likely to be other client programs that won't want this behavioral change either. I'm surprised there isn't one already. I would think that any new format would default to off. Well, this isn't a new format, just an enhancement of an existing format. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut wrote: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Well, it might not be a bad idea, but adding a feature just to satisfy the test suite instead of fixing the test suite doesn't feel satisfying. Is there another use case? Sure, as Tom noted pg_regress probably won't be the only user. There are lots of legacy scripts out there that parse psql output, and it should be of use to them. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Andrew Dunstan escribió: Peter Eisentraut wrote: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Well, it might not be a bad idea, but adding a feature just to satisfy the test suite instead of fixing the test suite doesn't feel satisfying. Is there another use case? Sure, as Tom noted pg_regress probably won't be the only user. There are lots of legacy scripts out there that parse psql output, and it should be of use to them. All scripts I've seen parsing psql output use unaligned, undecorated mode. I have yet to see one messing with the |'s. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, Sep 30, 2009 at 1:27 PM, Peter Eisentraut pete...@gmx.net wrote: On Wed, 2009-09-30 at 12:06 -0400, Robert Haas wrote: On Wed, Sep 30, 2009 at 11:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Yeah, that's not a bad idea. There are likely to be other client programs that won't want this behavioral change either. I'm surprised there isn't one already. I would think that any new format would default to off. Well, this isn't a new format, just an enhancement of an existing format. I disagree. This patch is about replacing ASCII art constructed using | and + and replacing it with box-drawing characters. I think the chances that this will not look good in one or more of the terminal emulators that I use from time to time are excellent, especially if the approach is at all similar to what pstree does on recent versions of Fedora. I'm 100% not interested in tracking down some obscure encoding/locale issue to make it show up right, and I'm 110% not interested in helping the people who file bug reports to track down the analagous issues in their crazy, broken environments. The way it works right now (and has worked for the last 5 years or more) is reliable, familiar, and, at least IME, bullet-proof. I don't even see a good case for changing the default, let alone not providing a way to retreat to the old behavior. Considering that this patch is nothing more than an exceedingly minor piece of window-dressing, taking even a small risk of breaking things for anyone who as of today can log into psql and get work done without pain seems like a bad trade-off to me. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, Sep 30, 2009 at 01:30:17PM -0400, Andrew Dunstan wrote: Peter Eisentraut wrote: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. Well, it might not be a bad idea, but adding a feature just to satisfy the test suite instead of fixing the test suite doesn't feel satisfying. Is there another use case? Sure, as Tom noted pg_regress probably won't be the only user. There are lots of legacy scripts out there that parse psql output, and it should be of use to them. The attached patch implements this feature. It adds a --no-pretty-formatting/-G option to psql (naming isn't my forté, so feel free to change it!). This is also documented in the SGML docs, and help text. Lastly, this option is used when invoking psql by pg_regress, which results in a working testsuite in a UTF-8 locale. Hopefully this is OK! I'll be happy to make any amendments required. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 85e9375..95f1365 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -192,6 +192,19 @@ PostgreSQL documentation /varlistentry varlistentry + termoption-G/option/term + termoption--no-pretty-formatting/option/term + para + Only use ASCII characters to draw tables. Without this option, + box drawing characters may be used if avilable which will result + in nicer output. However, if compatibility with the output of + older psql versions, or psql used in other locales is required, it + may be desirable to disable pretty table formatting in these + situations. + /para +/varlistentry + +varlistentry termoption-h replaceable class=parameterhostname/replaceable//term termoption--host replaceable class=parameterhostname/replaceable//term listitem diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index d498d58..d596ece 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -121,6 +121,8 @@ usage(void) set field separator (default: \%s\)\n), DEFAULT_FIELD_SEP); printf(_( -H, --html HTML table output mode\n)); + printf(_( -G, --no-pretty-formatting\n + only use ASCII characters to format tables\n)); printf(_( -P, --pset=VAR[=ARG] set printing option VAR to ARG (see \\pset command)\n)); printf(_( -R, --record-separator=STRING\n set record separator (default: newline)\n)); diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 41d508e..e01e512 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -2290,9 +2290,10 @@ printTable(const printTableContent *cont, FILE *fout, FILE *flog) const printTextFormat *text_format = asciiformat; #if (defined(HAVE_LANGINFO_H) defined(CODESET)) - if (pg_strcasecmp(nl_langinfo(CODESET), UTF-8) == 0 || - pg_strcasecmp(nl_langinfo(CODESET), utf8) == 0 || - pg_strcasecmp(nl_langinfo(CODESET), CP65001) == 0) + if ((pg_strcasecmp(nl_langinfo(CODESET), UTF-8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), utf8) == 0 || + pg_strcasecmp(nl_langinfo(CODESET), CP65001) == 0) + cont-opt-use_text_format) text_format = utf8format; #endif diff --git a/src/bin/psql/print.h b/src/bin/psql/print.h index ea1dc97..97772a6 100644 --- a/src/bin/psql/print.h +++ b/src/bin/psql/print.h @@ -27,6 +27,7 @@ enum printFormat typedef struct printTableOpt { enum printFormat format; /* one of the above */ + bool use_text_format; /* use text formats for pretty printing */ bool expanded; /* expanded/vertical output (if supported by * output format) */ unsigned short int border; /* Print a border around the table. 0=none, diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c index 429e5d9..3fdf904 100644 --- a/src/bin/psql/startup.c +++ b/src/bin/psql/startup.c @@ -125,6 +125,7 @@ main(int argc, char *argv[]) /* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */ pset.popt.topt.format = PRINT_ALIGNED; + pset.popt.topt.use_text_format = 1; pset.popt.topt.border = 1; pset.popt.topt.pager = 1; pset.popt.topt.start_table = true; @@ -317,6 +318,7 @@ parse_psql_options(int argc, char *argv[], struct adhoc_opts * options) {list, no_argument, NULL, 'l'}, {log-file, required_argument, NULL, 'L'}, {no-readline, no_argument, NULL, 'n'}, + {no-pretty-formatting, no_argument, NULL, 'G'}, {single-transaction, no_argument, NULL, '1'}, {output, required_argument,
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, 2009-09-30 at 14:02 -0400, Alvaro Herrera wrote: All scripts I've seen parsing psql output use unaligned, undecorated mode. I have yet to see one messing with the |'s. Plus, we would have broken that with the : continuation lines. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Robert Haas robertmh...@gmail.com writes: The way it works right now (and has worked for the last 5 years or more) is reliable, familiar, and, at least IME, bullet-proof. I don't even see a good case for changing the default, let alone not providing a way to retreat to the old behavior. This is pretty much my opinion as well. I'd vote for requiring a switch to turn it on. Those who really want the new behavior by default can set it up in their .psqlrc. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut pete...@gmx.net writes: On Wed, 2009-09-30 at 14:02 -0400, Alvaro Herrera wrote: All scripts I've seen parsing psql output use unaligned, undecorated mode. I have yet to see one messing with the |'s. Plus, we would have broken that with the : continuation lines. Only for data containing embedded newlines, which is not that common. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: On Wed, 2009-09-30 at 11:03 -0400, Andrew Dunstan wrote: Thinking about this some more, ISTM a much better way of approaching it would be to provide a flag for psql to turn off the fancy formatting, and have pg_regress use that flag. The attached patch implements this feature. It adds a --no-pretty-formatting/-G option to psql (naming isn't my forté, so feel free to change it!). This is also documented in the SGML docs, and help text. Lastly, this option is used when invoking psql by pg_regress, which results in a working testsuite in a UTF-8 locale. It would be a good idea to tie this to a psql magic variable (like ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc. I'm not actually sure that we need a dedicated command line switch for it, since you could use -v varname instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Wed, 2009-09-30 at 18:50 -0400, Tom Lane wrote: It would be a good idea to tie this to a psql magic variable (like ON_ERROR_STOP) so that it could conveniently be set in ~/.psqlrc. I'm not actually sure that we need a dedicated command line switch for it, since you could use -v varname instead. Right. A better yet place would be inside \pset, since that contains the other formatting options. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote: During this review I found that `gmake check` will fail when LANG=en_US.UTF-8 in the environment. In this case the patched psql produces UTF8 line art and the tests expect ASCII line art. pg_regress clears LC_ALL by default, but does not clear LANG by default. Please find attached a patch that causes pg_regress to also clear LANG by default. It probably doesn't matter much, but I think the proper fix would be to unset LC_CTYPE. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut pete...@gmx.net writes: On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote: pg_regress clears LC_ALL by default, but does not clear LANG by default. Please find attached a patch that causes pg_regress to also clear LANG by default. It probably doesn't matter much, but I think the proper fix would be to unset LC_CTYPE. Wouldn't we have to clear *all* of these variables to be sure? The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Sep 29, 2009 at 12:01:30PM -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote: pg_regress clears LC_ALL by default, but does not clear LANG by default. Please find attached a patch that causes pg_regress to also clear LANG by default. It probably doesn't matter much, but I think the proper fix would be to unset LC_CTYPE. Wouldn't we have to clear *all* of these variables to be sure? The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. While this is currently just hypothetical, the code is also making the assumption that the C locale is ASCII. However, there is no guarantee that this is the case. ASCII is the minimal requirement, and UTF-8 is a superset (according to SUS/POSIX). In Debian, we do have plans to introduce a C.UTF-8 locale, and perhaps eventually move the C locale to using UTF-8 by default, at which point the tests would again fail because you would still get the UTF-8 formatting despite all of the LC_ etc. munging. (At that point, we would have a C.ASCII for compatibility.) Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: In Debian, we do have plans to introduce a C.UTF-8 locale, Egad, isn't that a contradiction in terms? C locale means POSIX behavior and nothing but. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Sep 29, 2009 at 01:41:27PM -0400, Tom Lane wrote: Roger Leigh rle...@codelibre.net writes: In Debian, we do have plans to introduce a C.UTF-8 locale, Egad, isn't that a contradiction in terms? Not entirely! C locale means POSIX behavior and nothing but. Indeed it does. However, making LC_CTYPE be UTF-8 rather than ASCII is both possible and still strictly conforming to the letter of the standard. There would be some collation and other restrictions (digit and other character classes would be contrained to the ASCII characters compared with other UTF-8 locales). However, any existing programs using ASCII would continue to function without any changes to their behaviour. The only observable change will be that nl_langinfo(CODESET) will return UTF-8, and it will be valid for programs to use UTF-8 encoded text in formatted print functions, etc.. I'd just like to stress that this isn't happening anytime soon! (Well, C.UTF-8 is possible now, it's just not hardcoded into libc.so. The intention is to initially provide C.UTF-8 in addition to C so that programs requiring UTF-8 support have a locale guaranteed to be present on the system to meet their needs.) Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: On Mon, 2009-09-28 at 20:49 -0700, Brad T. Sliger wrote: pg_regress clears LC_ALL by default, but does not clear LANG by default. Please find attached a patch that causes pg_regress to also clear LANG by default. It probably doesn't matter much, but I think the proper fix would be to unset LC_CTYPE. (Actually, I should have said, set LC_CTYPE to C.) Wouldn't we have to clear *all* of these variables to be sure? Not really. The problem here is that psql checks whether the console uses UTF8, and that is controlled by the LC_CTYPE variable on Unix-ish systems. The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. We should do the LANG or LC_CTYPE thing only on the client, unconditionally. The --no-locale/--locale options should primarily determine what the temporary server uses. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Peter Eisentraut pete...@gmx.net writes: On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote: The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. We should do the LANG or LC_CTYPE thing only on the client, unconditionally. The --no-locale/--locale options should primarily determine what the temporary server uses. Well, that seems fairly reasonable, but it's going to require some refactoring of pg_regress. The initialize_environment function determines what happens in both the client and the temp server. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh rle...@codelibre.net writes: C locale means POSIX behavior and nothing but. Indeed it does. However, making LC_CTYPE be UTF-8 rather than ASCII is both possible and still strictly conforming to the letter of the standard. There would be some collation and other restrictions (digit and other character classes would be contrained to the ASCII characters compared with other UTF-8 locales). However, any existing programs using ASCII would continue to function without any changes to their behaviour. The only observable change will be that nl_langinfo(CODESET) will return UTF-8, and it will be valid for programs to use UTF-8 encoded text in formatted print functions, etc.. I really, really don't believe that that meets either the letter or the spirit of the C standard, at least not if you are intending to silently substitute LC_CTYPE=UTF8 when the program has specified C/POSIX locale. (If this is just a matter of what the default LANG environment is, of course you can do anything.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Sep 29, 2009 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote: The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. We should do the LANG or LC_CTYPE thing only on the client, unconditionally. The --no-locale/--locale options should primarily determine what the temporary server uses. Well, that seems fairly reasonable, but it's going to require some refactoring of pg_regress. The initialize_environment function determines what happens in both the client and the temp server. This seems to mean that we can't apply this patch, since failing the regression tests is not an acceptable behavior. I think that means that the patch author needs to either do the necessary pg_regress refactoring or figure out some other solution that is acceptable to the community. Assuming that non-trivial pg_regress refactoring is required, I think we should mark this patch as Returned with Feedback, because that should really be a separate patch, and it's obviously far too late to submit new patches to this CommitFest. Objections? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Robert Haas escribió: On Tue, Sep 29, 2009 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On Tue, 2009-09-29 at 12:01 -0400, Tom Lane wrote: The bigger question is exactly how we expect this stuff to interact with pg_regress' --no-locale switch. We already do clear all these variables when --no-locale is specified. I am wondering just what --locale is supposed to do, and whether selectively lobotomizing the LC stuff has any real use at all. We should do the LANG or LC_CTYPE thing only on the client, unconditionally. The --no-locale/--locale options should primarily determine what the temporary server uses. Well, that seems fairly reasonable, but it's going to require some refactoring of pg_regress. The initialize_environment function determines what happens in both the client and the temp server. This seems to mean that we can't apply this patch, since failing the regression tests is not an acceptable behavior. I think that means that the patch author needs to either do the necessary pg_regress refactoring or figure out some other solution that is acceptable to the community. Assuming that non-trivial pg_regress refactoring is required, I think we should mark this patch as Returned with Feedback, because that should really be a separate patch, and it's obviously far too late to submit new patches to this CommitFest. Objections? Does the patch pass regression tests in normal conditions? If it does, I see no reason to reject it. If it fails in --locale only, and even then only when the given locale is UTF8, which IIRC it's a seldom-used case, we can see about fixing that separately. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Alvaro Herrera alvhe...@commandprompt.com writes: Robert Haas escribió: This seems to mean that we can't apply this patch, since failing the regression tests is not an acceptable behavior. Does the patch pass regression tests in normal conditions? If you consider that normal means LANG=C in environment, then yeah. Else not so much. In particular, every one of the buildfarm machines that tests a non-C locale environment can be expected to go red. I think that the required patch might not be a whole lot larger than the one Brad already submitted, so I'm willing to allow a couple more days for the author to investigate the issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Alvaro Herrera wrote: Does the patch pass regression tests in normal conditions? If it does, I see no reason to reject it. If it fails in --locale only, and even then only when the given locale is UTF8, which IIRC it's a seldom-used case, we can see about fixing that separately. Numerous buildfarm machines (including one of mine) run the regression suite using UTF8. So it's far from seldom-used. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Tom Lane escribió: Alvaro Herrera alvhe...@commandprompt.com writes: Robert Haas escribi�: This seems to mean that we can't apply this patch, since failing the regression tests is not an acceptable behavior. Does the patch pass regression tests in normal conditions? If you consider that normal means LANG=C in environment, then yeah. Else not so much. In particular, every one of the buildfarm machines that tests a non-C locale environment can be expected to go red. Huh, you're right, it doesn't work at all. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sunday 27 September 2009 19:03:33 Robert Haas wrote: On Sun, Sep 27, 2009 at 9:24 PM, Selena Deckelmann selenama...@gmail.com wrote: Hi! On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh rle...@codelibre.net wrote: On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote: Brad says: The patched code compiles without any additional warnings. Lint gripes about a trailing ',' in 'typedef enum printTextRule' in print.h. Other additional lint seem to be false positives. The regression tests pass against the new patch. I've attached a new patch which tidies up those extra commas, plus a patch showing the changes from the previous patch. Great! Thank you. Brad -- can you review this patch? Is it ready for a committer? Brad already marked it that way on the CommitFest application, so I think that probably means yes. :-) ...Robert I looked at the new (psql-utf8-table-4.patch) patch. I think the style issues are fixed. During this review I found that `gmake check` will fail when LANG=en_US.UTF-8 in the environment. In this case the patched psql produces UTF8 line art and the tests expect ASCII line art. pg_regress clears LC_ALL by default, but does not clear LANG by default. Please find attached a patch that causes pg_regress to also clear LANG by default. Thanks, --bts diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index f2f9603..68ba30b 100644 --- a/src/test/regress/pg_regress.c +++ b/src/test/regress/pg_regress.c @@ -697,11 +697,6 @@ initialize_environment(void) unsetenv(LC_MONETARY); unsetenv(LC_NUMERIC); unsetenv(LC_TIME); - unsetenv(LANG); - /* On Windows the default locale cannot be English, so force it */ -#if defined(WIN32) || defined(__CYGWIN__) - putenv(LANG=en); -#endif } /* @@ -712,8 +707,14 @@ initialize_environment(void) */ unsetenv(LANGUAGE); unsetenv(LC_ALL); + unsetenv(LANG); + /* On Windows the default locale cannot be English, so force it */ +#if defined(WIN32) || defined(__CYGWIN__) + putenv(LANG=en); +#endif putenv(LC_MESSAGES=C); + /* * Set multibyte as requested */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Hi! On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh rle...@codelibre.net wrote: On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote: Brad says: The patched code compiles without any additional warnings. Lint gripes about a trailing ',' in 'typedef enum printTextRule' in print.h. Other additional lint seem to be false positives. The regression tests pass against the new patch. I've attached a new patch which tidies up those extra commas, plus a patch showing the changes from the previous patch. Great! Thank you. Brad -- can you review this patch? Is it ready for a committer? Thanks, -selena -- http://chesnok.com/daily - me http://endpoint.com - work -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Sep 27, 2009 at 9:24 PM, Selena Deckelmann selenama...@gmail.com wrote: Hi! On Wed, Sep 23, 2009 at 2:16 AM, Roger Leigh rle...@codelibre.net wrote: On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote: Brad says: The patched code compiles without any additional warnings. Lint gripes about a trailing ',' in 'typedef enum printTextRule' in print.h. Other additional lint seem to be false positives. The regression tests pass against the new patch. I've attached a new patch which tidies up those extra commas, plus a patch showing the changes from the previous patch. Great! Thank you. Brad -- can you review this patch? Is it ready for a committer? Brad already marked it that way on the CommitFest application, so I think that probably means yes. :-) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Fri, Sep 18, 2009 at 11:30:05AM -0700, Selena Deckelmann wrote: Brad says: The patched code compiles without any additional warnings. Lint gripes about a trailing ',' in 'typedef enum printTextRule' in print.h. Other additional lint seem to be false positives. The regression tests pass against the new patch. I've attached a new patch which tidies up those extra commas, plus a patch showing the changes from the previous patch. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 7505cd4..41d508e 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -21,6 +21,9 @@ #endif #include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif #include catalog/pg_type.h #include pqsignal.h @@ -356,38 +359,75 @@ print_unaligned_vertical(const printTableContent *cont, FILE *fout) /* Aligned text */ // +static const printTextFormat asciiformat = +{ + { + { -, +, +, + }, + { -, +, +, + }, + { -, +, +, + }, + { , |, |, | } + }, + :, + ;, + +}; + +static const struct printTextFormat utf8format = +{ + { + /* ─, ┌, ┬, ┐ */ + { \342\224\200, \342\224\214, \342\224\254, \342\224\220 }, + /* ━, ┝, ┿, ┥ */ + { \342\224\201, \342\224\235, \342\224\277, \342\224\245 }, + /* ─, └, ┴, ┘ */ + { \342\224\200, \342\224\224, \342\224\264, \342\224\230 }, + /* N/A, │, │, │ */ + { , \342\224\202, \342\224\202, \342\224\202 } + }, + /* ╎ */ + \342\225\216, + /* ┊ */ + \342\224\212, + /* ╷ */ + \342\225\267 +}; /* draw line */ static void _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, - unsigned short border, FILE *fout) + unsigned short border, printTextRule pos, + const printTextFormat *format, + FILE *fout) { unsigned int i, j; + const printTextLineFormat *lformat = format-lrule[pos]; + if (border == 1) - fputc('-', fout); + fputs(lformat-hrule, fout); else if (border == 2) - fputs(+-, fout); + fprintf(fout, %s%s, lformat-leftvrule, lformat-hrule); for (i = 0; i ncolumns; i++) { for (j = 0; j widths[i]; j++) - fputc('-', fout); + fputs(lformat-hrule, fout); if (i ncolumns - 1) { if (border == 0) fputc(' ', fout); else -fputs(-+-, fout); +fprintf(fout, %s%s%s, lformat-hrule, + lformat-midvrule, lformat-hrule); } } if (border == 2) - fputs(-+, fout); + fprintf(fout, %s%s, lformat-hrule, lformat-rightvrule); else if (border == 1) - fputc('-', fout); + fputs(lformat-hrule, fout); fputc('\n', fout); } @@ -397,7 +437,8 @@ _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, * Print pretty boxes around cells. */ static void -print_aligned_text(const printTableContent *cont, FILE *fout) +print_aligned_text(const printTableContent *cont, const printTextFormat *format, + FILE *fout) { bool opt_tuples_only = cont-opt-tuples_only; bool opt_numeric_locale = cont-opt-numericLocale; @@ -431,6 +472,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int *bytes_output; /* Bytes output for column value */ int output_columns = 0; /* Width of interactive console */ bool is_pager = false; + const printTextLineFormat *dformat = format-lrule[PRINT_RULE_DATA]; if (cancel_pressed) return; @@ -709,7 +751,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int curr_nl_line; if (opt_border == 2) -_print_horizontal_line(col_count, width_wrap, opt_border, fout); +_print_horizontal_line(col_count, width_wrap, opt_border, PRINT_RULE_TOP, format, fout); for (i = 0; i col_count; i++) pg_wcsformat((unsigned char *) cont-headers[i], @@ -722,7 +764,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) - fprintf(fout, |%c, curr_nl_line ? '+' : ' '); + fprintf(fout, %s%c, dformat-leftvrule, curr_nl_line ? '+' : ' '); else if (opt_border == 1) fputc(curr_nl_line ? '+' : ' ', fout); @@ -753,19 +795,20 @@ print_aligned_text(const printTableContent *cont, FILE *fout) if (opt_border == 0) fputc(curr_nl_line ? '+' : ' ', fout); else - fprintf(fout, |%c, curr_nl_line ? '+' : ' '); + fprintf(fout, %s%c, dformat-midvrule, curr_nl_line ? '+' : ' '); } } curr_nl_line++; if (opt_border == 2) - fputs( |, fout); + fprintf(fout, %s, + dformat-rightvrule); else if (opt_border == 1) fputc(' ', fout); fputc('\n', fout); } - _print_horizontal_line(col_count, width_wrap, opt_border, fout); + _print_horizontal_line(col_count, width_wrap, opt_border, PRINT_RULE_MIDDLE,
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Hi! Below is from Brad Slinger. I'm just forwarding his message so that the review will be in the thread for the archives. (Sorry, Brad that I missed this earlier.. I thought you'd already replied to the list for some reason.) Brad says: Please forgive my top posting. Below is my patch review. The purpose of this patch is to improve psql output readability. This is accomplished through table formatting improvements and line art improvements. The patch is a unified diff, rather than a context diff. The patch applies from the project root with with -p1 set. Patch(1) reports with fuzz 1 when patching src/bin/psql/print.h. The patch touches two files: src/bin/psql/print.h src/bin/psql/print.c The patch does not include any documentation or tests. Comments and style are similar to existing code. The patched code compiles without any additional warnings. Lint gripes about a trailing ',' in 'typedef enum printTextRule' in print.h. Other additional lint seem to be false positives. The regression tests pass against the new patch. The patched code installs and runs. There were no crashes, although I didn't try very hard. The ASCII line art works: createdb test psql test \pset border 2 \pset format wrapped SELECT 'short' AS short, lpad('', 1000, 'long') AS long; I can trigger the new line art by setting LC_ALL=en_US.UTF-8 in the environment. As far as I can tell, the patch works as advertised. The patch does not change the server/backend. I don't see any performance problems. Thanks, --bts -- http://chesnok.com/daily - me http://endpoint.com - work -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Tue, Aug 25, 2009 at 10:32:50PM -0400, Alvaro Herrera wrote: Roger Leigh escribió: An updated copy of the patch is attached. Did you give expanded output a look? (\x) I find it a bit weird that the first line shows a single-pixel wide line but the subsequent ones are thicker. Yes, it's just due to the fact that the middle lines are using a thicker line character, while the top and bottom lines are thin. This can easily be changed to be e.g. all thin. BTW I think you should also look at multiline fields, a │ b ━━━┿━━━ 4 │ some text : and some more : and then some (1 filas) And wrapped: alvherre=# select * from foo; a │ b ━━━┿ 5 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho ; tiempo que vivía un hidalgo (1 fila) I initially left these exactly the same as for ASCII (the ':' and ';' usage). However, it's quite possible to make it use other characters. We could use the same lines, or two, three or four dashed lines ('╎' and '╏', or ┆' and '┇' or '┊' and '┋'). There are also additional characters such as half-lines ('╶', '╷', '╹' and '╻'). Is this the kind of this you are referring to? The wrapping code also appears slightly broken anyway, since continuation lines don't get '|' printed for subsequent blank columns on the same line: # SELECT * FROM testw; a │ b │ c │ kfduagahkjdfghalksdkfhajsdkl ━━━┿━┿━━━┿━━ 1 │ This is a test stri │ │ ; ng 2 │ sbuild-createchroot │ │ ; installs build-ess ; entials so your wor ; king 3 │ sbuild-createchroot │ │ ; installs build-ess ; entials so your wor ; king environment sh ; ould already be ok. ; You will need only ; to add a few more ; packages in the chr ; oot, using e.g. (3 rows) (this is unchanged from psql behaviour in CVS.) I also see in the code that under some circumstances (curr_nl_line), a '+' is used instead of a space when printing table headers, but I haven't been able to trigger this yet. We could also use dashed horizontal rules here. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. signature.asc Description: Digital signature
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh escribió: On Tue, Aug 25, 2009 at 10:32:50PM -0400, Alvaro Herrera wrote: Roger Leigh escribió: An updated copy of the patch is attached. Did you give expanded output a look? (\x) I find it a bit weird that the first line shows a single-pixel wide line but the subsequent ones are thicker. Yes, it's just due to the fact that the middle lines are using a thicker line character, while the top and bottom lines are thin. This can easily be changed to be e.g. all thin. Yeah, I think expanded output should use the same char for all lines. BTW I think you should also look at multiline fields, a │ b ━━━┿━━━ 4 │ some text : and some more : and then some (1 filas) And wrapped: alvherre=# select * from foo; a │ b ━━━┿ 5 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho ; tiempo que vivía un hidalgo (1 fila) I initially left these exactly the same as for ASCII (the ':' and ';' usage). However, it's quite possible to make it use other characters. We could use the same lines, or two, three or four dashed lines ('╎' and '╏', or ┆' and '┇' or '┊' and '┋'). This works for me (say ╎ for newline-separated strings and ┊ for wrapped output). The wrapping code also appears slightly broken anyway, since continuation lines don't get '|' printed for subsequent blank columns on the same line: Yeah ... maybe there's a point to this, or maybe it's just a bug -- I don't know. I also see in the code that under some circumstances (curr_nl_line), a '+' is used instead of a space when printing table headers, but I haven't been able to trigger this yet. We could also use dashed horizontal rules here. Hmm, can't say I know what's this about. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Alvaro Herrera escribió: I initially left these exactly the same as for ASCII (the ':' and ';' usage). However, it's quite possible to make it use other characters. We could use the same lines, or two, three or four dashed lines ('╎' and '╏', or ┆' and '┇' or '┊' and '┋'). This works for me (say ╎ for newline-separated strings and ┊ for wrapped output). So it'd look like this: alvherre=# select * from foo; a │ b ━━━┿ 1 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho ┊ tiempo que vivía un hidalgo de los de lanza en astillero, adarga antigua, ┊ rocín flaco y galgo corredor. Una olla de algo más vaca que carnero, salpi ┊ cón las más noches, duelos y quebrantos los sábados, lantejas los viernes, ┊ algún palomino de añadidura los domingos, consumían las tres partes de su ┊ hacienda. El resto della concluían sayo de velarte, calzas de velludo par ┊ a las fiestas, con sus pantuflos de lo mesmo, y los días de entresemana se ┊ honraba con su vellorí de lo más fino. Tenía en su casa una ama que pasa ┊ ba de los cuarenta, y una sobrina que no llegaba a los veinte, y un mozo d ┊ e campo y plaza, que así ensillaba el rocín como tomaba la podadera. 2 │ —«Nunca fuera caballero de ╎ damas tan bien servido ╎ como fuera don Quijote ╎ cuando de su aldea vino: ╎ doncellas curaban dél; ╎ princesas, del su rocino» (2 filas) -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Thu, Aug 27, 2009 at 10:17:32AM -0400, Alvaro Herrera wrote: Alvaro Herrera escribió: I initially left these exactly the same as for ASCII (the ':' and ';' usage). However, it's quite possible to make it use other characters. We could use the same lines, or two, three or four dashed lines ('╎' and '╏', or ┆' and '┇' or '┊' and '┋'). This works for me (say ╎ for newline-separated strings and ┊ for wrapped output). So it'd look like this: alvherre=# select * from foo; a │ b ━━━┿ 1 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho ┊ tiempo que vivía un hidalgo de los de lanza en astillero, adarga antigua, ┊ rocín flaco y galgo corredor. Una olla de algo más vaca que carnero, salpi ┊ cón las más noches, duelos y quebrantos los sábados, lantejas los viernes, ┊ algún palomino de añadidura los domingos, consumían las tres partes de su ┊ hacienda. El resto della concluían sayo de velarte, calzas de velludo par ┊ a las fiestas, con sus pantuflos de lo mesmo, y los días de entresemana se ┊ honraba con su vellorí de lo más fino. Tenía en su casa una ama que pasa ┊ ba de los cuarenta, y una sobrina que no llegaba a los veinte, y un mozo d ┊ e campo y plaza, que así ensillaba el rocín como tomaba la podadera. 2 │ —«Nunca fuera caballero de ╎ damas tan bien servido ╎ como fuera don Quijote ╎ cuando de su aldea vino: ╎ doncellas curaban dél; ╎ princesas, del su rocino» (2 filas) The attached patch adds support for the above output in wrapped mode. It also uses single dashes (half lines) to represent lines which contain no data, just vertical padding. Output is unchanged for ASCII output. The last bit I'm missing is how to handle column wrapping for aligned text output. This is was introduced on 8th May 2008 by Bryce Nesbitt, committed by Bruce Momjian (git commit f92f424d). This patch introduced the wrapped format, but I have not been able to activate the codepaths that wrap column names in the table header (adding '+'). The logic in print_aligned_text goes like this: col_count = cont-ncolumns; ... more_col_wrapping = col_count; curr_nl_line = 0; while (more_col_wrapping) { for (i = 0; i cont-ncolumns; i++) if (!(this_line + 1)-ptr) // Surely only true once at end of columns? more_col_wrapping--; curr_nl_line++; } I'm not convinced looking at the code that more_col_wrapping is ever true except for the first time through the loop, and it's reached zero after completion of the inner loop, so all of the code used when curr_nl_line is 0 is never called. I'm sure I'm interpreting this incorrectly, but I can't see under what circumstances I would get the column name header line wrapped over multiple lines. It may be that I just haven't got a table with column names of the correct number and lengths to trigger it? Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 7505cd4..f85e507 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -21,6 +21,9 @@ #endif #include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif #include catalog/pg_type.h #include pqsignal.h @@ -356,38 +359,75 @@ print_unaligned_vertical(const printTableContent *cont, FILE *fout) /* Aligned text */ // +static const printTextFormat asciiformat = +{ + { + { -, +, +, + }, + { -, +, +, + }, + { -, +, +, + }, + { , |, |, | } + }, + :, + ;, + +}; + +static const struct printTextFormat utf8format = +{ + { + /* ─, ┌, ┬, ┐ */ + { \342\224\200, \342\224\214, \342\224\254, \342\224\220 }, + /* ━, ┝, ┿, ┥ */ + { \342\224\201, \342\224\235, \342\224\277, \342\224\245 }, + /* ─, └, ┴, ┘ */ + { \342\224\200, \342\224\224, \342\224\264, \342\224\230 }, + /* N/A, │, │, │ */ + { , \342\224\202, \342\224\202, \342\224\202 }, + }, + /* ╎ */ + \342\225\216, + /* ┊ */ + \342\224\212, + /* ╷ */ + \342\225\267 +}; /* draw line */ static void _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, - unsigned short border, FILE *fout) + unsigned short border, printTextRule pos, + const printTextFormat *format, + FILE *fout) { unsigned int i, j; + const printTextLineFormat *lformat = format-lrule[pos]; + if (border == 1) - fputc('-', fout); + fputs(lformat-hrule, fout); else if (border == 2) - fputs(+-, fout); + fprintf(fout, %s%s, lformat-leftvrule, lformat-hrule); for (i = 0; i ncolumns; i++) { for (j = 0; j widths[i]; j++) -
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Aug 23, 2009 at 06:33:49PM -0400, Alvaro Herrera wrote: Roger Leigh escribió: +#if (defined(HAVE_LANGINFO_H) defined(CODESET)) + if (!strcmp(nl_langinfo(CODESET), UTF-8)) + text_format = utf8format; +#endif I think you should also try to match to UTF8, and do it in case-insensitive manner (pg_strcasecmp), just like chklocale.c. An updated copy of the patch is attached. This has the following changes: - matches UTF-8, utf8 and CP65001 case insensitively as for chklocale.c - uses fprintf in place of repeated fputs/fputc calls - moved hrule from table to line formatting - moved vrule from table to separate line format The latter two allow complete specification of all the characters used to draw. You can now have different horizonal rule types for the different line types (demonstrated by making PRINT_RULE_MIDDLE thick rather than thin). The same applies to the vertical rules, but since these are drawn intermixed with data these are a separate line format in their own right (PRINT_RULE_DATA). You can if you like make incredibly ugly tables consisting of all sorts of thin, thick and double ruled lines running horizontally and vertically at all edges and positions. The patch as it is keeps this subtle! But, a potential future extension to this could be a \pset command to customise the table style in addition to border printing, since the code is now there to do this. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 7505cd4..1740109 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -21,6 +21,9 @@ #endif #include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif #include catalog/pg_type.h #include pqsignal.h @@ -356,38 +359,66 @@ print_unaligned_vertical(const printTableContent *cont, FILE *fout) /* Aligned text */ // +static const printTextFormat asciiformat = +{ + { + { -, +, +, + }, + { -, +, +, + }, + { -, +, +, + }, + { , |, |, | } + } +}; + +static const struct printTextFormat utf8format = +{ + { + /* ─, ┌, ┬, ┐ */ + { \342\224\200, \342\224\214, \342\224\254, \342\224\220 }, + /* ━, ┝, ┿, ┥ */ + { \342\224\201, \342\224\235, \342\224\277, \342\224\245 }, + /* ─, └, ┴, ┘ */ + { \342\224\200, \342\224\224, \342\224\264, \342\224\230 }, + /* N/A, │, │, │ */ + { , \342\224\202, \342\224\202, \342\224\202 } + }, +}; /* draw line */ static void _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, - unsigned short border, FILE *fout) + unsigned short border, printTextRule pos, + const printTextFormat *format, + FILE *fout) { unsigned int i, j; + const printTextLineFormat *lformat = format-lrule[pos]; + if (border == 1) - fputc('-', fout); + fputs(lformat-hrule, fout); else if (border == 2) - fputs(+-, fout); + fprintf(fout, %s%s, lformat-leftvrule, lformat-hrule); for (i = 0; i ncolumns; i++) { for (j = 0; j widths[i]; j++) - fputc('-', fout); + fputs(lformat-hrule, fout); if (i ncolumns - 1) { if (border == 0) fputc(' ', fout); else -fputs(-+-, fout); +fprintf(fout, %s%s%s, lformat-hrule, + lformat-midvrule, lformat-hrule); } } if (border == 2) - fputs(-+, fout); + fprintf(fout, %s%s, lformat-hrule, lformat-rightvrule); else if (border == 1) - fputc('-', fout); + fputs(lformat-hrule, fout); fputc('\n', fout); } @@ -397,7 +428,8 @@ _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, * Print pretty boxes around cells. */ static void -print_aligned_text(const printTableContent *cont, FILE *fout) +print_aligned_text(const printTableContent *cont, const printTextFormat *format, + FILE *fout) { bool opt_tuples_only = cont-opt-tuples_only; bool opt_numeric_locale = cont-opt-numericLocale; @@ -431,6 +463,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int *bytes_output; /* Bytes output for column value */ int output_columns = 0; /* Width of interactive console */ bool is_pager = false; + const printTextLineFormat *dformat = format-lrule[PRINT_RULE_DATA]; if (cancel_pressed) return; @@ -709,7 +742,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int curr_nl_line; if (opt_border == 2) -_print_horizontal_line(col_count, width_wrap, opt_border, fout); +_print_horizontal_line(col_count, width_wrap, opt_border, PRINT_RULE_TOP, format, fout); for (i = 0; i col_count; i++) pg_wcsformat((unsigned char *) cont-headers[i], @@ -722,7 +755,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) -
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh escribió: An updated copy of the patch is attached. I give it a try. It looks very reasonable. We could argue about the exact chars to use but that can be changed later. Did you give expanded output a look? (\x) I find it a bit weird that the first line shows a single-pixel wide line but the subsequent ones are thicker. BTW I think you should also look at multiline fields, a │ b ━━━┿━━━ 4 │ some text : and some more : and then some (1 filas) And wrapped: alvherre=# select * from foo; a │ b ━━━┿ 5 │ En un lugar de la Mancha, de cuyo nombre no quiero acordarme, no ha mucho ; tiempo que vivía un hidalgo (1 fila) -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Aug 23, 2009 at 01:49:02AM +0100, Roger Leigh wrote: On Sat, Aug 22, 2009 at 07:42:10PM -0400, Robert Haas wrote: On Sat, Aug 22, 2009 at 2:13 PM, Roger Leighrle...@debian.org wrote: Further minor cleanups to tweak column alignment in a corner case, and to correct indentation to match the rest of the code. Please read the guidelines here: http://wiki.postgresql.org/wiki/Submitting_a_Patch I don't think it's very helpful to submit a patch intended to do basically one thing split up over 10 threads. The PostgreSQL style is heavyweight commits, and I don't believe that there's any reason to suppose that someone would want to commit any one of these 9 pieces without the other 8. OK. I have attached a single patch which combines the nine patches into one. I did read the patch page above, but it didn't mention A second patch is attached. This is identical to the previous one, except that it's re-diffed against the REL8_4_STABLE branch for anyone who wishes to use this with 8.4. Again, apologies for the noise with my first set of patches. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 7505cd4..10faeb3 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -21,6 +21,9 @@ #endif #include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif #include catalog/pg_type.h #include pqsignal.h @@ -356,38 +359,76 @@ print_unaligned_vertical(const printTableContent *cont, FILE *fout) /* Aligned text */ // +static const printTextFormat asciiformat = +{ + { + { +, +, + }, + { +, +, + }, + { +, +, + } + }, + -, + | +}; + +static const struct printTextFormat utf8format = +{ + { + /* ┌, ┬, ┐ */ + { \342\224\214, \342\224\254, \342\224\220 }, + /* ├, ┼, ┤ */ + { \342\224\234, \342\224\274, \342\224\244 }, + /* └, ┴, ┘ */ + { \342\224\224, \342\224\264, \342\224\230 } + }, + \342\224\200, /* ─ */ + \342\224\202 /* │ */ +}; /* draw line */ static void _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, - unsigned short border, FILE *fout) + unsigned short border, printTextRule pos, + const printTextFormat *format, + FILE *fout) { unsigned int i, j; + const printTextLineFormat *lformat = format-lrule[pos]; + if (border == 1) - fputc('-', fout); + fputs(format-hrule, fout); else if (border == 2) - fputs(+-, fout); + { + fputs(lformat-leftvrule, fout); + fputs(format-hrule, fout); + } for (i = 0; i ncolumns; i++) { for (j = 0; j widths[i]; j++) - fputc('-', fout); + fputs(format-hrule, fout); if (i ncolumns - 1) { if (border == 0) fputc(' ', fout); else -fputs(-+-, fout); + { +fputs(format-hrule, fout); +fputs(lformat-midvrule, fout); +fputs(format-hrule, fout); + } } } if (border == 2) - fputs(-+, fout); + { + fputs(format-hrule, fout); + fputs(lformat-rightvrule, fout); + } else if (border == 1) - fputc('-', fout); + fputs(format-hrule, fout); fputc('\n', fout); } @@ -397,7 +438,8 @@ _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, * Print pretty boxes around cells. */ static void -print_aligned_text(const printTableContent *cont, FILE *fout) +print_aligned_text(const printTableContent *cont, const printTextFormat *format, + FILE *fout) { bool opt_tuples_only = cont-opt-tuples_only; bool opt_numeric_locale = cont-opt-numericLocale; @@ -709,7 +751,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) int curr_nl_line; if (opt_border == 2) -_print_horizontal_line(col_count, width_wrap, opt_border, fout); +_print_horizontal_line(col_count, width_wrap, opt_border, PRINT_RULE_TOP, format, fout); for (i = 0; i col_count; i++) pg_wcsformat((unsigned char *) cont-headers[i], @@ -722,7 +764,7 @@ print_aligned_text(const printTableContent *cont, FILE *fout) while (more_col_wrapping) { if (opt_border == 2) - fprintf(fout, |%c, curr_nl_line ? '+' : ' '); + fprintf(fout, %s%c, format-vrule, curr_nl_line ? '+' : ' '); else if (opt_border == 1) fputc(curr_nl_line ? '+' : ' ', fout); @@ -753,19 +795,22 @@ print_aligned_text(const printTableContent *cont, FILE *fout) if (opt_border == 0) fputc(curr_nl_line ? '+' : ' ', fout); else - fprintf(fout, |%c, curr_nl_line ? '+' : ' '); + fprintf(fout, %s%c, format-vrule, curr_nl_line ? '+' : ' '); } } curr_nl_line++; if (opt_border == 2) - fputs( |, fout); +{ + fputc(' ', fout); + fputs(format-vrule, fout); +} else if (opt_border == 1) fputc(' ', fout);
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Roger Leigh escribió: +#if (defined(HAVE_LANGINFO_H) defined(CODESET)) + if (!strcmp(nl_langinfo(CODESET), UTF-8)) + text_format = utf8format; +#endif I think you should also try to match to UTF8, and do it in case-insensitive manner (pg_strcasecmp), just like chklocale.c. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Aug 22, 2009 at 2:13 PM, Roger Leighrle...@debian.org wrote: Further minor cleanups to tweak column alignment in a corner case, and to correct indentation to match the rest of the code. Please read the guidelines here: http://wiki.postgresql.org/wiki/Submitting_a_Patch I don't think it's very helpful to submit a patch intended to do basically one thing split up over 10 threads. The PostgreSQL style is heavyweight commits, and I don't believe that there's any reason to suppose that someone would want to commit any one of these 9 pieces without the other 8. So I'd suggest you resend it as a single patch and then add it here. https://commitfest.postgresql.org/action/commitfest_view/open ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sat, Aug 22, 2009 at 07:42:10PM -0400, Robert Haas wrote: On Sat, Aug 22, 2009 at 2:13 PM, Roger Leighrle...@debian.org wrote: Further minor cleanups to tweak column alignment in a corner case, and to correct indentation to match the rest of the code. Please read the guidelines here: http://wiki.postgresql.org/wiki/Submitting_a_Patch I don't think it's very helpful to submit a patch intended to do basically one thing split up over 10 threads. The PostgreSQL style is heavyweight commits, and I don't believe that there's any reason to suppose that someone would want to commit any one of these 9 pieces without the other 8. OK. I have attached a single patch which combines the nine patches into one. I did read the patch page above, but it didn't mention anything on patch size, so I split it into incremental logical changes in order to make it easily reviewable. 6-9 can be viewed as a whole, since 7-9 are minor fixes to 6. The other information requested on that page: project: postgresql patch name: psql-utf8-table-1.patch purpose: adds support for Unicode UTF-8 box drawing to the text table drawing functions print_aligned_text and print_aligned_vertical. for: discussion or application. Testing shows identical formatting to current code. branch: HEAD compiles: yes platform-specific: POSIX, but contains appropriate preprocessor conditionals for portability. Not tested on non-POSIX platform, but conditional taken from elsewhere (port/chklocale.c). Only two lines of the patch are platform-specific, the rest is plain portable C. regression tests: No. I'm not aware of any psql regression tests testing output formatting. Manually confirmed output is unchanged with border=0/1/2 and expanded=0/1. use: Use a locale with LC_CTYPE CODESET=UTF-8. performance: Not tested, but should be minimal effect. There's an additional pointer dereference during string formatting in some places. Temporary copies of data are used in some functions e.g. of format-lrule[] for convenience and to potentially save on the number of dereferences, though any optimising compiler should make this cost zero (all data and pointers are const). comments: psql: Abstract table formatting characters used for different line types. printTextLineFormat describes the characters used to draw vertical lines across a horizontal rule at the left side, middle and right hand side. These are included in the formatting for an entire table (printTextFormat). The printTextRule enum is used as an offset into the printTextFormat line rules (lrule), allowing specification of line styles for the top, middle and bottom horizontal lines in a table. The other printTextFormat members, hrule and vrule define the formatting needed to draw horizontal and vertical rules. Add table formats for ASCII and UTF-8. Default to using the ASCII table. However, if a UTF-8 locale codeset is in use, switch to the UTF-8 table. print_aligned_text and print_aligned_vertical, and their helper fuctions pass the table formatting and (where applicable) line style information to allow correct printing of table lines. Convert print_aligned_text, and its helper function, to use table formatting in place of hardcoded ASCII characters. Convert print_aligned_vertical to use table formatting in place of hardcoded ASCII characters, and add helper function print_aligned_vertical_line to format individual lines. Regards, Roger -- .''`. Roger Leigh : :' : Debian GNU/Linux http://people.debian.org/~rleigh/ `. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/ `-GPG Public Key: 0x25BFB848 Please GPG sign your mail. diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c index 7505cd4..10faeb3 100644 --- a/src/bin/psql/print.c +++ b/src/bin/psql/print.c @@ -21,6 +21,9 @@ #endif #include locale.h +#ifdef HAVE_LANGINFO_H +#include langinfo.h +#endif #include catalog/pg_type.h #include pqsignal.h @@ -356,38 +359,76 @@ print_unaligned_vertical(const printTableContent *cont, FILE *fout) /* Aligned text */ // +static const printTextFormat asciiformat = +{ + { + { +, +, + }, + { +, +, + }, + { +, +, + } + }, + -, + | +}; + +static const struct printTextFormat utf8format = +{ + { + /* ┌, ┬, ┐ */ + { \342\224\214, \342\224\254, \342\224\220 }, + /* ├, ┼, ┤ */ + { \342\224\234, \342\224\274, \342\224\244 }, + /* └, ┴, ┘ */ + { \342\224\224, \342\224\264, \342\224\230 } + }, + \342\224\200, /* ─ */ + \342\224\202 /* │ */ +}; /* draw line */ static void _print_horizontal_line(const unsigned int ncolumns, const unsigned int *widths, - unsigned short border, FILE *fout) + unsigned short border, printTextRule pos, + const printTextFormat *format, + FILE *fout) { unsigned int i, j; + const printTextLineFormat *lformat = format-lrule[pos]; + if (border == 1) - fputc('-', fout); + fputs(format-hrule, fout); else
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
On Sun, Aug 23, 2009 at 1:49 AM, Roger Leighrle...@codelibre.net wrote: Please read the guidelines here: http://wiki.postgresql.org/wiki/Submitting_a_Patch I don't think it's very helpful to submit a patch intended to do basically one thing split up over 10 threads. The PostgreSQL style is heavyweight commits, and I don't believe that there's any reason to suppose that someone would want to commit any one of these 9 pieces without the other 8. OK. I have attached a single patch which combines the nine patches into one. I did read the patch page above, but it didn't mention anything on patch size, so I split it into incremental logical changes in order to make it easily reviewable. 6-9 can be viewed as a whole, since 7-9 are minor fixes to 6. I don't really have an opinion between one big patch versus multiple smaller patches. That will come down to the code and whether the separate patches are easier to read. It is sometimes hard to review a patch when its separated from the code which needs the new functionality though. What I do think is getting my inbox suddenly blatted with a screenful of patches from one author on one topicis pretty annoying. At the very least please send all the patches as attachments to a single email. I find it hard to understand how the linux-kernel list functions if everyone's patches are all mixed up together and you can't save a message and have all the related code saved but have to go back to the list to find emails with similar names. But the more fundamental failure here is that this email should have been sent first. You dumped a whole pile of changes on us without ever asking what we thought of the idea which is not how it's done. It's actually not as bad as it looked because it's actually not as many changes as it looked like at first, but still. Now I happen to like the idea of using UTF8 border characters (I hacked emacs-w3 to use the terminal font in X to do something similar bitd so maybe I'm biased by nostalgia though). And the changes are actually pretty simple despite looking bigger than they really are. They're actually only a few dozen lines. I wonder if there's a UTF8 tab character or monospaced space character or something which would make these tables display better in my mail client. Currently they're completely messed up because the display engine doesn't recognize they should be monospaced whereas the usual output does get recognized. As a result the padding all gets collapsed and everything is misaligned. -- greg http://mit.edu/~gsstark/resume.pdf -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Unicode UTF-8 table formatting for psql text output
Greg Stark wrote: I don't really have an opinion between one big patch versus multiple smaller patches. That will come down to the code and whether the separate patches are easier to read. It is sometimes hard to review a patch when its separated from the code which needs the new functionality though. I have an opinion. These patches involved multiple changes to the same few files. Please, don't do that. In general one patch per feature is what makes life for reviewers and committers easier. For big features, sometimes we ask people to break it up into several patches, but this wasn't a big feature at all, and there is not the slightest need for more than one patch. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers