Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2021-12-13 Thread Tom Lane
Huansong Fu  writes:
> We recently saw a similar issue in v12 and wondered why the corresponding fix 
> for v14 (https://github.com/postgres/postgres/commit/16e3ad5d143) was not 
> backported to v13 and before. The commit message did mention that this fix 
> might have problem with translatable string messages - would you mind 
> providing a bit more context about what is needed to backport this fix? Thank 
> you.

Well, the commit message lists the reasons for not back-patching:

* we've seen few field complaints about such problems
* it'd add work for translators
* it wouldn't work reliably before v12.

Perhaps there's a case for back-patching as far as v12,
but I can't get very excited about it.

regards, tom lane




Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2021-12-13 Thread Huansong Fu
Hi, 

We recently saw a similar issue in v12 and wondered why the corresponding fix 
for v14 (https://github.com/postgres/postgres/commit/16e3ad5d143) was not 
backported to v13 and before. The commit message did mention that this fix 
might have problem with translatable string messages - would you mind providing 
a bit more context about what is needed to backport this fix? Thank you.

Regards,
Huansong
https://vmware.com/ 

> On Jun 29, 2020, at 11:42 AM, Tom Lane  wrote:
> 
> Quan Zongliang  writes:
>> I tested it, and it looks fine.
> 
> Pushed, thanks for reporting the issue!
> 
>   regards, tom lane
> 
> 
> 
> 



Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-29 Thread Tom Lane
Quan Zongliang  writes:
> I tested it, and it looks fine.

Pushed, thanks for reporting the issue!

regards, tom lane




Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-29 Thread Quan Zongliang

Good.

I tested it, and it looks fine.

Thank you.


On 2020/6/29 1:10 上午, Tom Lane wrote:

I wrote:

Even granting the premise, the proposed patch seems like a significant
decrease in user-friendliness for typical cases.  I'd rather see us
make an effort to print one valid-per-the-DB-encoding character.
Now that we can rely on snprintf to count %s restrictions in bytes,
I think something like this should work:
  errmsg("\"%.*s\" is not a valid binary digit",
 pg_mblen(sp), sp)));
But the real problem is that this is only the tip of the iceberg.
You didn't even hit all the %c usages in varbit.c.

I went through all the %c format sequences in the backend to see which
ones could use this type of fix.  There were not as many as I'd expected,
but still a fair number.  (I skipped cases where the input was coming from
the catalogs, as well as some non-user-facing debug printouts.)  That
leads to the attached patch, which seems to do the job without breaking
anything that works today.

regards, tom lane

PS: I failed to resist the temptation to improve some shoddy error
messages nearby in pageinspect/heapfuncs.c.



Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-28 Thread Tom Lane
I wrote:
> Even granting the premise, the proposed patch seems like a significant
> decrease in user-friendliness for typical cases.  I'd rather see us
> make an effort to print one valid-per-the-DB-encoding character.
> Now that we can rely on snprintf to count %s restrictions in bytes,
> I think something like this should work:
>  errmsg("\"%.*s\" is not a valid binary digit",
> pg_mblen(sp), sp)));
> But the real problem is that this is only the tip of the iceberg.
> You didn't even hit all the %c usages in varbit.c.

I went through all the %c format sequences in the backend to see which
ones could use this type of fix.  There were not as many as I'd expected,
but still a fair number.  (I skipped cases where the input was coming from
the catalogs, as well as some non-user-facing debug printouts.)  That
leads to the attached patch, which seems to do the job without breaking
anything that works today.

regards, tom lane

PS: I failed to resist the temptation to improve some shoddy error
messages nearby in pageinspect/heapfuncs.c.

diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 60bdbea46b..b3304ff844 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -80,7 +80,9 @@ get_val(HSParser *state, bool ignoreeq, bool *escaped)
 			}
 			else if (*(state->ptr) == '=' && !ignoreeq)
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 			else if (*(state->ptr) == '\\')
 			{
@@ -219,7 +221,9 @@ parse_hstore(HSParser *state)
 			}
 			else if (!isspace((unsigned char) *(state->ptr)))
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else if (st == WGT)
@@ -234,7 +238,9 @@ parse_hstore(HSParser *state)
 			}
 			else
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else if (st == WVAL)
@@ -267,7 +273,9 @@ parse_hstore(HSParser *state)
 			}
 			else if (!isspace((unsigned char) *(state->ptr)))
 			{
-elog(ERROR, "Syntax error near '%c' at position %d", *(state->ptr), (int32) (state->ptr - state->begin));
+elog(ERROR, "Syntax error near \"%.*s\" at position %d",
+	 pg_mblen(state->ptr), state->ptr,
+	 (int32) (state->ptr - state->begin));
 			}
 		}
 		else
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 11a910184b..f04455da12 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -30,6 +30,7 @@
 #include "catalog/pg_am_d.h"
 #include "catalog/pg_type.h"
 #include "funcapi.h"
+#include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pageinspect.h"
 #include "port/pg_bitutils.h"
@@ -99,7 +100,8 @@ text_to_bits(char *str, int len)
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("illegal character '%c' in t_bits string", str[off])));
+	 errmsg("invalid character \"%.*s\" in t_bits string",
+			pg_mblen(str + off), str + off)));
 
 		if (off % 8 == 7)
 			bits[off / 8] = byte;
@@ -460,14 +462,13 @@ tuple_data_split(PG_FUNCTION_ARGS)
 		if (!t_bits_str)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("argument of t_bits is null, but it is expected to be null and %d character long",
-			bits_len)));
+	 errmsg("t_bits string must not be NULL")));
 
 		bits_str_len = strlen(t_bits_str);
 		if (bits_len != bits_str_len)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("unexpected length of t_bits %u, expected %d",
+	 errmsg("unexpected length of t_bits string: %u, expected %u",
 			bits_str_len, bits_len)));
 
 		/* do the conversion */
@@ -478,7 +479,7 @@ tuple_data_split(PG_FUNCTION_ARGS)
 		if (t_bits_str)
 			ereport(ERROR,
 	(errcode(ERRCODE_DATA_CORRUPTED),
-	 errmsg("t_bits string is expected to be NULL, but instead it is %zu bytes length",
+	 errmsg("t_bits string is expected to be NULL, but instead it is %zu bytes long",
 			strlen(t_bits_str;
 	}
 
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 61d318d93c..a609d49c12 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -15,6 +15,7 @@
 
 #include 
 
+#include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 
@@ -171,17 +172,19 @@ hex_encode(const char *src, size_t len, char 

Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-26 Thread Tom Lane
Quan Zongliang  writes:
> The reason is that the error message in the bit_in / varbit_in function 
> is output directly using %c. Causes the log file to not be decoded 
> correctly.

> The attachment is a patch.

I'm really quite skeptical of the premise here.  We do not guarantee that
the postmaster log file is valid in any particular encoding; it'd be
nearly impossible to do so if the cluster contains databases using
different encodings.  So I think you'd be way better off to reformulate
your log-reading code to be less fragile.

Even granting the premise, the proposed patch seems like a significant
decrease in user-friendliness for typical cases.  I'd rather see us
make an effort to print one valid-per-the-DB-encoding character.
Now that we can rely on snprintf to count %s restrictions in bytes,
I think something like this should work:

 errmsg("\"%.*s\" is not a valid binary digit",
pg_mblen(sp), sp)));

But the real problem is that this is only the tip of the iceberg.
You didn't even hit all the %c usages in varbit.c.  A quick grep finds
these other spots that can doubtless be made to do the same thing:

acl.c:899:  elog(ERROR, "unrecognized objtype abbreviation: 
%c", objtypec);
arrayfuncs.c:507:   
 errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:554:   
 errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:584:   
 errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:591:   
 errdetail("Unmatched \"%c\" character.", '}')));
arrayfuncs.c:633:   
 errdetail("Unexpected \"%c\" character.",
encode.c:184:errmsg("invalid hexadecimal digit: 
\"%c\"", c)));
encode.c:341:errmsg("invalid symbol 
\"%c\" while decoding base64 sequence", (int) c)));
formatting.c:3298:  
  errmsg("unmatched format separator \"%c\"",
jsonpath_gram.c:2390:
errdetail("unrecognized flag character \"%c\" in LIKE_REGEX predicate",
regexp.c:426:
errmsg("invalid regular expression option: \"%c\"",
tsvector_op.c:312:  elog(ERROR, "unrecognized weight: %c", 
char_weight);
tsvector_op.c:872:   
errmsg("unrecognized weight: \"%c\"", char_weight)));
varbit.c:233:errmsg("\"%c\" is not 
a valid binary digit",
varbit.c:258:errmsg("\"%c\" is not 
a valid hexadecimal digit",
varbit.c:534:errmsg("\"%c\" is not 
a valid binary digit",
varbit.c:559:errmsg("\"%c\" is not 
a valid hexadecimal digit",
varlena.c:5589:  errmsg("unrecognized format() 
type specifier \"%c\"",
varlena.c:5710:  errmsg("unrecognized 
format() type specifier \"%c\"",

and that's just in src/backend/utils/adt/.

regards, tom lane




bugfix: invalid bit/varbit input causes the log file to be unreadable

2020-06-26 Thread Quan Zongliang


The bit/varbit type input functions cause file_fdw to fail to read the 
logfile normally.


1. Server conf:
server_encoding = UTF8
locale = zh_CN.UTF-8

2. Create external tables using file_fdw

CREATE EXTENSION file_fdw;
CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (
 log_time timestamp(3) with time zone,
 user_name text,
 database_name text,
 process_id integer,
 connection_from text,
 session_id text,
 session_line_num bigint,
 command_tag text,
 session_start_time timestamp with time zone,
 virtual_transaction_id text,
 transaction_id bigint,
 error_severity text,
 sql_state_code text,
 message text,
 detail text,
 hint text,
 internal_query text,
 internal_query_pos integer,
 context text,
 query text,
 query_pos integer,
 location text,
 application_name text
) SERVER pglog
OPTIONS ( filename 'log/postgresql-2020-06-16_213409.csv',
 format 'csv');

It's normal to be here.

3. bit/varbit input
 select b'Ù';

The foreign table cannot be accessed. SELECT * FROM pglog will get:
invalid byte sequence for encoding "UTF8": 0xc3 0x22


The reason is that the error message in the bit_in / varbit_in function 
is output directly using %c. Causes the log file to not be decoded 
correctly.


The attachment is a patch.



diff --git a/src/backend/utils/adt/varbit.c b/src/backend/utils/adt/varbit.c
index f0c6a44b84..506cc35446 100644
--- a/src/backend/utils/adt/varbit.c
+++ b/src/backend/utils/adt/varbit.c
@@ -230,8 +230,8 @@ bit_in(PG_FUNCTION_ARGS)
else if (*sp != '0')
ereport(ERROR,

(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-errmsg("\"%c\" is not a valid 
binary digit",
-   *sp)));
+errmsg("\"0x%02X\" is not a 
valid binary digit",
+   (unsigned 
char)*sp)));
 
x >>= 1;
if (x == 0)
@@ -531,8 +531,8 @@ varbit_in(PG_FUNCTION_ARGS)
else if (*sp != '0')
ereport(ERROR,

(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-errmsg("\"%c\" is not a valid 
binary digit",
-   *sp)));
+errmsg("\"0x%02X\" is not a 
valid binary digit",
+   (unsigned 
char)*sp)));
 
x >>= 1;
if (x == 0)