Quan Zongliang <quanzongli...@gmail.com> writes:
> The reason is that the error message in the bit_in / varbit_in function 
> is output directly using %c. Causes the log file to not be decoded 
> correctly.

> The attachment is a patch.

I'm really quite skeptical of the premise here.  We do not guarantee that
the postmaster log file is valid in any particular encoding; it'd be
nearly impossible to do so if the cluster contains databases using
different encodings.  So I think you'd be way better off to reformulate
your log-reading code to be less fragile.

Even granting the premise, the proposed patch seems like a significant
decrease in user-friendliness for typical cases.  I'd rather see us
make an effort to print one valid-per-the-DB-encoding character.
Now that we can rely on snprintf to count %s restrictions in bytes,
I think something like this should work:

                         errmsg("\"%.*s\" is not a valid binary digit",
                                pg_mblen(sp), sp)));

But the real problem is that this is only the tip of the iceberg.
You didn't even hit all the %c usages in varbit.c.  A quick grep finds
these other spots that can doubtless be made to do the same thing:

acl.c:899:                      elog(ERROR, "unrecognized objtype abbreviation: 
%c", objtypec);
arrayfuncs.c:507:                                                               
 errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:554:                                                               
         errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:584:                                                               
         errdetail("Unexpected \"%c\" character.",
arrayfuncs.c:591:                                                               
         errdetail("Unmatched \"%c\" character.", '}')));
arrayfuncs.c:633:                                                               
                 errdetail("Unexpected \"%c\" character.",
encode.c:184:                            errmsg("invalid hexadecimal digit: 
\"%c\"", c)));
encode.c:341:                                            errmsg("invalid symbol 
\"%c\" while decoding base64 sequence", (int) c)));
formatting.c:3298:                                                              
                  errmsg("unmatched format separator \"%c\"",
jsonpath_gram.c:2390:                                            
errdetail("unrecognized flag character \"%c\" in LIKE_REGEX predicate",
regexp.c:426:                                                    
errmsg("invalid regular expression option: \"%c\"",
tsvector_op.c:312:                      elog(ERROR, "unrecognized weight: %c", 
char_weight);
tsvector_op.c:872:                                               
errmsg("unrecognized weight: \"%c\"", char_weight)));
varbit.c:233:                                            errmsg("\"%c\" is not 
a valid binary digit",
varbit.c:258:                                            errmsg("\"%c\" is not 
a valid hexadecimal digit",
varbit.c:534:                                            errmsg("\"%c\" is not 
a valid binary digit",
varbit.c:559:                                            errmsg("\"%c\" is not 
a valid hexadecimal digit",
varlena.c:5589:                                  errmsg("unrecognized format() 
type specifier \"%c\"",
varlena.c:5710:                                          errmsg("unrecognized 
format() type specifier \"%c\"",

and that's just in src/backend/utils/adt/.

                        regards, tom lane


Reply via email to