Re: [GENERAL] main log encoding problem

2012-07-19 Thread Alexander Law

Hello,


Implementing any of these isn't trivial - especially making sure 
messages emitted to stderr from things like segfaults and dynamic 
linker messages are always correct. Ensuring that the logging 
collector knows when setlocale() has been called to change the 
encoding and translation of system messages, handling the different 
logging output methods, etc - it's going to be fiddly.


I have some performance concerns about the transcoding required for 
(b) or (c), but realistically it's already the norm to convert all the 
data sent to and from clients. Conversion for logging should not be a 
significant additional burden. Conversion can be short-circuited out 
when source and destination encodings are the same for the common case 
of logging in utf-8 or to a dedicated file.


The initial issue was that the log file contains messages in different 
encodings. So transcoding is already performed, but not consistently, 
and in my opinion that is the main problem.



I suspect the eventual choice will be "all of the above":

- Default to (b) or (c), both have pros and cons. I favour (c) with a 
UTF-8 BOM to warn editors, but (b) is nice for people whose DBs are 
all in the system locale.

As I understand it, UTF-8 is the default encoding for databases. And even 
when a database is in the system encoding, translated postgres messages 
still come in UTF-8 and will go through a UTF-8 -> system locale 
conversion within gettext.


- Allow (a) for people who have many different DBs in many different 
encodings, do high volume logging, and want to avoid conversion 
overhead. Let them deal with the mess, just provide an additional % 
code for the encoding so they can name their per-DB log files to 
indicate the encoding.


I think that solution (a) can be a later evolution of the logging 
mechanism if the need for it arises.

The main issue is just that code needs to be prototyped, cleaned up, 
and submitted. So far nobody's cared enough to design it, build it, 
and get it through patch review. I've just foolishly volunteered 
myself to work on an automated crash-test system for virtual plug-pull 
testing, so I'm not stepping up.


I see your point, and I can prepare a prototype if the proposed solution 
(c) seems reasonable enough to be accepted.


Best regards,
Alexander


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] main log encoding problem

2012-07-18 Thread Craig Ringer

On 07/18/2012 11:16 PM, Alexander Law wrote:

Hello!

May I propose a solution and step up?

I've read the discussion of bug #5800, and here are my 2 cents.
To make things clear, let me give an example.
I am a PostgreSQL hosting provider and I let my customers create 
any databases they wish.
I have clients all over the world (so they can create databases with 
different encodings).


The question is: what do I (as admin) want to see in my PostgreSQL log, 
containing errors from all the databases?

IMHO we should consider two requirements for the log.
First, the file should be readable with a generic text viewer. Second, 
it should be as useful and complete as possible.


Now I see the following solutions.
A. We have different logfiles for each database with different encodings.
Then all our logs will be readable, but we have to look at them one by 
one, which is inconvenient at the least.
Moreover, our log reader has to know which encoding to use for 
each file.


B. We have one logfile in the operating system encoding.
The first downside is that the logs can differ between OSes.
The second is that Windows has a non-Unicode system encoding, 
and such an encoding can't represent all national characters. So 
at best I will get ??? in the log.


C. We have one logfile in UTF-8.
Pros: Log messages of all our clients can fit in it. We can use any 
generic editor/viewer to open it.

Nothing changes for Linux (and other OSes with UTF-8 encoding).
Cons: All the strings written to the log file have to go through some 
conversion function.


I think that the last solution is the right one. What is your opinion?


Implementing any of these isn't trivial - especially making sure 
messages emitted to stderr from things like segfaults and dynamic linker 
messages are always correct. Ensuring that the logging collector knows 
when setlocale() has been called to change the encoding and translation 
of system messages, handling the different logging output methods, etc - 
it's going to be fiddly.


I have some performance concerns about the transcoding required for (b) 
or (c), but realistically it's already the norm to convert all the data 
sent to and from clients. Conversion for logging should not be a 
significant additional burden. Conversion can be short-circuited out 
when source and destination encodings are the same for the common case 
of logging in utf-8 or to a dedicated file.
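
A minimal sketch of that short-circuit, in Python for brevity rather than in the server's C. The function name `to_log_encoding` and its parameters are hypothetical and are not part of PostgreSQL; the sketch only illustrates skipping the conversion when source and destination encodings match.

```python
import codecs


def to_log_encoding(raw: bytes, source_encoding: str,
                    log_encoding: str = "utf-8") -> bytes:
    """Return raw re-encoded for the log, skipping work when encodings match.

    Hypothetical helper: the name and signature are assumptions for
    illustration only.
    """
    # Normalise the names so that "UTF8", "utf-8" and "utf_8" compare equal.
    src = codecs.lookup(source_encoding).name
    dst = codecs.lookup(log_encoding).name
    if src == dst:
        # Short-circuit: no decode/encode round trip is needed.
        return raw
    # errors="replace" keeps the logger from raising on unconvertible input.
    return raw.decode(src, errors="replace").encode(dst, errors="replace")
```

With a UTF-8 database and a UTF-8 log, the message bytes are returned untouched, so the common case pays only for two codec name lookups.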


I suspect the eventual choice will be "all of the above":

- Default to (b) or (c), both have pros and cons. I favour (c) with a 
UTF-8 BOM to warn editors, but (b) is nice for people whose DBs are all 
in the system locale.


- Allow (a) for people who have many different DBs in many different 
encodings, do high volume logging, and want to avoid conversion 
overhead. Let them deal with the mess, just provide an additional % code 
for the encoding so they can name their per-DB log files to indicate the 
encoding.
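
To make the BOM idea in (c) concrete, here is a small Python sketch. The helpers `open_log_with_bom` and `write_line` are hypothetical names, not anything PostgreSQL provides; the assumption is that the BOM is written once, when the log file is created or still empty, so that appending to an existing file never puts a second BOM in the middle of it.

```python
import codecs
import os


def open_log_with_bom(path: str):
    """Open a log file for appending; write a UTF-8 BOM only if it is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    f = open(path, "ab")
    if is_new:
        f.write(codecs.BOM_UTF8)  # the three bytes EF BB BF
    return f


def write_line(f, message: str) -> None:
    """Append one log line, always encoded as UTF-8."""
    f.write(message.encode("utf-8") + b"\n")
```

A reader that understands the BOM (in Python, the "utf-8-sig" codec) strips it transparently; editors that honour it will pick UTF-8 instead of guessing from the system locale.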


The main issue is just that code needs to be prototyped, cleaned up, and 
submitted. So far nobody's cared enough to design it, build it, and get 
it through patch review. I've just foolishly volunteered myself to work 
on an automated crash-test system for virtual plug-pull testing, so I'm 
not stepping up.


--
Craig Ringer





Re: [GENERAL] main log encoding problem

2012-07-18 Thread Alexander Law

Hello!

May I propose a solution and step up?

I've read the discussion of bug #5800, and here are my 2 cents.
To make things clear, let me give an example.
I am a PostgreSQL hosting provider and I let my customers create any 
databases they wish.
I have clients all over the world (so they can create databases with 
different encodings).


The question is: what do I (as admin) want to see in my PostgreSQL log, 
containing errors from all the databases?

IMHO we should consider two requirements for the log.
First, the file should be readable with a generic text viewer. Second, 
it should be as useful and complete as possible.


Now I see the following solutions.
A. We have different logfiles for each database with different encodings.
Then all our logs will be readable, but we have to look at them one by 
one, which is inconvenient at the least.
Moreover, our log reader has to know which encoding to use for each 
file.


B. We have one logfile in the operating system encoding.
The first downside is that the logs can differ between OSes.
The second is that Windows has a non-Unicode system encoding, 
and such an encoding can't represent all national characters. So at 
best I will get ??? in the log.


C. We have one logfile in UTF-8.
Pros: Log messages of all our clients can fit in it. We can use any 
generic editor/viewer to open it.

Nothing changes for Linux (and other OSes with UTF-8 encoding).
Cons: All the strings written to the log file have to go through some 
conversion function.
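
The "???" risk in option B, and why option C avoids it, can be shown with a short Python sketch. This is only an analogy for what a lossy conversion does, not the actual code path in PostgreSQL or gettext; `encode_for_log` is a hypothetical helper, and cp1252 stands in for a non-Unicode Windows system encoding.

```python
def encode_for_log(message: str, encoding: str) -> bytes:
    """Encode a log message, substituting "?" for unrepresentable characters."""
    # errors="replace" emits one "?" per character the target encoding lacks.
    return message.encode(encoding, errors="replace")


message = '错误: role "postgres" already exists'

# Option B with a single-byte system encoding: the Chinese word is lost.
print(encode_for_log(message, "cp1252"))

# Option C: UTF-8 can represent every character, so nothing is lost.
print(encode_for_log(message, "utf-8").decode("utf-8"))
```

The ASCII part of the message survives either way; only the characters outside the target encoding's repertoire degrade to question marks.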


I think that the last solution is the right one. What is your opinion?

In fact, the problem exists even with a simple installation on Windows 
when you use a non-English locale.

So the solution would be useful for many of us.

Best regards,
Alexander

P.S. Sorry for the wrong subject in my previous message sent to 
pgsql-general.



On 05/23/2012 09:15 AM, yi huang wrote:

I'm using PostgreSQL 9.1.3 from Debian squeeze-backports with the
zh_CN.UTF-8 locale. I find that my main log (which is
"/var/log/postgresql/postgresql-9.1-main.log") contains "???", which
indicates some sort of charset encoding problem.


It's a known issue, I'm afraid. The PostgreSQL postmaster logs in the
system locale, and the PostgreSQL backends log in whatever encoding
their database is in. They all write to the same log file, producing a
log file full of mixed encoding data that'll choke many text editors.

If you force your editor to re-interpret the file according to the
encoding your database(s) are in, this may help.

In the future it's possible that this may be fixed by logging output to
different files on a per-database basis or by converting the text
encoding of log messages, but no agreement has been reached on the
correct approach and nobody has stepped up to implement it.

--
Craig Ringer




Re: [GENERAL] main log encoding problem

2012-05-29 Thread Craig Ringer

On 05/23/2012 09:15 AM, yi huang wrote:
I'm using PostgreSQL 9.1.3 from Debian squeeze-backports with the 
zh_CN.UTF-8 locale. I find that my main log (which is 
"/var/log/postgresql/postgresql-9.1-main.log") contains "???", which 
indicates some sort of charset encoding problem.


It's a known issue, I'm afraid. The PostgreSQL postmaster logs in the 
system locale, and the PostgreSQL backends log in whatever encoding 
their database is in. They all write to the same log file, producing a 
log file full of mixed encoding data that'll choke many text editors.


If you force your editor to re-interpret the file according to the 
encoding your database(s) are in, this may help.
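
When a single log mixes encodings, forcing one encoding on the whole file will still garble some lines. A rough workaround is to decode line by line, sketched below in Python. `decode_mixed_log` is a hypothetical helper, and the fallback encoding is an assumption you would replace with whatever your databases or system locale actually use; the heuristic relies on non-UTF-8 text rarely being valid UTF-8 by accident.

```python
def decode_mixed_log(data: bytes, fallback_encoding: str = "gb18030") -> list:
    """Decode a log line by line: try UTF-8 first, then a fallback encoding.

    The default fallback is only an example value, not a recommendation.
    """
    lines = []
    for raw in data.splitlines():
        try:
            lines.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Not valid UTF-8, so assume this line came from a process
            # that was logging in the fallback encoding.
            lines.append(raw.decode(fallback_encoding, errors="replace"))
    return lines
```

For example, `decode_mixed_log(open(path, "rb").read(), "latin-1")` would treat every non-UTF-8 line as Latin-1. This cannot recover characters that were already written as literal "?" bytes.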


In the future it's possible that this may be fixed by logging output to 
different files on a per-database basis or by converting the text 
encoding of log messages, but no agreement has been reached on the 
correct approach and nobody has stepped up to implement it.


--
Craig Ringer



[GENERAL] main log encoding problem

2012-05-22 Thread yi huang
I'm using PostgreSQL 9.1.3 from Debian squeeze-backports with the zh_CN.UTF-8
locale. I find that my main log (which is
"/var/log/postgresql/postgresql-9.1-main.log") contains "???", which
indicates some sort of charset encoding problem.
But error messages related to pgsql are fine; only other system messages
have this problem, for example:

2012-05-19 16:06:12 CST ??:  ?? 2012-05-19 16:06:10 CST
2012-05-19 16:06:12 CST ??:  ???
2012-05-19 16:06:12 CST ??:  ???autovacuum
2012-05-19 16:06:12 CST ??:  ???
2012-05-19 16:07:16 CST 错误:  角色"postgres" 已经存在 (in English: Error: role "postgres" already exists)
2012-05-19 16:07:16 CST 语句:  CREATE ROLE postgres;
2012-05-19 16:07:16 CST 错误:  语言 "plpgsql" 已经存在 (in English: Error: language "plpgsql" already exists)
2012-05-19 16:07:16 CST 语句:  CREATE PROCEDURAL LANGUAGE plpgsql;
2012-05-19 16:08:23 CST :  ?? "huangyi" ???
2012-05-19 16:08:52 CST :  ?? "huangyi" ???
2012-05-19 16:09:01 CST ??:  ???(zlfund)(huangyi) ???
2012-05-19 16:09:01 CST :  Peer authentication failed for user "zlfund"
2012-05-19 16:09:34 CST ??:  ???(zlfund)(huangyi) ???
2012-05-19 16:09:34 CST :  Peer authentication failed for user "zlfund"


I guess it has something to do with a packaging problem rather than
PostgreSQL itself, but it would be great if you could give me some clue
where the problem might be.

My best regards.
Yi Huang.