Hi, On Mon, 20 Apr 2026 at 19:09, Tom Lane <[email protected]> wrote:
> Ayush Tiwari <[email protected]> writes: > > COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't > > perform encoding conversion(?) CopyToJsonOneRow() sends the output of > > composite_to_json() via CopySendData() without calling > > pg_server_to_any(), unlike the text and CSV paths. > > > COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1'); > > > On a UTF-8 server this produces UTF-8 output, not LATIN1. > > Seems to me the correct thing here is to make it work like the other > cases, ie perform pg_server_to_any(). I have exactly no sympathy for > the argument about the RFC saying it must be UTF-8, not least because > that's not in fact what is implemented (what if the server encoding > isn't UTF-8?). > Agreed. I initially thought rejecting the option was the safer route given the RFC, but as you pointed out, we aren't enforcing UTF-8 strictly on the server side anyway. > Rejecting this option altogether doesn't improve anything, not > functionally, not specs-compliance-wise, nor according to the > principle of least surprise. > Makes sense. Implementing the conversion properly keeps JSON format consistent with how the text and CSV formats behave. > > The attached patch rejects the explicit ENCODING option for JSON > > mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are > > already rejected. The implicit client_encoding case is a separate > > design question (should COPY TO JSON always emit UTF-8 regardless > > of client_encoding?) that maybe we should address separately and not as > > part of v19. > > No, you don't get to punt this till later. Once we ship v19 there's > going to be a strong expectation of backwards compatibility. > > The idea of sending UTF-8 to a client that's set client_encoding to > something else would be risible, if it weren't a security hazard. > I agree sending unconverted bytes to a mismatched client encoding is clearly a security hazard that needs addressing. Did not consider the backward compatibility part, my bad. Was trying out adding pg_server_to_any() to the json_buf after composite_to_json() returns, correctly covering both explicit ENCODING option specifications and implicit client_encoding mismatches. Let me send a patch with code and associated test cases. Regards, Ayush
