Re: Emitting JSON to file using COPY TO
> Am I understanding something incorrectly?

No, you've got it. You already covered the concerns there.

> That seems quite absurd, TBH. I know we've catered for some absurdity in
> the CSV code (much of it down to me), so maybe we need to be liberal in
> what we accept here too. IMNSHO, we should produce either a single JSON
> document (the ARRAY case) or a series of JSON documents, one per row
> (the LINES case).

For what it's worth, I agree with Andrew on this. I also agree with COPY FROM allowing for potentially bogus commas at the end of non-arrays for interop with other products, but not doing that in COPY TO (unless there is some real compelling case to do so). Emitting bogus JSON (non-array with commas) feels wrong, and it would be nice not to perpetuate that if possible.

Thanks again for doing this. If I can be of any help, let me know. If/When this makes it into the production product, I'll be using this feature for sure.

-Davin
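To make the two shapes concrete, here is a quick illustrative sketch (Python, purely for illustration -- not anything from the patch) of the ARRAY case versus the LINES case:

```python
import json

# Hypothetical rows, standing in for a table's output.
rows = [{"id": 1, "f1": "a"}, {"id": 2, "f1": "b"}]

# ARRAY case: one JSON document containing all the rows.
array_doc = json.dumps(rows)

# LINES case: a series of JSON documents, one per row
# (i.e., newline-delimited JSON).
lines_doc = "\n".join(json.dumps(r) for r in rows)

print(array_doc)
print(lines_doc)
```

Either shape round-trips cleanly through a strict parser; the "non-array with commas" hybrid does not.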
Re: Emitting JSON to file using COPY TO
Hi Joe,

In reviewing the 005 patch, I think that when used with FORCE ARRAY, we should also _imply_ FORCE ROW DELIMITER. I can't envision a use case where someone would want FORCE ARRAY without FORCE ROW DELIMITER. I can, however, envision a use case where someone would want FORCE ROW DELIMITER without FORCE ARRAY, such as embedding the rows into a larger array. I definitely appreciate these options and the flexibility they afford from a user perspective.

In the test output, will you also show the different variations of FORCE ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false), (false, true), (true, true)}? Technically you've already shown me the (false, false) case, as those are the defaults.

Thanks!
Re: Emitting JSON to file using COPY TO
Thanks for the wayback machine link, Andrew. I read it, understood it, and will comply.

Joe, those test cases look great and the outputs match `jq`'s. As for forward slashes being escaped, I found this: https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped . Forward slash escaping is optional, so not escaping them in Postgres is okay. The important thing is that software _reading_ the JSON interprets both '\/' and '/' as '/'.
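A quick sanity check of that last point (illustrative only, using Python's standard-library parser as a stand-in for any conforming reader):

```python
import json

# Per the JSON spec, "\/" is a legal but optional escape for "/".
# A conforming reader must decode both spellings to the same value.
escaped = json.loads('{"key": "this \\/ is a forward slash"}')  # JSON text: \/
plain = json.loads('{"key": "this / is a forward slash"}')      # JSON text: /

assert escaped == plain == {"key": "this / is a forward slash"}
```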
Re: Emitting JSON to file using COPY TO
Sorry about the top posting / top quoting... the link you sent me gives me a 404. I'm not exactly sure what top quoting / posting means, and Googling those terms wasn't helpful, but I've removed the quoting that my mail client was automatically "helpfully" adding to my emails. I mean no offense.

Okay, digging in more... If the value contains text that has BOMs [1] in it, they must be preserved (the database doesn't need to interpret them or do anything special with them -- just store and fetch them). There are, however, a few characters that need to be escaped (per https://www.w3docs.com/snippets/java/how-should-i-escape-strings-in-json.html) so that the JSON format isn't broken. They are:

1. " (double quote)
2. \ (backslash)
3. / (forward slash)
4. \b (backspace)
5. \f (form feed)
6. \n (new line)
7. \r (carriage return)
8. \t (horizontal tab)

These characters should be represented in the test cases to see how the escaping behaves and to ensure that it is done properly per JSON requirements. Forward slash comes as a bit of a surprise to me, but `jq` handles it either way:

➜ echo '{"key": "this / is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}
➜ echo '{"key": "this \/ is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}

Hope it helps, and thank you!

1. I don't disagree that BOMs shouldn't be used for UTF-8, but I'm also processing UTF-16{BE,LE} and UTF-32{BE,LE} (as well as other textual formats that are neither ASCII nor Unicode). I don't have the luxury of changing the data that is given.
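For what it's worth, here is how a typical serializer treats that list of characters (a sketch using Python's standard-library `json` module as the example encoder, not anything Postgres-specific):

```python
import json

# One value containing every character from the list above.
s = 'quote:" backslash:\\ slash:/ bs:\b ff:\f nl:\n cr:\r tab:\t'
encoded = json.dumps(s)
print(encoded)

# The encoder escapes ", \, and the control characters, but leaves the
# forward slash alone -- which, as discussed, the spec permits.
assert '\\"' in encoded and '\\\\' in encoded and '\\n' in encoded
assert '\\/' not in encoded

# The round trip must give back the original value exactly.
assert json.loads(encoded) == s
```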
Re: Emitting JSON to file using COPY TO
Looking great!

For testing, in addition to the quotes, include DOS and Unix EOLs, \ and /, byte order marks, and multibyte UTF-8 characters. Essentially anything considered textual is fair game to be a value.

On Mon, Dec 4, 2023, 10:46 Joe Conway wrote:
> On 12/4/23 09:25, Andrew Dunstan wrote:
> >
> > On 2023-12-04 Mo 08:37, Joe Conway wrote:
> >> On 12/4/23 07:41, Andrew Dunstan wrote:
> >>>
> >>> On 2023-12-03 Su 20:14, Joe Conway wrote:
> >>>> (please don't top quote on the Postgres lists)
> >>>>
> >>>> On 12/3/23 17:38, Davin Shearer wrote:
> >>>>> " being quoted as \\" breaks the JSON. It needs to be \". This has
> >>>>> been my whole problem with COPY TO for JSON.
> >>>>>
> >>>>> Please validate that the output is in proper format with correct
> >>>>> quoting for special characters. I use `jq` on the command line to
> >>>>> validate and format the output.
> >>>>
> >>>> I just hooked existing "row-to-json machinery" up to the "COPY TO"
> >>>> statement. If the output is wrong (just for this use case?), that
> >>>> would be a missing feature (or possibly a bug?).
> >>>>
> >>>> Davin -- how did you work around the issue with the way the built-in
> >>>> functions output JSON?
> >>>>
> >>>> Andrew -- comments/thoughts?
> >>>
> >>> I meant to mention this when I was making comments yesterday.
> >>>
> >>> The patch should not be using CopyAttributeOutText - it will try to
> >>> escape characters such as \, which produces the effect complained of
> >>> here, or else we need to change its setup so we have a way to inhibit
> >>> that escaping.
> >>
> >> Interesting.
> >>
> >> I am surprised this has never been raised as a problem with COPY TO
> >> before.
> >>
> >> Should the JSON output, as produced by composite_to_json(), be sent
> >> as-is with no escaping at all? If yes, is JSON somehow unique in this
> >> regard?
> >
> > Text mode output is in such a form that it can be read back in using
> > text mode input. There's nothing special about JSON in this respect -
> > any text field will be escaped too. But output suitable for text mode
> > input is not what you're trying to produce here; you're trying to
> > produce valid JSON.
> >
> > So, yes, the result of composite_to_json, which is already suitably
> > escaped, should not be further escaped in this case.
>
> Gotcha.
>
> This patch version uses CopySendData() instead and includes
> documentation changes. Still lacks regression tests.
>
> Hopefully this looks better. Any other particular strings I ought to
> test with?
>
> 8<--
> test=# copy (select * from foo limit 4) to stdout (format json,
> force_array true);
> [
>  {"id":1,"f1":"line with \" in it: 1","f2":"2023-12-03T12:26:41.596053-05:00"}
> ,{"id":2,"f1":"line with ' in it: 2","f2":"2023-12-03T12:26:41.596173-05:00"}
> ,{"id":3,"f1":"line with \" in it: 3","f2":"2023-12-03T12:26:41.596179-05:00"}
> ,{"id":4,"f1":"line with ' in it: 4","f2":"2023-12-03T12:26:41.596182-05:00"}
> ]
> 8<--
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
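That force_array output can be sanity-checked with any strict JSON parser, not just `jq`. A quick illustrative sketch (Python; the literal below is abbreviated from the example above, with made-up values):

```python
import json

# Output shaped like the COPY ... (format json, force_array true) example.
copy_output = (
    '[\n'
    ' {"id":1,"f1":"line with \\" in it: 1"}\n'  # JSON text contains \"
    ',{"id":2,"f1":"line with \' in it: 2"}\n'
    ']'
)

rows = json.loads(copy_output)  # parses cleanly: it is one valid array
assert isinstance(rows, list) and len(rows) == 2
assert rows[0]["f1"] == 'line with " in it: 1'
```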
Re: Emitting JSON to file using COPY TO
I worked around it by using `select json_agg(t) ...` and redirecting the output to a file via psql on the command line. COPY TO was working until we ran into broken JSON and discovered the double-quoting issue, due to some values containing " in them.
Re: Emitting JSON to file using COPY TO
" being quoted as \\" breaks the JSON. It needs to be \". This has been my whole problem with COPY TO for JSON. Please validate that the output is in proper format with correct quoting for special characters. I use `jq` on the command line to validate and format the output. On Sun, Dec 3, 2023, 10:51 Joe Conway wrote: > On 12/3/23 10:31, Davin Shearer wrote: > > Please be sure to include single and double quotes in the test values > > since that was the original problem (double quoting in COPY TO breaking > > the JSON syntax). > > test=# copy (select * from foo limit 4) to stdout (format json); > {"id":2456092,"f1":"line with ' in it: > 2456092","f2":"2023-12-03T10:44:40.9712-05:00"} > {"id":2456093,"f1":"line with \\" in it: > 2456093","f2":"2023-12-03T10:44:40.971221-05:00"} > {"id":2456094,"f1":"line with ' in it: > 2456094","f2":"2023-12-03T10:44:40.971225-05:00"} > {"id":2456095,"f1":"line with \\" in it: > 2456095","f2":"2023-12-03T10:44:40.971228-05:00"} > > -- > Joe Conway > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > >
Re: Emitting JSON to file using COPY TO
Please be sure to include single and double quotes in the test values since that was the original problem (double quoting in COPY TO breaking the JSON syntax).

On Sun, Dec 3, 2023, 10:11 Andrew Dunstan wrote:
>
> On 2023-12-01 Fr 14:28, Joe Conway wrote:
> > On 11/29/23 10:32, Davin Shearer wrote:
> >> Thanks for the responses everyone.
> >>
> >> I worked around the issue using the `psql -tc` method as Filip
> >> described.
> >>
> >> I think it would be great to support writing JSON using COPY TO at
> >> some point so I can emit JSON to files using a PostgreSQL function
> >> directly.
> >>
> >> -Davin
> >>
> >> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák
> >> <mailto:fi...@sedlakovi.org> wrote:
> >>
> >>     This would be a very special case for COPY. It applies only to a
> >>     single column of JSON values. The original problem can be solved
> >>     with psql --tuples-only as David wrote earlier.
> >>
> >>     $ psql -tc 'select json_agg(row_to_json(t))
> >>         from (select * from public.tbl_json_test) t;'
> >>
> >>     [{"id":1,"t_test":"here's a \"string\""}]
> >>
> >>     Special-casing any encoding/escaping scheme leads to bugs and
> >>     harder parsing.
> >
> > (moved to hackers)
> >
> > I did a quick PoC patch (attached) -- if there is interest and no hard
> > objections I would like to get it up to speed for the January
> > commitfest.
> >
> > Currently the patch lacks documentation and regression test support.
> >
> > Questions:
> > --
> > 1. Is supporting JSON array format sufficient, or does it need to
> > support some other options? How flexible does the support scheme need
> > to be?
> >
> > 2. This only supports COPY TO and we would undoubtedly want to support
> > COPY FROM for JSON as well, but is that required from the start?
> >
> > Thanks for any feedback.
>
> I realize this is just a POC, but I'd prefer to see composite_to_json()
> not exposed. You could use the already public datum_to_json() instead,
> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
> arguments.
>
> I think JSON array format is sufficient.
>
> I can see both sides of the COPY FROM argument, but I think insisting on
> that makes this less doable for release 17. On balance I would stick to
> COPY TO for now.
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
Re: Emitting JSON to file using COPY TO
I'm really glad to see this taken up as a possible new feature and will definitely use it if it gets released. I'm impressed with how clean, understandable, and approachable the postgres codebase is in general, and how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a concern with adding the '[' at the beginning and ']' at the end of the JSON output. Those are already added by `json_agg` (https://www.postgresql.org/docs/current/functions-aggregate.html) as you can see in my initial email. Adding them in the COPY TO may be redundant (e.g., [[{"key":"value"...}]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so much, as JSON isn't necessarily flat and rectangular like CSV.

For my use case, I'm emitting JSON files to Apache NiFi for processing, and NiFi has superior handling of JSON (via JOLT parsers) versus CSV, where parsing is generally done with regex. I want to be able to emit JSON using a postgres function and thus COPY TO.

Definitely +1 for COPY TO. I don't think COPY FROM will work out well unless the JSON is required to be flat and rectangular. I would vote -1 to leave it out, due to the necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart wrote:
> On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> > I did a quick PoC patch (attached) -- if there is interest and no hard
> > objections I would like to get it up to speed for the January
> > commitfest.
>
> Cool. I would expect there to be interest, given all the other JSON
> support that has been added thus far.
>
> I noticed that, with the PoC patch, "json" is the only format that must
> be quoted. Without quotes, I see a syntax error. I'm assuming there's a
> conflict with another json-related rule somewhere in gram.y, but I
> haven't tracked down exactly which one is causing it.
>
> > 1. Is supporting JSON array format sufficient, or does it need to
> > support some other options? How flexible does the support scheme need
> > to be?
>
> I don't presently have a strong opinion on this one. My instinct would
> be to start with something simple, though. I don't think we offer any
> special options for log_destination...
>
> > 2. This only supports COPY TO and we would undoubtedly want to support
> > COPY FROM for JSON as well, but is that required from the start?
>
> I would vote for including COPY FROM support from the start.
>
> > ! if (!cstate->opts.json_mode)
>
> I think it's unfortunate that this further complicates the branching in
> CopyOneRowTo(), but after some quick glances at the code, I'm not sure
> it's worth refactoring a bunch of stuff to make this nicer.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
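To spell out the double-wrapping concern raised above: if the single column being copied is already a `json_agg` array, adding brackets around it again nests the array. A tiny illustrative sketch (Python standing in for the two layers):

```python
import json

# json_agg already wraps the rows in [...]:
agg = json.dumps([{"key": "value"}])

# If COPY TO then adds its own '[' and ']' around that single
# already-aggregated column, the result is a nested array:
wrapped = "[" + agg + "]"
assert json.loads(wrapped) == [[{"key": "value"}]]
```

(The redundancy only arises when the source query has already aggregated the rows into one JSON value; for plain multi-column rows the brackets are what produce the array in the first place.)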