Re: Emitting JSON to file using COPY TO

2023-12-05 Thread Davin Shearer
> Am I understanding something incorrectly?

No, you've got it.  You already covered the concerns there.

> That seems quite absurd, TBH. I know we've catered for some absurdity in
> the CSV code (much of it down to me), so maybe we need to be liberal in
> what we accept here too. IMNSHO, we should produce either a single JSON
> document (the ARRAY case) or a series of JSON documents, one per row
> (the LINES case).

For what it's worth, I agree with Andrew on this.  I also agree with COPY
FROM accepting potentially bogus trailing commas on non-arrays for interop
with other products, but COPY TO should not emit them (unless there is some
really compelling case for doing so).  Emitting bogus JSON (a non-array
with commas) feels wrong, and it would be nice not to perpetuate that if
possible.
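
To be concrete, the "bogus commas" shape is something like this (a sketch
of the shape only, not actual output from any patch version):

{"id":1,"f1":"a"},
{"id":2,"f1":"b"},

i.e., a comma after each object with no enclosing array -- some products
emit and accept this, but it is neither a valid JSON document nor a clean
series of documents.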

Thanks again for doing this.  If I can be of any help, let me know.
If/When this makes it into the production product, I'll be using this
feature for sure.

-Davin


Re: Emitting JSON to file using COPY TO

2023-12-05 Thread Davin Shearer
Hi Joe,

In reviewing the 005 patch, I think that FORCE ARRAY should also _imply_
FORCE ROW DELIMITER.  I can't envision a use case where someone would want
FORCE ARRAY without FORCE ROW DELIMITER.  I can, however, envision a use
case for FORCE ROW DELIMITER without FORCE ARRAY, such as splicing the rows
into a larger array.  I definitely appreciate these options and the
flexibility they afford from a user perspective.
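
To illustrate the FORCE ROW DELIMITER-without-FORCE ARRAY case, I'd expect
output shaped like this (a sketch only; option spelling assumed, values
made up):

test=# copy (select * from foo limit 2) to stdout (format json, force_row_delimiter true);
{"id":1,"f1":"first row"}
,{"id":2,"f1":"second row"}

That is, the force_array output minus the enclosing brackets, ready to
splice into a larger array.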

In the test output, will you also show the different variations with FORCE
ARRAY and FORCE ROW DELIMITER => {(false, false), (true, false), (false,
true), (true, true)}?  Technically you've already shown me the (false,
false) case as those are the defaults.
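
Spelled out, that matrix would be something like (option names assumed):

copy (select * from foo limit 2) to stdout (format json);
copy (select * from foo limit 2) to stdout (format json, force_array true);
copy (select * from foo limit 2) to stdout (format json, force_row_delimiter true);
copy (select * from foo limit 2) to stdout (format json, force_array true, force_row_delimiter true);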

Thanks!


Re: Emitting JSON to file using COPY TO

2023-12-05 Thread Davin Shearer
Thanks for the Wayback Machine link, Andrew.  I read it, understood it,
and will comply.

Joe, those test cases look great and the outputs are the same as `jq`.

As for forward slashes being escaped, I found this:
https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped

Escaping forward slashes is optional, so not escaping them in Postgres is
okay.  The important thing is that software _reading_ JSON interprets both
'\/' and '/' as '/'.


Re: Emitting JSON to file using COPY TO

2023-12-04 Thread Davin Shearer
Sorry about the top posting / top quoting... the link you sent me gives me
a 404.  I'm not exactly sure what top quoting / posting means and Googling
those terms wasn't helpful for me, but I've removed the quoting that my
mail client is automatically "helpfully" adding to my emails.  I mean no
offense.

Okay, digging in more...

If a value contains text with BOMs [footnote 1] in it, they must be
preserved (the database doesn't need to interpret them or do anything
special with them - just store and fetch them).  There are, however, a few
characters that must be escaped (per
https://www.w3docs.com/snippets/java/how-should-i-escape-strings-in-json.html)
so that the JSON format isn't broken.  They are:

   1. " (double quote)
   2. \ (backslash)
   3. / (forward slash)
   4. \b (backspace)
   5. \f (form feed)
   6. \n (new line)
   7. \r (carriage return)
   8. \t (horizontal tab)

These characters should be represented in the test cases to see how the
escaping behaves and to ensure that the escaping is done properly per JSON
requirements.  Forward slash comes as a bit of a surprise to me, but `jq`
handles it either way:

➜ echo '{"key": "this / is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}
➜ echo '{"key": "this \/ is a forward slash"}' | jq .
{
  "key": "this / is a forward slash"
}
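
On the Postgres side, the existing row-to-json machinery can be used to
sanity-check all eight escapes at once (output per the standard JSON
escaping rules; psql formatting approximate):

test=# select row_to_json(t) from (select E'" \\ / \b \f \n \r \t' as f1) t;
           row_to_json
---------------------------------
 {"f1":"\" \\ / \b \f \n \r \t"}
(1 row)

Note the forward slash passes through unescaped, which is fine per the
above.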

Hope it helps, and thank you!

1. I don't disagree that BOMs shouldn't be used for UTF-8, but I'm also
processing UTF-16{BE,LE} and UTF-32{BE,LE} (as well as other textual
formats that are neither ASCII nor Unicode).  I don't have the luxury of
changing the data I'm given.


Re: Emitting JSON to file using COPY TO

2023-12-04 Thread Davin Shearer
Looking great!

For testing, in addition to the quotes, include DOS and Unix EOLs, \ and /,
byte order marks, and multibyte UTF-8 characters.

Essentially anything considered textual is fair game as a value.
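
For example, values along these lines would cover it (hypothetical; E''
escape-string syntax, and the \uFEFF BOM requires a UTF-8 server encoding):

test=# copy (
         select E'dos\r\nline ending' as f1
         union all select E'unix\nline ending'
         union all select E'back\\slash and forward/slash'
         union all select E'\uFEFF leading BOM'
         union all select 'naïve multibyte ütf-8'
       ) to stdout (format json);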

On Mon, Dec 4, 2023, 10:46 Joe Conway  wrote:

> On 12/4/23 09:25, Andrew Dunstan wrote:
> >
> > On 2023-12-04 Mo 08:37, Joe Conway wrote:
> >> On 12/4/23 07:41, Andrew Dunstan wrote:
> >>>
> >>> On 2023-12-03 Su 20:14, Joe Conway wrote:
> >>>> (please don't top quote on the Postgres lists)
> >>>>
> >>>> On 12/3/23 17:38, Davin Shearer wrote:
> >>>>> " being quoted as \\" breaks the JSON. It needs to be \".  This has
> >>>>> been my whole problem with COPY TO for JSON.
> >>>>>
> >>>>> Please validate that the output is in proper format with correct
> >>>>> quoting for special characters. I use `jq` on the command line to
> >>>>> validate and format the output.
> >>>>
> >>>> I just hooked existing "row-to-json machinery" up to the "COPY TO"
> >>>> statement. If the output is wrong (just for this use case?),
> >>>> that would be a missing feature (or possibly a bug?).
> >>>>
> >>>> Davin -- how did you work around the issue with the way the built in
> >>>> functions output JSON?
> >>>>
> >>>> Andrew -- comments/thoughts?
> >>>
> >>> I meant to mention this when I was making comments yesterday.
> >>>
> >>> The patch should not be using CopyAttributeOutText - it will try to
> >>> escape characters such as \, which produces the effect complained of
> >>> here, or else we need to change its setup so we have a way to inhibit
> >>> that escaping.
> >>
> >>
> >> Interesting.
> >>
> >> I am surprised this has never been raised as a problem with COPY TO
> >> before.
> >>
> >> Should the JSON output, as produced by composite_to_json(), be sent
> >> as-is with no escaping at all? If yes, is JSON somehow unique in this
> >> regard?
> >
> >
> > Text mode output is in such a form that it can be read back in using
> > text mode input. There's nothing special about JSON in this respect -
> > any text field will be escaped too. But output suitable for text mode
> > input is not what you're trying to produce here; you're trying to
> > produce valid JSON.
> >
> > So, yes, the result of composite_to_json, which is already suitably
> > escaped, should not be further escaped in this case.
>
> Gotcha.
>
> This patch version uses CopySendData() instead and includes
> documentation changes. Still lacks regression tests.
>
> Hopefully this looks better. Any other particular strings I ought to
> test with?
>
> 8<--
> test=# copy (select * from foo limit 4) to stdout (format json,
> force_array true);
> [
>   {"id":1,"f1":"line with \" in it:
> 1","f2":"2023-12-03T12:26:41.596053-05:00"}
> ,{"id":2,"f1":"line with ' in it:
> 2","f2":"2023-12-03T12:26:41.596173-05:00"}
> ,{"id":3,"f1":"line with \" in it:
> 3","f2":"2023-12-03T12:26:41.596179-05:00"}
> ,{"id":4,"f1":"line with ' in it:
> 4","f2":"2023-12-03T12:26:41.596182-05:00"}
> ]
> 8<--
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>


Re: Emitting JSON to file using COPY TO

2023-12-03 Thread Davin Shearer
I worked around it by using select json_agg(t)... and redirecting it to
file via psql on the command line. COPY TO was working until we ran into
broken JSON and discovered the double quoting issue due to some values
containing " in them.


Re: Emitting JSON to file using COPY TO

2023-12-03 Thread Davin Shearer
" being quoted as \\" breaks the JSON. It needs to be \".  This has been my
whole problem with COPY TO for JSON.

Please validate that the output is in proper format with correct quoting
for special characters. I use `jq` on the command line to validate and
format the output.
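
A quick validation pipeline looks like this (assuming a reachable database
and the PoC's format json; jq exits non-zero on malformed input):

$ psql -c 'copy (select * from foo limit 4) to stdout (format json)' | jq .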

On Sun, Dec 3, 2023, 10:51 Joe Conway  wrote:

> On 12/3/23 10:31, Davin Shearer wrote:
> > Please be sure to include single and double quotes in the test values
> > since that was the original problem (double quoting in COPY TO breaking
> > the JSON syntax).
>
> test=# copy (select * from foo limit 4) to stdout (format json);
> {"id":2456092,"f1":"line with ' in it:
> 2456092","f2":"2023-12-03T10:44:40.9712-05:00"}
> {"id":2456093,"f1":"line with \\" in it:
> 2456093","f2":"2023-12-03T10:44:40.971221-05:00"}
> {"id":2456094,"f1":"line with ' in it:
> 2456094","f2":"2023-12-03T10:44:40.971225-05:00"}
> {"id":2456095,"f1":"line with \\" in it:
> 2456095","f2":"2023-12-03T10:44:40.971228-05:00"}
>
> --
> Joe Conway
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>
>


Re: Emitting JSON to file using COPY TO

2023-12-03 Thread Davin Shearer
Please be sure to include single and double quotes in the test values since
that was the original problem (double quoting in COPY TO breaking the JSON
syntax).

On Sun, Dec 3, 2023, 10:11 Andrew Dunstan  wrote:

>
> On 2023-12-01 Fr 14:28, Joe Conway wrote:
> > On 11/29/23 10:32, Davin Shearer wrote:
> >> Thanks for the responses everyone.
> >>
> >> I worked around the issue using the `psql -tc` method as Filip
> >> described.
> >>
> >> I think it would be great to support writing JSON using COPY TO at
> >> some point so I can emit JSON to files using a PostgreSQL function
> >> directly.
> >>
> >> -Davin
> >>
> >> On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <fi...@sedlakovi.org> wrote:
> >>
> >> This would be a very special case for COPY. It applies only to a
> >> single
> >> column of JSON values. The original problem can be solved with psql
> >> --tuples-only as David wrote earlier.
> >>
> >>
> >> $ psql -tc 'select json_agg(row_to_json(t))
> >>from (select * from public.tbl_json_test) t;'
> >>
> >>[{"id":1,"t_test":"here's a \"string\""}]
> >>
> >>
> >> Special-casing any encoding/escaping scheme leads to bugs and harder
> >> parsing.
> >
> > (moved to hackers)
> >
> > I did a quick PoC patch (attached) -- if there is interest and no hard
> > objections I would like to get it up to speed for the January commitfest.
> >
> > Currently the patch lacks documentation and regression test support.
> >
> > Questions:
> > --
> > 1. Is supporting JSON array format sufficient, or does it need to
> > support some other options? How flexible does the support scheme need
> > to be?
> >
> > 2. This only supports COPY TO and we would undoubtedly want to support
> > COPY FROM for JSON as well, but is that required from the start?
> >
> > Thanks for any feedback.
>
>
> I realize this is just a PoC, but I'd prefer to see composite_to_json()
> not exposed. You could use the already public datum_to_json() instead,
> passing JSONTYPE_COMPOSITE and F_RECORD_OUT as the second and third
> arguments.
>
> I think JSON array format is sufficient.
>
> I can see both sides of the COPY FROM argument, but I think insisting on
> that makes this less doable for release 17. On balance I would stick to
> COPY TO for now.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


Re: Emitting JSON to file using COPY TO

2023-12-01 Thread Davin Shearer
I'm really glad to see this taken up as a possible new feature and will
definitely use it if it gets released.  I'm impressed with how clean,
understandable, and approachable the postgres codebase is in general and
how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a
concern with adding the '[' at the beginning and ']' at the end of the JSON
output.  Those are already added by `json_agg` (
https://www.postgresql.org/docs/current/functions-aggregate.html), as you
can see in my initial email.  Adding them in COPY TO may be redundant
(e.g., [[{"key":"value"...}]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so
much, as JSON isn't necessarily flat and rectangular like CSV.

For my use case, I'm emitting JSON files to Apache NiFi for processing, and
NiFi has superior handling of JSON (via JOLT parsers) versus CSV, where
parsing is generally done with regexes.  I want to be able to emit JSON
using a postgres function, and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to
be flat and rectangular.  I would vote -1 there and leave it out, since the
necessary restrictions would make it not generally useful.
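
For instance, an ordinary document like this (made up) has no natural
flat, rectangular mapping onto a fixed column list the way a CSV record
does:

{"id": 1, "tags": ["a", "b"], "address": {"city": "Portland"}}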

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart wrote:

> On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> > I did a quick PoC patch (attached) -- if there is interest and no hard
> > objections I would like to get it up to speed for the January commitfest.
>
> Cool.  I would expect there to be interest, given all the other JSON
> support that has been added thus far.
>
> I noticed that, with the PoC patch, "json" is the only format that must be
> quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
> conflict with another json-related rule somewhere in gram.y, but I haven't
> tracked down exactly which one is causing it.
>
> > 1. Is supporting JSON array format sufficient, or does it need to support
> > some other options? How flexible does the support scheme need to be?
>
> I don't presently have a strong opinion on this one.  My instinct would be
> start with something simple, though.  I don't think we offer any special
> options for log_destination...
>
> > 2. This only supports COPY TO and we would undoubtedly want to support
> COPY
> > FROM for JSON as well, but is that required from the start?
>
> I would vote for including COPY FROM support from the start.
>
> > ! if (!cstate->opts.json_mode)
>
> I think it's unfortunate that this further complicates the branching in
> CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
> worth refactoring a bunch of stuff to make this nicer.
>
> --
> Nathan Bossart
> Amazon Web Services: https://aws.amazon.com
>