On Sun, Oct 6, 2024, at 15:12, Andrew Dunstan wrote:
On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
2. Avoid needing hacks like using E'\x01' as quoting char.
Introduce QUOTE NONE and DELIMITER NONE,
to allow raw lines to be imported "as is" into a single text column.
As I think I previously indicated, I'm perfectly happy about 2, because
it replaces a far from obvious hack, but I am at best dubious about 1.
I've looked at how to implement this, and there is quite a lot of
complexity having to do with quoting and escaping.
Need guidance on what you think would be best to do:
2a) Should we aim to support all NONE combinations, at the cost of
increasing the complexity of all code having to do with quoting,
escaping and delimiters?
2b) Should we aim to only support the QUOTE NONE DELIMITER NONE
ESCAPE NONE case, useful to the real-life scenario we've identified,
that is, importing raw log lines into a single column, which could
then be handled by a much simpler and probably faster version of
CopyReadAttributesCSV(), e.g. named
CopyReadAttributesUnquotedUnDelimited() or maybe
CopyReadAttributesRaw()?
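
To illustrate how little such a function would have to do, here is a
minimal standalone sketch. It is not PostgreSQL code: the name
read_attributes_raw() and its signature are made up for this example,
and it assumes the line arrives as a writable NUL-terminated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not PostgreSQL source.
 *
 * With no quote, escape or delimiter character, the whole input line is
 * the one and only attribute, so there is nothing to scan for.
 * Returns the number of fields, which is always 1.
 */
int
read_attributes_raw(char *line, char **fields, int max_fields)
{
    size_t len;

    assert(line != NULL && fields != NULL && max_fields >= 1);

    len = strlen(line);

    /* Drop a trailing newline, and a carriage return before it, if any. */
    if (len > 0 && line[len - 1] == '\n')
        len--;
    if (len > 0 && line[len - 1] == '\r')
        len--;

    /* Terminate in place: no de-quoting, de-escaping or splitting. */
    line[len] = '\0';
    fields[0] = line;
    return 1;
}
```

Commas, double quotes and backslashes in the line all end up verbatim
in fields[0], which is the behaviour wanted for raw log lines.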
(We also need to modify CopyReadLineText(), but it seems we only need
a quote_none bool to skip over the quoting code there, so I don't
think a separate function is warranted there.)
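
A rough sketch of what such a flag could look like in a line scanner
follows. Again, this is a simplified standalone illustration, not the
real function: find_line_end() and its parameters are hypothetical, and
it only handles '\n' as the line terminator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not PostgreSQL source.
 *
 * Returns the offset of the newline that ends the first line in buf,
 * or len if there is none. In normal CSV mode a newline inside a quoted
 * field does not end the line. With quote_none set, the quote tracking
 * is skipped entirely and the first newline always ends the line.
 */
size_t
find_line_end(const char *buf, size_t len, char quote, bool quote_none)
{
    bool in_quote = false;

    assert(buf != NULL);

    for (size_t i = 0; i < len; i++)
    {
        char c = buf[i];

        if (!quote_none && c == quote)
            in_quote = !in_quote;   /* entering or leaving a quoted field */
        else if (c == '\n' && !in_quote)
            return i;
    }
    return len;
}
```

For the input a"b<newline>c"<newline>d, CSV mode ends the line at the
second newline, while quote_none ends it at the first one.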
I think ESCAPE NONE should be implied from QUOTE NONE, since the
default escape character is the same as the quote character, so if
there isn't any quote character, then I think that would imply no
escape character either.
Can we think of any other valid, useful, realistic, and safe
combinations of QUOTE NONE, DELIMITER NONE and ESCAPE NONE, that
would be interesting to support?
If not, then I think 2b looks more interesting: it reduces the risk of
accidental misuse, is simpler to implement, and should also allow
importing raw log files faster, thanks to the reduced complexity.