[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
[ https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618165#comment-17618165 ] Gary D. Gregory commented on CSV-254: - Hello [~lbruun] Please test git master or https://repository.apache.org/content/repositories/snapshots/, this area has recently changed for CSV-290 with https://github.com/apache/commons-csv/pull/265 > POSTGRESQL_CSV cannot parse correctly (null vs zero-length string) > -- > > Key: CSV-254 > URL: https://issues.apache.org/jira/browse/CSV-254 > Project: Commons CSV > Issue Type: Bug >Reporter: Lars Bruun-Hansen >Priority: Major > > > PostgresSQL by default creates CSV files which uses absent value for > database NULL values. Consider the following test in PostgresSQL: > {code:sql} > CREATE TABLE mytab ( > col1 varchar(10), > col2 int, > col3 varchar(10) > ); > INSERT INTO mytab VALUES ('AA', 33, null); > INSERT INTO mytab VALUES ('AA', null, ''); > INSERT INTO mytab VALUES (null, 33, 'CC'); > {code} > and then exporting it to CSV: > {code:sql} > \copy mytab TO STDOUT WITH csv > {code} > will produce CSV output as follows: > {noformat} > AA,33, > AA,,"" > ,33,CC > {noformat} > which the library currently will parse as follows using the current > {{POSTGRESQL_CSV}} format: > {noformat} > "AA","33",null > "AA",null,null > null,"33","CC" > {noformat} > Row no 2 is incorrect when comparing to the actual database table contents. > h2. The fix > Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to: > {code:java} > public static final CSVFormat POSTGRESQL_CSV = DEFAULT > .withDelimiter(COMMA) > .withEscape(DOUBLE_QUOTE_CHAR) > .withIgnoreEmptyLines(false) > .withQuote(DOUBLE_QUOTE_CHAR) > .withRecordSeparator(LF) > .withAbsentMeansNull(true) // added > .withNullString(EMPTY) > .withQuoteMode(QuoteMode.ALL_NON_NULL); > {code} > (this depends on [PR51|https://github.com/apache/commons-csv/pull/51]) > With the above the parser will give the following result instead: > {noformat} > "AA","33",null > "AA",null,"" > null,"33","CC" > {noformat} > which is the expected result. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
[ https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964633#comment-16964633 ] Lars Bruun-Hansen commented on CSV-254: --- {quote}Not quite, see https://issues.apache.org/jira/browse/CSV-253 An empty string is not the same as a {{null}}. {quote} Exactly! And this is essentially what this ticket is about. Not sure I understand the comment. ;) The {{POSTGRESQL_CSV}} format cannot properly distinguish between empty string and null while parsing. So it gets it wrong. The reason why {{POSTGRESQL_CSV}} format can't is not due as much to the format itself, but to an underlying deficiency in the Parser which is proposed fixed in [PR51|https://github.com/apache/commons-csv/pull/51] / [CSV-253|https://issues.apache.org/jira/browse/CSV-253]. Please suggest how to proceed. I'll be happy to do more work if required or change stuff in PR51or whatever. Thx. > POSTGRESQL_CSV cannot parse correctly (null vs zero-length string) > -- > > Key: CSV-254 > URL: https://issues.apache.org/jira/browse/CSV-254 > Project: Commons CSV > Issue Type: Bug >Reporter: Lars Bruun-Hansen >Priority: Major > > > PostgresSQL by default creates CSV files which uses absent value for > database NULL values. Consider the following test in PostgresSQL: > {code:sql} > CREATE TABLE mytab ( > col1 varchar(10), > col2 int, > col3 varchar(10) > ); > INSERT INTO mytab VALUES ('AA', 33, null); > INSERT INTO mytab VALUES ('AA', null, ''); > INSERT INTO mytab VALUES (null, 33, 'CC'); > {code} > and then exporting it to CSV: > {code:sql} > \copy mytab TO STDOUT WITH csv > {code} > will produce CSV output as follows: > {noformat} > AA,33, > AA,,"" > ,33,CC > {noformat} > which the library currently will parse as follows using the current > {{POSTGRESQL_CSV}} format: > {noformat} > "AA","33",null > "AA",null,null > null,"33","CC" > {noformat} > Row no 2 is incorrect when comparing to the actual database table contents. > h2. The fix > Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to: > {code:java} > public static final CSVFormat POSTGRESQL_CSV = DEFAULT > .withDelimiter(COMMA) > .withEscape(DOUBLE_QUOTE_CHAR) > .withIgnoreEmptyLines(false) > .withQuote(DOUBLE_QUOTE_CHAR) > .withRecordSeparator(LF) > .withAbsentMeansNull(true) // added > .withNullString(EMPTY) > .withQuoteMode(QuoteMode.ALL_NON_NULL); > {code} > (this depends on [PR51|https://github.com/apache/commons-csv/pull/51]) > With the above the parser will give the following result instead: > {noformat} > "AA","33",null > "AA",null,"" > null,"33","CC" > {noformat} > which is the expected result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
[ https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963478#comment-16963478 ] Gary D. Gregory commented on CSV-254: - Not quite, see https://issues.apache.org/jira/browse/CSV-253 An empty string is not the same as a {{null}}. > POSTGRESQL_CSV cannot parse correctly (null vs zero-length string) > -- > > Key: CSV-254 > URL: https://issues.apache.org/jira/browse/CSV-254 > Project: Commons CSV > Issue Type: Bug >Reporter: Lars Bruun-Hansen >Priority: Major > > > PostgresSQL by default creates CSV files which uses absent value for > database NULL values. Consider the following test in PostgresSQL: > {code:sql} > CREATE TABLE mytab ( > col1 varchar(10), > col2 int, > col3 varchar(10) > ); > INSERT INTO mytab VALUES ('AA', 33, null); > INSERT INTO mytab VALUES ('AA', null, ''); > INSERT INTO mytab VALUES (null, 33, 'CC'); > {code} > and then exporting it to CSV: > {code:sql} > \copy mytab TO STDOUT WITH csv > {code} > will produce CSV output as follows: > {noformat} > AA,33, > AA,,"" > ,33,CC > {noformat} > which the library currently will parse as follows using the current > {{POSTGRESQL_CSV}} format: > {noformat} > "AA","33",null > "AA",null,null > null,"33","CC" > {noformat} > Row no 2 is incorrect when comparing to the actual database table contents. > h2. The fix > Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to: > {code:java} > public static final CSVFormat POSTGRESQL_CSV = DEFAULT > .withDelimiter(COMMA) > .withEscape(DOUBLE_QUOTE_CHAR) > .withIgnoreEmptyLines(false) > .withQuote(DOUBLE_QUOTE_CHAR) > .withRecordSeparator(LF) > .withAbsentMeansNull(true) // added > .withNullString(EMPTY) > .withQuoteMode(QuoteMode.ALL_NON_NULL); > {code} > (this depends on [PR51|https://github.com/apache/commons-csv/pull/51]) > With the above the parser will give the following result instead: > {noformat} > "AA","33",null > "AA",null,"" > null,"33","CC" > {noformat} > which is the expected result. -- This message was sent by Atlassian Jira (v8.3.4#803005)