[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)

2022-10-15 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618165#comment-17618165
 ] 

Gary D. Gregory commented on CSV-254:
-

Hello [~lbruun] 

Please test git master or 
https://repository.apache.org/content/repositories/snapshots/, this area has 
recently changed for CSV-290 with https://github.com/apache/commons-csv/pull/265

> POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
> --
>
> Key: CSV-254
> URL: https://issues.apache.org/jira/browse/CSV-254
> Project: Commons CSV
>  Issue Type: Bug
>Reporter: Lars Bruun-Hansen
>Priority: Major
>
>  
>  PostgresSQL by default creates CSV files which uses absent value for 
> database NULL values. Consider the following test in PostgresSQL:
> {code:sql}
> CREATE TABLE mytab (
>   col1   varchar(10),
>   col2   int,
>   col3   varchar(10)
> );
> INSERT INTO mytab VALUES ('AA', 33, null);
> INSERT INTO mytab VALUES ('AA', null, '');
> INSERT INTO mytab VALUES (null, 33, 'CC');
> {code}
> and then exporting it to CSV:
> {code:sql}
> \copy mytab TO STDOUT WITH csv
> {code}
> will produce CSV output as follows:
> {noformat}
> AA,33,
> AA,,""
> ,33,CC
> {noformat}
> which the library currently will parse as follows using the current 
> {{POSTGRESQL_CSV}} format:
> {noformat}
> "AA","33",null
> "AA",null,null
> null,"33","CC"
> {noformat}
> Row no 2 is incorrect when comparing to the actual database table contents.
> h2. The fix
> Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to:
> {code:java}
> public static final CSVFormat POSTGRESQL_CSV = DEFAULT
> .withDelimiter(COMMA)
> .withEscape(DOUBLE_QUOTE_CHAR)
> .withIgnoreEmptyLines(false)
> .withQuote(DOUBLE_QUOTE_CHAR)
> .withRecordSeparator(LF)
> .withAbsentMeansNull(true)   // added
> .withNullString(EMPTY)
> .withQuoteMode(QuoteMode.ALL_NON_NULL);
> {code}
> (this depends on [PR51|https://github.com/apache/commons-csv/pull/51])
>  With the above the parser will give the following result instead: 
> {noformat}
> "AA","33",null
> "AA",null,""
> null,"33","CC"
> {noformat}
> which is the expected result.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)

2019-10-31 Thread Lars Bruun-Hansen (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964633#comment-16964633
 ] 

Lars Bruun-Hansen commented on CSV-254:
---

 
{quote}Not quite, see https://issues.apache.org/jira/browse/CSV-253
 An empty string is not the same as a {{null}}.
{quote}
Exactly! And this is essentially what this ticket is about. Not sure I 
understand the comment. ;)

The {{POSTGRESQL_CSV}} format cannot properly distinguish between empty string 
and null while parsing. So it gets it wrong. The reason why {{POSTGRESQL_CSV}} 
format can't is not due as much to the format itself, but to an underlying 
deficiency in the Parser which is proposed fixed in 
[PR51|https://github.com/apache/commons-csv/pull/51] / 
[CSV-253|https://issues.apache.org/jira/browse/CSV-253].

Please suggest how to proceed. I'll be happy to do more work if required or 
change stuff in PR51or whatever.

Thx.



> POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
> --
>
> Key: CSV-254
> URL: https://issues.apache.org/jira/browse/CSV-254
> Project: Commons CSV
>  Issue Type: Bug
>Reporter: Lars Bruun-Hansen
>Priority: Major
>
>  
>  PostgresSQL by default creates CSV files which uses absent value for 
> database NULL values. Consider the following test in PostgresSQL:
> {code:sql}
> CREATE TABLE mytab (
>   col1   varchar(10),
>   col2   int,
>   col3   varchar(10)
> );
> INSERT INTO mytab VALUES ('AA', 33, null);
> INSERT INTO mytab VALUES ('AA', null, '');
> INSERT INTO mytab VALUES (null, 33, 'CC');
> {code}
> and then exporting it to CSV:
> {code:sql}
> \copy mytab TO STDOUT WITH csv
> {code}
> will produce CSV output as follows:
> {noformat}
> AA,33,
> AA,,""
> ,33,CC
> {noformat}
> which the library currently will parse as follows using the current 
> {{POSTGRESQL_CSV}} format:
> {noformat}
> "AA","33",null
> "AA",null,null
> null,"33","CC"
> {noformat}
> Row no 2 is incorrect when comparing to the actual database table contents.
> h2. The fix
> Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to:
> {code:java}
> public static final CSVFormat POSTGRESQL_CSV = DEFAULT
> .withDelimiter(COMMA)
> .withEscape(DOUBLE_QUOTE_CHAR)
> .withIgnoreEmptyLines(false)
> .withQuote(DOUBLE_QUOTE_CHAR)
> .withRecordSeparator(LF)
> .withAbsentMeansNull(true)   // added
> .withNullString(EMPTY)
> .withQuoteMode(QuoteMode.ALL_NON_NULL);
> {code}
> (this depends on [PR51|https://github.com/apache/commons-csv/pull/51])
>  With the above the parser will give the following result instead: 
> {noformat}
> "AA","33",null
> "AA",null,""
> null,"33","CC"
> {noformat}
> which is the expected result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CSV-254) POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)

2019-10-30 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963478#comment-16963478
 ] 

Gary D. Gregory commented on CSV-254:
-

Not quite, see https://issues.apache.org/jira/browse/CSV-253
An empty string is not the same as a {{null}}.

> POSTGRESQL_CSV cannot parse correctly (null vs zero-length string)
> --
>
> Key: CSV-254
> URL: https://issues.apache.org/jira/browse/CSV-254
> Project: Commons CSV
>  Issue Type: Bug
>Reporter: Lars Bruun-Hansen
>Priority: Major
>
>  
>  PostgresSQL by default creates CSV files which uses absent value for 
> database NULL values. Consider the following test in PostgresSQL:
> {code:sql}
> CREATE TABLE mytab (
>   col1   varchar(10),
>   col2   int,
>   col3   varchar(10)
> );
> INSERT INTO mytab VALUES ('AA', 33, null);
> INSERT INTO mytab VALUES ('AA', null, '');
> INSERT INTO mytab VALUES (null, 33, 'CC');
> {code}
> and then exporting it to CSV:
> {code:sql}
> \copy mytab TO STDOUT WITH csv
> {code}
> will produce CSV output as follows:
> {noformat}
> AA,33,
> AA,,""
> ,33,CC
> {noformat}
> which the library currently will parse as follows using the current 
> {{POSTGRESQL_CSV}} format:
> {noformat}
> "AA","33",null
> "AA",null,null
> null,"33","CC"
> {noformat}
> Row no 2 is incorrect when comparing to the actual database table contents.
> h2. The fix
> Therefore the declaration of {{POSTGRESQL_CSV}} must be changed to:
> {code:java}
> public static final CSVFormat POSTGRESQL_CSV = DEFAULT
> .withDelimiter(COMMA)
> .withEscape(DOUBLE_QUOTE_CHAR)
> .withIgnoreEmptyLines(false)
> .withQuote(DOUBLE_QUOTE_CHAR)
> .withRecordSeparator(LF)
> .withAbsentMeansNull(true)   // added
> .withNullString(EMPTY)
> .withQuoteMode(QuoteMode.ALL_NON_NULL);
> {code}
> (this depends on [PR51|https://github.com/apache/commons-csv/pull/51])
>  With the above the parser will give the following result instead: 
> {noformat}
> "AA","33",null
> "AA",null,""
> null,"33","CC"
> {noformat}
> which is the expected result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)