[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes

2019-06-14 Thread GitBox
aherbert commented on a change in pull request #44: Post release fixes
URL: https://github.com/apache/commons-csv/pull/44#discussion_r293838167
 
 

 ##
 File path: src/main/java/org/apache/commons/csv/CSVParser.java
 ##
 @@ -495,13 +495,14 @@ private Headers createHeaders() throws IOException {
 if (headerRecord != null) {
 for (int i = 0; i < headerRecord.length; i++) {
 final String header = headerRecord[i];
-final boolean containsHeader = header == null ? false : 
hdrMap.containsKey(header);
+final boolean containsHeader = header != null && 
hdrMap.containsKey(header);
 final boolean emptyHeader = header == null || 
header.trim().isEmpty();
 if (containsHeader) {
 if (!emptyHeader && 
!this.format.getAllowDuplicateHeaderNames()) {
 throw new IllegalArgumentException(
 String.format(
-"The header contains a duplicate name: 
\"%s\" in %s. If this is valid then use 
CSVFormat.withAllowDuplicateHeaderNames().",
+"The header contains a duplicate name: 
\"%s\" in %s. " +
+"If this is valid then use 
CSVFormat.withAllowDuplicateHeaderNames().",
 
 Review comment:
   I've looked at this again and the logic is a bit convoluted:
   
   The `containsHeader` flag is set when the header is already in the header 
map.
   The `emptyHeader flag is set when the header is null or only whitespace. 
   
   Then there is a check on the `containsHeader` flag. If true then this has 
been seen before and some cases are tested.
   An exception is thrown if the header is not empty and settings do not allow 
duplicates. This means that empty is an allowed duplicate.
   An exception is thrown if the header is empty and settings do not allow 
missing column names. But this can only be thrown if the header has been seen 
before (is in the header map).
   
   This means that a single missing column name would be allowed if the 
settings allowed duplicates but not missing column names. This does not seem 
right. If missing column names are not allowed then this should always be 
checked irrespective of what is in the header map. The unit tests only checks 
for two missing column names:
   
   ```
   // This passes as the exception is thrown
   @Test(expected = IllegalArgumentException.class)
   public void testHeadersMissingException() throws Exception {
   final Reader in = new StringReader("a,,c,,d\n1,2,3,4\nx,y,z,zz");
   CSVFormat.DEFAULT.withHeader().parse(in).iterator();
   }
   
   // This fails to throw an exception.
   // Only 1 column name is missing and this is not recognised as a problem.
   @Test(expected = IllegalArgumentException.class)
   public void testHeadersMissing1ColumnException() throws Exception {
   final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz");
   CSVFormat.DEFAULT.withHeader().parse(in).iterator();
   }
   ```
   
   This seems to be a bug so should be under a Jira ticket.
   
   You also only add to the header map when the header is not null. But if you 
are allowing missing column names then a null is a valid header and your list 
of header names does not match the order of the header record. Thus 
getHeaderNames() will not return a list of the correct size (number of 
columns). So perhaps `""` should be added to the list of header names. This is 
a change outside the scope of this PR, and since I am not familiar with what 
the header names are meant to achieve perhaps this is intended (i.e. only 
return a list of header names that are not null).
   
   A final issue is that the header map exists to convert a column header into 
an index. If you are allowing duplicates then the mapping is one to many and 
the map may not function as expected when trying to obtain the column index 
using a column header. Perhaps this hole in functionality should be documented 
in the `getHeaderMap` javadoc, i.e. the map of column name to column index 
cannot represent a one to many relationship when duplicate column names are 
present. This could be solved with user beware documentation that is this case 
the map will contain the index of the last matching column with that header if 
duplicates have been allowed. Or more drastically by controlling the Map with a 
specialisation that does special behaviour when a request with a duplicate 
column name occurs: throws an exception; returns as array of all matching 
columns, etc. Again outside the scope of this PR.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:

[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes

2019-06-14 Thread GitBox
aherbert commented on a change in pull request #44: Post release fixes
URL: https://github.com/apache/commons-csv/pull/44#discussion_r293785511
 
 

 ##
 File path: pom.xml
 ##
 @@ -153,8 +153,9 @@ CSV files of various types.
 
 3.0.0
 
${basedir}/LICENSE-header.txt
-LICENSE.txt, 
NOTICE.txt
+LICENSE.txt, NOTICE.txt, 
**/maven-archiver/pom.properties
 
 Review comment:
   AFAICS there is no checkstyle configuration in commons-parent, or PMD. I 
thought it was just up to projects to configure as appropriate. Using the maven 
plugin property `checkstyle.resourceExcludes` is not done on other commons 
projects I know. These use a `` section and `` 
tag. There is more than one way to skin a cat.
   
   Are you suggesting to add profiles like those for other optional plugins 
such as jacoco? Each project is of a different age and coding style. So a 
standard checkstyle probably would not work as is and a transition period to a 
common style would be needed. A common style would certainly make browsing 
source from different projects easier. I'd say the discussion on this is 
outside the scope of this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes

2019-06-14 Thread GitBox
aherbert commented on a change in pull request #44: Post release fixes
URL: https://github.com/apache/commons-csv/pull/44#discussion_r293781762
 
 

 ##
 File path: src/main/java/org/apache/commons/csv/CSVParser.java
 ##
 @@ -495,13 +495,14 @@ private Headers createHeaders() throws IOException {
 if (headerRecord != null) {
 for (int i = 0; i < headerRecord.length; i++) {
 final String header = headerRecord[i];
-final boolean containsHeader = header == null ? false : 
hdrMap.containsKey(header);
+final boolean containsHeader = header != null && 
hdrMap.containsKey(header);
 final boolean emptyHeader = header == null || 
header.trim().isEmpty();
 if (containsHeader) {
 if (!emptyHeader && 
!this.format.getAllowDuplicateHeaderNames()) {
 throw new IllegalArgumentException(
 String.format(
-"The header contains a duplicate name: 
\"%s\" in %s. If this is valid then use 
CSVFormat.withAllowDuplicateHeaderNames().",
+"The header contains a duplicate name: 
\"%s\" in %s. " +
+"If this is valid then use 
CSVFormat.withAllowDuplicateHeaderNames().",
 
 Review comment:
   OK. This was just to fix checkstyle line length and allow checkstyle:check 
to complete. I can put in a checkstyle exclusion instead. But if targeted to a 
specific line it is a pain to maintain in-line with the code. Targeted to a 
method then the scope of checking is often too wide. I could move these two 
checks within the `if (containsHeader)` statement to methods:
   ```
   allowDuplicateHeaderNameOrThrow(boolean emptyHeader)
   allowMissingColumnNameOrThrow(boolean emptyHeader)
   ```
   The line length violation will be in a single short method that can then be 
allowed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes

2019-06-14 Thread GitBox
aherbert commented on a change in pull request #44: Post release fixes
URL: https://github.com/apache/commons-csv/pull/44#discussion_r29319
 
 

 ##
 File path: src/test/java/org/apache/commons/csv/CSVPrinterTest.java
 ##
 @@ -1471,4 +1471,31 @@ public void testTrimOnTwoColumns() throws IOException {
 return CSVParser.parse(expected, format).getRecords().get(0).values();
 }
 
+@Test
+public void testPrintReaderWithoutQuoteToWriter() throws IOException {
+// Test to target the use of IOUtils::copyLarge.
+// Requires the format to have no quote or escape character,
+// value to be a java.io.Reader and the output to be a java.io.Writer.
+final StringWriter sw = new StringWriter();
+final String content = "testValue";
+try (final CSVPrinter printer = new CSVPrinter(sw, 
CSVFormat.DEFAULT.withQuote(null))) {
+final StringReader value = new StringReader(content);
+printer.print(value);
+}
+assertEquals(content, sw.toString());
+}
+
+@Test
+public void testPrintReaderWithoutQuoteToAppendable() throws IOException {
+// Test to target the use of IOUtils::copy.
 
 Review comment:
   Sure. None of the other methods had Javadoc and when I initially added it as 
a Javadoc it was more lines than the 3 lines inside the method. It looked odd. 
I'll change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services