[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes
aherbert commented on a change in pull request #44: Post release fixes URL: https://github.com/apache/commons-csv/pull/44#discussion_r293838167 ## File path: src/main/java/org/apache/commons/csv/CSVParser.java ## @@ -495,13 +495,14 @@ private Headers createHeaders() throws IOException { if (headerRecord != null) { for (int i = 0; i < headerRecord.length; i++) { final String header = headerRecord[i]; -final boolean containsHeader = header == null ? false : hdrMap.containsKey(header); +final boolean containsHeader = header != null && hdrMap.containsKey(header); final boolean emptyHeader = header == null || header.trim().isEmpty(); if (containsHeader) { if (!emptyHeader && !this.format.getAllowDuplicateHeaderNames()) { throw new IllegalArgumentException( String.format( -"The header contains a duplicate name: \"%s\" in %s. If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().", +"The header contains a duplicate name: \"%s\" in %s. " + +"If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().", Review comment: I've looked at this again and the logic is a bit convoluted: The `containsHeader` flag is set when the header is already in the header map. The `emptyHeader flag is set when the header is null or only whitespace. Then there is a check on the `containsHeader` flag. If true then this has been seen before and some cases are tested. An exception is thrown if the header is not empty and settings do not allow duplicates. This means that empty is an allowed duplicate. An exception is thrown if the header is empty and settings do not allow missing column names. But this can only be thrown if the header has been seen before (is in the header map). This means that a single missing column name would be allowed if the settings allowed duplicates but not missing column names. This does not seem right. If missing column names are not allowed then this should always be checked irrespective of what is in the header map. The unit tests only checks for two missing column names: ``` // This passes as the exception is thrown @Test(expected = IllegalArgumentException.class) public void testHeadersMissingException() throws Exception { final Reader in = new StringReader("a,,c,,d\n1,2,3,4\nx,y,z,zz"); CSVFormat.DEFAULT.withHeader().parse(in).iterator(); } // This fails to throw an exception. // Only 1 column name is missing and this is not recognised as a problem. @Test(expected = IllegalArgumentException.class) public void testHeadersMissing1ColumnException() throws Exception { final Reader in = new StringReader("a,,c,d\n1,2,3,4\nx,y,z,zz"); CSVFormat.DEFAULT.withHeader().parse(in).iterator(); } ``` This seems to be a bug so should be under a Jira ticket. You also only add to the header map when the header is not null. But if you are allowing missing column names then a null is a valid header and your list of header names does not match the order of the header record. Thus getHeaderNames() will not return a list of the correct size (number of columns). So perhaps `""` should be added to the list of header names. This is a change outside the scope of this PR, and since I am not familiar with what the header names are meant to achieve perhaps this is intended (i.e. only return a list of header names that are not null). A final issue is that the header map exists to convert a column header into an index. If you are allowing duplicates then the mapping is one to many and the map may not function as expected when trying to obtain the column index using a column header. Perhaps this hole in functionality should be documented in the `getHeaderMap` javadoc, i.e. the map of column name to column index cannot represent a one to many relationship when duplicate column names are present. This could be solved with user beware documentation that is this case the map will contain the index of the last matching column with that header if duplicates have been allowed. Or more drastically by controlling the Map with a specialisation that does special behaviour when a request with a duplicate column name occurs: throws an exception; returns as array of all matching columns, etc. Again outside the scope of this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at:
[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes
aherbert commented on a change in pull request #44: Post release fixes URL: https://github.com/apache/commons-csv/pull/44#discussion_r293785511 ## File path: pom.xml ## @@ -153,8 +153,9 @@ CSV files of various types. 3.0.0 ${basedir}/LICENSE-header.txt -LICENSE.txt, NOTICE.txt +LICENSE.txt, NOTICE.txt, **/maven-archiver/pom.properties Review comment: AFAICS there is no checkstyle configuration in commons-parent, or PMD. I thought it was just up to projects to configure as appropriate. Using the maven plugin property `checkstyle.resourceExcludes` is not done on other commons projects I know. These use a `` section and `` tag. There is more than one way to skin a cat. Are you suggesting to add profiles like those for other optional plugins such as jacoco? Each project is of a different age and coding style. So a standard checkstyle probably would not work as is and a transition period to a common style would be needed. A common style would certainly make browsing source from different projects easier. I'd say the discussion on this is outside the scope of this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes
aherbert commented on a change in pull request #44: Post release fixes URL: https://github.com/apache/commons-csv/pull/44#discussion_r293781762 ## File path: src/main/java/org/apache/commons/csv/CSVParser.java ## @@ -495,13 +495,14 @@ private Headers createHeaders() throws IOException { if (headerRecord != null) { for (int i = 0; i < headerRecord.length; i++) { final String header = headerRecord[i]; -final boolean containsHeader = header == null ? false : hdrMap.containsKey(header); +final boolean containsHeader = header != null && hdrMap.containsKey(header); final boolean emptyHeader = header == null || header.trim().isEmpty(); if (containsHeader) { if (!emptyHeader && !this.format.getAllowDuplicateHeaderNames()) { throw new IllegalArgumentException( String.format( -"The header contains a duplicate name: \"%s\" in %s. If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().", +"The header contains a duplicate name: \"%s\" in %s. " + +"If this is valid then use CSVFormat.withAllowDuplicateHeaderNames().", Review comment: OK. This was just to fix checkstyle line length and allow checkstyle:check to complete. I can put in a checkstyle exclusion instead. But if targeted to a specific line it is a pain to maintain in-line with the code. Targeted to a method then the scope of checking is often too wide. I could move these two checks within the `if (containsHeader)` statement to methods: ``` allowDuplicateHeaderNameOrThrow(boolean emptyHeader) allowMissingColumnNameOrThrow(boolean emptyHeader) ``` The line length violation will be in a single short method that can then be allowed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-csv] aherbert commented on a change in pull request #44: Post release fixes
aherbert commented on a change in pull request #44: Post release fixes URL: https://github.com/apache/commons-csv/pull/44#discussion_r29319 ## File path: src/test/java/org/apache/commons/csv/CSVPrinterTest.java ## @@ -1471,4 +1471,31 @@ public void testTrimOnTwoColumns() throws IOException { return CSVParser.parse(expected, format).getRecords().get(0).values(); } +@Test +public void testPrintReaderWithoutQuoteToWriter() throws IOException { +// Test to target the use of IOUtils::copyLarge. +// Requires the format to have no quote or escape character, +// value to be a java.io.Reader and the output to be a java.io.Writer. +final StringWriter sw = new StringWriter(); +final String content = "testValue"; +try (final CSVPrinter printer = new CSVPrinter(sw, CSVFormat.DEFAULT.withQuote(null))) { +final StringReader value = new StringReader(content); +printer.print(value); +} +assertEquals(content, sw.toString()); +} + +@Test +public void testPrintReaderWithoutQuoteToAppendable() throws IOException { +// Test to target the use of IOUtils::copy. Review comment: Sure. None of the other methods had Javadoc and when I initially added it as a Javadoc it was more lines than the 3 lines inside the method. It looked odd. I'll change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services