[ https://issues.apache.org/jira/browse/NIFI-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Payne updated NIFI-5525:
-----------------------------
       Resolution: Fixed
    Fix Version/s: 1.8.0
           Status: Resolved  (was: Patch Available)

> CSVRecordReader fails with StringIndexOutOfBoundsException when field is a double quote
> ---------------------------------------------------------------------------------------
>
>                 Key: NIFI-5525
>                 URL: https://issues.apache.org/jira/browse/NIFI-5525
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.7.1
>            Reporter: Vadim
>            Priority: Major
>              Labels: easyfix, pull-request-available
>             Fix For: 1.8.0
>
>
> *Bug description:*
> When parsing a CSV file in RFC 4180 format in which the value of one field is a single
> double-quote character, CSVRecordReader fails with the following exception:
> {quote}java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>  at java.lang.String.substring(String.java:1967)
>  at org.apache.nifi.csv.AbstractCSVRecordReader.convert(AbstractCSVRecordReader.java:82)
>  at org.apache.nifi.csv.CSVRecordReader.nextRecord(CSVRecordReader.java:102)
>  at org.apache.nifi.serialization.RecordReader.nextRecord(RecordReader.java:50)
>  at org.apache.nifi.csv.TestCSVRecordReader.testQuote(TestCSVRecordReader.java:610)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>  at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>  at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>  at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {quote}
>
> Note that, according to RFC 4180:
>
>     If double-quotes are used to enclose fields, then a double-quote
>     appearing inside a field must be escaped by preceding it with
>     another double quote.
>
> [https://tools.ietf.org/html/rfc4180#page-2]
>
> A field whose value is a single double-quote character is therefore encoded like this:
> """"
> (4 double-quote characters: the enclosing pair plus the escaped quote)
>
> *How to reproduce:*
> Add the following method to TestCSVRecordReader.java and run the test:
>
> {code:java}
> @Test
> public void testQuote() throws IOException, MalformedRecordException {
>     final CSVFormat format = CSVFormat.RFC4180.withFirstRecordAsHeader().withTrim().withQuote('"');
>     // Header row "name", then one record whose only field is a double quote, encoded as """"
>     final String text = "\"name\"\n\"\"\"\"";
>
>     final List<RecordField> fields = new ArrayList<>();
>     fields.add(new RecordField("name", RecordFieldType.STRING.getDataType()));
>     final RecordSchema schema = new SimpleRecordSchema(fields);
>
>     try (final InputStream bais = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
>          final CSVRecordReader reader = new CSVRecordReader(bais, Mockito.mock(ComponentLog.class), schema, format, true, false,
>                  RecordFieldType.DATE.getDefaultFormat(), RecordFieldType.TIME.getDefaultFormat(),
>                  RecordFieldType.TIMESTAMP.getDefaultFormat(), StandardCharsets.UTF_8.name())) {
>
>         final Record record = reader.nextRecord();
>         final String name = (String) record.getValue("name");
>
>         // The decoded value should be exactly one double-quote character
>         assertEquals("\"", name);
>     }
> }
> {code}
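
The stack trace above ends in String.substring, called from AbstractCSVRecordReader.convert. The report does not show that method, so the following is only a guess at the failure mode, not NiFi's actual code: if a conversion step strips a leading and trailing quote from any value that both starts and ends with a double quote, then a value consisting of a single double quote satisfies both checks, and the resulting substring(1, 0) call throws. The class and method names below (QuoteStripSketch, stripQuotesNaive, stripQuotesGuarded) are hypothetical and exist only for illustration.

```java
public class QuoteStripSketch {

    // Hypothetical naive unquoting: assumes any value that starts and ends
    // with a double quote has at least two characters. This is an assumed
    // failure mode, not code taken from NiFi.
    public static String stripQuotesNaive(String value) {
        if (value.startsWith("\"") && value.endsWith("\"")) {
            // For the one-character value "\"" this is substring(1, 0),
            // which throws StringIndexOutOfBoundsException.
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Hypothetical guarded variant: only strips when there is room for
    // both an opening and a closing quote.
    public static String stripQuotesGuarded(String value) {
        if (value.length() > 1 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // A lone double quote is returned unchanged by the guarded variant.
        System.out.println(stripQuotesGuarded("\""));
        // A normally quoted value still has its quotes removed.
        System.out.println(stripQuotesGuarded("\"abc\""));

        try {
            stripQuotesNaive("\"");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version threw " + e.getClass().getSimpleName());
        }
    }
}
```

A length check of this kind is one simple way to keep a lone double quote intact. Whether the fix shipped in 1.8.0 takes this form is not stated in this message.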