[jira] [Commented] (NIFI-2841) SplitAvro Processor is Broken
    [ https://issues.apache.org/jira/browse/NIFI-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542468#comment-15542468 ]

David Hicks commented on NIFI-2841:
-----------------------------------

Good and bad:
1) The bad is that I can't figure out how to get a file that isn't sensitive that this fails on.
2) The good is that I confirmed your patch works. Thanks for the fast turnaround on that.

> SplitAvro Processor is Broken
> -----------------------------
>
>                 Key: NIFI-2841
>                 URL: https://issues.apache.org/jira/browse/NIFI-2841
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: David Hicks
>            Priority: Critical
>         Attachments: NIFI-2841.patch
>
>
> This is largely the fault of the Avro DataFileStream reader, but it's making the processor unusable. The problem appears to occur when you make the following series of calls (which happens because of the splitSize comparison):
> reader.next() -> returns last element
> reader.hasNext() -> returns false
> reader.hasNext() -> returns true
> reader.next() -> EOFException
> org.apache.nifi.processor.exception.ProcessException: IOException thrown from SplitAvro[id=22e03ca4-0151-4474-92fc-040e1fe12ab9]: java.io.EOFException
>     at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2013) ~[na:na]
>     at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter$1.process(SplitAvro.java:250) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
>     at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851) ~[na:na]
>     at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1822) ~[na:na]
>     at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter.split(SplitAvro.java:236) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
>     at org.apache.nifi.processors.avro.SplitAvro.onTrigger(SplitAvro.java:202) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
>     at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-0.7.0.jar:0.7.0]
>     at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054) [nifi-framework-core-0.7.0.jar:0.7.0]
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-0.7.0.jar:0.7.0]
>     at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-0.7.0.jar:0.7.0]
>     at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127) [nifi-framework-core-0.7.0.jar:0.7.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_101]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> Caused by: java.io.EOFException: null
>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151) ~[avro-1.7.7.jar:1.7.7]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) ~[avro-1.7.7.jar:1.7.7]
>     at
[jira] [Commented] (NIFI-2841) SplitAvro Processor is Broken
    [ https://issues.apache.org/jira/browse/NIFI-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532676#comment-15532676 ]

David Hicks commented on NIFI-2841:
-----------------------------------

Ah, you're right. I shouldn't have included the bit about any and all files. I removed it from the description.

I'm dealing with a sensitive data set, so I can't just send the file. I'm trying to track down how my files are created so I can send you one.
[jira] [Updated] (NIFI-2841) SplitAvro Processor is Broken
    [ https://issues.apache.org/jira/browse/NIFI-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Hicks updated NIFI-2841:
------------------------------
    Description:
This is largely the fault of the Avro DataFileStream reader, but it's making the processor unusable. The problem appears to occur when you make the following series of calls (which happens because of the splitSize comparison):
reader.next() -> returns last element
reader.hasNext() -> returns false
reader.hasNext() -> returns true
reader.next() -> EOFException

org.apache.nifi.processor.exception.ProcessException: IOException thrown from SplitAvro[id=22e03ca4-0151-4474-92fc-040e1fe12ab9]: java.io.EOFException
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2013) ~[na:na]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter$1.process(SplitAvro.java:250) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1822) ~[na:na]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter.split(SplitAvro.java:236) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.processors.avro.SplitAvro.onTrigger(SplitAvro.java:202) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127) [nifi-framework-core-0.7.0.jar:0.7.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.io.EOFException: null
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter$1$1.process(SplitAvro.java:259) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:1998) ~[na:na]
    ... 17 common frames omitted

  was:
This is largely the fault of the Avro DataFileStream reader, but it's making the processor unusable. The problem appears to occur when you make the following series of calls (which happens because of the splitSize comparison):
reader.next() -> returns last element
reader.hasNext() -> returns false
reader.hasNext() ->
[jira] [Created] (NIFI-2842) Would like InferAvroSchema and ConvertCSVToAvro to handle numbers better
David Hicks created NIFI-2842:
---------------------------------

             Summary: Would like InferAvroSchema and ConvertCSVToAvro to handle numbers better
                 Key: NIFI-2842
                 URL: https://issues.apache.org/jira/browse/NIFI-2842
             Project: Apache NiFi
          Issue Type: New Feature
            Reporter: David Hicks


Assume the following CSV:

field1,field2
10,17
7,18.4

InferAvroSchema will parse both fields as integers, but ConvertCSVToAvro will explode when trying to convert the second data line, because 18.4 is a float.

One recommendation would be to allow the user to specify multiple lines to inspect and choose the least restrictive type among them. If it parses a field as an integer and then a double, the double will override the integer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
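The "least restrictive type wins" idea in this request can be sketched as below. This is only an illustration of the suggestion, not how InferAvroSchema is implemented: the class `ColumnTypeInference`, the `FieldType` enum, and the naive comma split (no quoting or escaping) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: pick the least restrictive type seen for each CSV column. */
class ColumnTypeInference {

    /** Declared from most to least restrictive; widen() relies on this order. */
    enum FieldType { INT, LONG, DOUBLE, STRING }

    /** Classifies one cell by trying the narrowest parse first. */
    static FieldType classify(String cell) {
        String value = cell.trim();
        try {
            Integer.parseInt(value);
            return FieldType.INT;
        } catch (NumberFormatException e) { /* not an int, try wider */ }
        try {
            Long.parseLong(value);
            return FieldType.LONG;
        } catch (NumberFormatException e) { /* not a long, try wider */ }
        try {
            Double.parseDouble(value);
            return FieldType.DOUBLE;
        } catch (NumberFormatException e) { /* not numeric at all */ }
        return FieldType.STRING;
    }

    /** Returns whichever of the two types is less restrictive. */
    static FieldType widen(FieldType a, FieldType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    /** Infers one type per column from several data lines (header excluded). */
    static FieldType[] inferColumns(List<String> dataLines) {
        FieldType[] types = null;
        for (String line : dataLines) {
            // Naive split: assumes no quoted or escaped commas.
            String[] cells = line.split(",", -1);
            if (types == null) {
                types = new FieldType[cells.length];
                Arrays.fill(types, FieldType.INT);
            }
            for (int i = 0; i < Math.min(cells.length, types.length); i++) {
                types[i] = widen(types[i], classify(cells[i]));
            }
        }
        return types == null ? new FieldType[0] : types;
    }

    public static void main(String[] args) {
        FieldType[] types = inferColumns(Arrays.asList("10,17", "7,18.4"));
        System.out.println(Arrays.toString(types)); // prints [INT, DOUBLE]
    }
}
```

With the two data lines from the CSV above, field2 is first seen as an integer (17) and then widened to a double (18.4), which is the override behaviour the request describes.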
[jira] [Created] (NIFI-2841) SplitAvro Processor is Broken
David Hicks created NIFI-2841:
---------------------------------

             Summary: SplitAvro Processor is Broken
                 Key: NIFI-2841
                 URL: https://issues.apache.org/jira/browse/NIFI-2841
             Project: Apache NiFi
          Issue Type: Bug
            Reporter: David Hicks
            Priority: Critical


This is largely the fault of the Avro DataFileStream reader, but it's making the processor unusable. The problem appears to occur when you make the following series of calls (which happens because of the splitSize comparison):
reader.next() -> returns last element
reader.hasNext() -> returns false
reader.hasNext() -> returns true
reader.next() -> EOFException

This should be reproducible with any and all avro files.

org.apache.nifi.processor.exception.ProcessException: IOException thrown from SplitAvro[id=22e03ca4-0151-4474-92fc-040e1fe12ab9]: java.io.EOFException
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2013) ~[na:na]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter$1.process(SplitAvro.java:250) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851) ~[na:na]
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1822) ~[na:na]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter.split(SplitAvro.java:236) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.processors.avro.SplitAvro.onTrigger(SplitAvro.java:202) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127) [nifi-framework-core-0.7.0.jar:0.7.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.io.EOFException: null
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:363) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:355) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:157) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233) ~[avro-1.7.7.jar:1.7.7]
    at org.apache.nifi.processors.avro.SplitAvro$RecordSplitter$1$1.process(SplitAvro.java:259) ~[nifi-avro-processors-0.7.0.jar:0.7.0]
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:1998) ~[na:na]
    ... 17 common frames omitted
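One defensive way to avoid the call sequence reported in NIFI-2841 is to consult hasNext() exactly once per record and never again after it has returned false. The sketch below is not the fix attached to the issue and is not the SplitAvro source. `SingleCheckSplitter` and `FlakyReader` are hypothetical names, and a plain `java.util.Iterator` stands in for Avro's `DataFileStream` (which implements `Iterator`) so the example runs without the Avro library. `FlakyReader` only imitates the reported symptom: false on the first check past the end, then true, then a failure in next().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical reader imitating the report: false once past the end, then true. */
class FlakyReader<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private int checksAfterEnd = 0;

    FlakyReader(List<T> records) {
        this.delegate = records.iterator();
    }

    @Override
    public boolean hasNext() {
        if (delegate.hasNext()) {
            return true;
        }
        // First check after the last record is false; later checks wrongly say true.
        return checksAfterEnd++ > 0;
    }

    @Override
    public T next() {
        if (!delegate.hasNext()) {
            throw new NoSuchElementException("read past end of data");
        }
        return delegate.next();
    }
}

/** Hypothetical splitter that has a single hasNext() call site. */
class SingleCheckSplitter {

    /** Groups records into lists of at most splitSize elements. */
    static <T> List<List<T>> split(Iterator<T> reader, int splitSize) {
        if (splitSize < 1) {
            throw new IllegalArgumentException("splitSize must be >= 1");
        }
        List<List<T>> splits = new ArrayList<>();
        List<T> current = new ArrayList<>();
        // The loop condition is the only place hasNext() is called, so it is
        // never asked again once it has reported the end of the data.
        while (reader.hasNext()) {
            current.add(reader.next());
            if (current.size() == splitSize) {
                splits.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            splits.add(current); // trailing partial split
        }
        return splits;
    }

    public static void main(String[] args) {
        Iterator<Integer> reader = new FlakyReader<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(split(reader, 2)); // prints [[1, 2], [3, 4], [5]]
    }
}
```

The split-size check happens on the size of the current batch rather than through a second hasNext() call, so the reader is asked about remaining data only once per record.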