[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-15 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Looks good to me, was able to verify the functionality, going to merge


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-12 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Updated the rest of the backticks, thanks @VikingK and @bbende for your 
reviews!


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-12 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@mattyb149 I checked out the new backtick DDL patch you applied. It seems it only backticks the complex data structures; the top-level ones are not ticked. For example, ProductId, Items, PropertyMap, Serial and Metadata should be backticked as well to protect against bad names.
```
hive.ddl
CREATE EXTERNAL TABLE IF NOT EXISTS Sales (ProductId INT, Items 
ARRAY>, PropertyMap MAP, 
Serial STRUCT<`Serial`:BIGINT, `Date`:TIMESTAMP, `SystemId`:BIGINT, `T`:BIGINT, 
`IncludeStartTx`:BOOLEAN>, Metadata STRUCT<`AvroTs`:TIMESTAMP>) STORED AS ORC
```


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@mattyb149 awesome work, I was going to suggest the backtick feature since it's a pain when downstream systems send all kinds of weird names, like Timestamp etc. My solution right now is an ugly Groovy script that eliminates them, roughly along the lines of the sketch below.
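
A minimal Java sketch of that kind of cleanup; the `sanitize` helper and the tiny reserved-word list here are hypothetical, not part of this PR:
```
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public final class FieldNameSanitizer {

    // Tiny illustrative subset of Hive's reserved words; the real list is much longer.
    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList("AS", "DATE", "TIMESTAMP"));

    /** Replace characters Hive can't digest and rename reserved words. */
    static String sanitize(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "_");
        return RESERVED.contains(cleaned.toUpperCase()) ? cleaned + "_col" : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("Timestamp"));  // prints Timestamp_col
        System.out.println(sanitize("weird name")); // prints weird_name
    }
}
```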

I'll run some testing tomorrow on the fixes.


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
I found the issue and pushed a new commit with the fix. Your test data also exercised another part of the code I hadn't thought of: your "AS" field is a keyword in Hive, so when I used the generated hive.ddl attribute to create the table on top of the ORC files, it didn't work. The other change in this commit is to backtick-quote the field names to protect against field names that are reserved words.
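
A minimal sketch of that kind of quoting; `quoteIdentifier` is a hypothetical helper, not the actual code in this PR (per HiveQL's quoted-identifier rules, a literal backtick inside a quoted identifier is escaped by doubling it):
```
public final class HiveDdlQuoting {

    /** Backtick-quote a Hive identifier, escaping any embedded backticks. */
    static String quoteIdentifier(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        // "AS" is a Hive keyword; unquoted it breaks the generated CREATE TABLE.
        System.out.println(quoteIdentifier("AS") + " BINARY");        // `AS` BINARY
        System.out.println(quoteIdentifier("Timestamp") + " BIGINT"); // `Timestamp` BIGINT
    }
}
```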


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Ok, using your test.avro going into PutORC with an AvroReader that uses the embedded schema, I can get the error you showed earlier.

I also tested ConvertRecord using the AvroReader and JsonWriter, and that works and produces the following JSON:

```
[
  {
    "OItems": [{
      "Od" : 9,
      "HS" : [47,119,61,61],
      "AS" : [65,65,61,61],
      "NS" : "0"
    }]
  }
]
```


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Maybe it's because I am missing a "name" on the array level, and that's causing the JSON to pick 'array'. I'll test later tonight.


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@bbende weird, I tried it again and got the same error. Here is the output from avro-tools, and I also attached my test.avro message:

```
java -jar avro-tools-1.8.2.jar getschema test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{
  "type" : "record",
  "name" : "IOlist",
  "namespace" : "analytics.models.its",
  "fields" : [ {
    "name" : "OItems",
    "type" : [ "null", {
      "type" : "array",
      "items" : {
        "type" : "record",
        "name" : "ISC",
        "namespace" : "analytics.models.its.iolist.oitems",
        "fields" : [ {
          "name" : "Od",
          "type" : [ "null", "long" ]
        }, {
          "name" : "HS",
          "type" : [ "null", "bytes" ]
        }, {
          "name" : "AS",
          "type" : [ "null", "bytes" ]
        }, {
          "name" : "NS",
          "type" : [ "null", "string" ]
        } ]
      }
    } ]
  } ]
}

java -jar avro-tools-1.8.2.jar tojson test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

{"OItems":{"array":[{"Od":{"long":9},"HS":{"bytes":"/w=="},"AS":{"bytes":"AA=="},"NS":{"string":"0"}}]}}
```

[test.zip](https://github.com/apache/nifi/files/2469204/test.zip)


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-11 Thread bbende
Github user bbende commented on the issue:

https://github.com/apache/nifi/pull/3057
  
@VikingK in your schema OItems is defined as an array, but in the JSON OItems is not an array; it's an object with a field called "array". So running with that schema and example JSON I get:
```

Caused by: java.lang.ClassCastException: org.codehaus.jackson.node.ObjectNode cannot be cast to org.codehaus.jackson.node.ArrayNode
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertField(JsonTreeRowRecordReader.java:188)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:118)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:83)
    at org.apache.nifi.json.JsonTreeRowRecordReader.convertJsonNodeToRecord(JsonTreeRowRecordReader.java:74)
    at org.apache.nifi.json.AbstractJsonRowRecordReader.nextRecord(AbstractJsonRowRecordReader.java:92)
```
Which makes sense because the OItems field is not an array, but the schema 
says it is.
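
For context, the single-key objects in that JSON (e.g. {"long": 9}, {"array": [...]}) are Avro's JSON encoding of union types, which is what avro-tools tojson emits; the JsonTreeReader expects plain JSON instead. A minimal sketch that reproduces the shape, assuming Avro 1.8.x on the classpath (the class name is made up):
```
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class UnionJsonEncoding {
    public static void main(String[] args) throws Exception {
        // A record with one nullable-long field, mirroring the "Od" field above.
        Schema schema = SchemaBuilder.record("Example").fields()
                .name("Od").type().unionOf().nullType().and().longType().endUnion().nullDefault()
                .endRecord();

        GenericRecord record = new GenericData.Record(schema);
        record.put("Od", 9L);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();

        // Prints {"Od":{"long":9}}: the union branch name wraps the value,
        // which is where the {"array": ...} and {"long": ...} objects come from.
        System.out.println(out.toString("UTF-8"));
    }
}
```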

I'm trying to figure out how to reproduce the other error you showed.


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-10 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Sorry for the late answer, I got bogged down.

Avro:
```
{
  "name": "IOlist",
  "namespace": "analytics.models.its",
  "type": "record",
  "fields": [
{
  "name": "OItems",
  "type": [
"null",
{
  "type": "array",
  "items": {
"name": "ISC",
"namespace": "analytics.models.its.iolist.oitems",
"type": "record",
"fields": [
  {
"name": "Od",
"type": [
  "null",
  "long"
]
  },
  {
"name": "HS",
"type": [
  "null",
  "bytes"
]
  },
  {
"name": "AS",
"type": [
  "null",
  "bytes"
]
  },
  {
"name": "NS",
"type": [
  "null",
  "string"
]
  }
]
  }
}
  ]
}
  ]
}
```

JSON:
```
{
  "OItems" : {
"array" : [ {
  "Od" : {
"long" : 9
  },
  "HS" : {
"bytes" : "/w=="
  },
  "AS" : {
"bytes" : "AA=="
  },
  "NS" : {
"string" : "0"
  }
} ]
  }
}
```


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-10 Thread mattyb149
Github user mattyb149 commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Can you share the schema of the data, and possibly a JSON export of your 
Avro file? I couldn't reproduce this with an array of ints, and @bbende ran 
successfully with an array of records.


---


[GitHub] nifi issue #3057: NIFI-5667: Add nested record support for PutORC

2018-10-10 Thread VikingK
Github user VikingK commented on the issue:

https://github.com/apache/nifi/pull/3057
  
Hi, I have been verifying the pull request and I came across this error for one of my Avro messages. I am currently working to figure out which part of the message caused it.

```
2018-10-10 14:14:42,489 ERROR [Timer-Driven Process Thread-6] org.apache.nifi.processors.orc.PutORC PutORC[id=0430e7ab-99f1-3e25-c491-935245567fa3] Failed to write due to java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo cannot be cast to org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:127)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:177)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$0(NiFiOrcUtils.java:129)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:130)
    at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:73)
    at org.apache.nifi.processors.orc.record.ORCHDFSRecordWriter.write(ORCHDFSRecordWriter.java:94)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$null$0(AbstractPutHDFSRecord.java:324)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2235)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2203)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.lambda$onTrigger$1(AbstractPutHDFSRecord.java:305)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1662)
    at org.apache.nifi.processors.hadoop.AbstractPutHDFSRecord.onTrigger(AbstractPutHDFSRecord.java:272)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```


---