Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/3057#discussion_r223893228
  
    --- Diff: nifi-nar-bundles/nifi-hive-bundle/nifi-hive3-processors/src/main/java/org/apache/nifi/processors/orc/PutORC.java ---
    @@ -157,19 +155,17 @@ public String getDefaultCompressionType(final ProcessorInitializationContext con
         public HDFSRecordWriter createHDFSRecordWriter(final ProcessContext context, final FlowFile flowFile, final Configuration conf, final Path path, final RecordSchema schema)
                 throws IOException, SchemaNotFoundException {
     
    -        final Schema avroSchema = AvroTypeUtil.extractAvroSchema(schema);
    -
             final long stripeSize = context.getProperty(STRIPE_SIZE).asDataSize(DataUnit.B).longValue();
             final int bufferSize = context.getProperty(BUFFER_SIZE).asDataSize(DataUnit.B).intValue();
             final CompressionKind compressionType = CompressionKind.valueOf(context.getProperty(COMPRESSION_TYPE).getValue());
             final boolean normalizeForHive = context.getProperty(HIVE_FIELD_NAMES).asBoolean();
    -        TypeInfo orcSchema = NiFiOrcUtils.getOrcField(avroSchema, normalizeForHive);
    +        TypeInfo orcSchema = NiFiOrcUtils.getOrcSchema(schema, normalizeForHive);
             final Writer orcWriter = NiFiOrcUtils.createWriter(path, conf, orcSchema, stripeSize, compressionType, bufferSize);
             final String hiveTableName = context.getProperty(HIVE_TABLE_NAME).isSet()
                     ? context.getProperty(HIVE_TABLE_NAME).evaluateAttributeExpressions(flowFile).getValue()
    -                : NiFiOrcUtils.normalizeHiveTableName(avroSchema.getFullName());
    +                : NiFiOrcUtils.normalizeHiveTableName(schema.toString());// TODO
    --- End diff --
    
    I admit I hadn't tested this part. The TODO should be removed, but we likely need a way to get at the "name" of the top-level record if the Hive Table Name property is not set. Then again, I haven't seen anyone rely on the schema's full name as the table name; the Hive Table Name property is the recommended way to set this for the generated DDL. All comments are welcome, though :)
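A minimal sketch of the fallback described in this comment, in case it helps the discussion. Everything in it is hypothetical: `TableNameResolver`, `resolve`, and the stand-in `normalizeHiveTableName` (assumed here to replace dots and spaces with underscores) are illustrative names, not NiFi code. In the processor, the first argument would come from the evaluated Hive Table Name property and the second from whatever top-level record name the `RecordSchema` exposes, which this diff does not show. Unlike `schema.toString()`, which is not guaranteed to return a record name at all, the sketch only falls back to an actual name.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: choose the Hive table name for the generated DDL.
 * An explicitly set "Hive Table Name" value wins; otherwise fall back to the
 * top-level record's name, if one is known.
 */
public final class TableNameResolver {

    private TableNameResolver() {
    }

    // Stand-in for NiFiOrcUtils.normalizeHiveTableName. Assumed behavior:
    // replace dots and spaces with underscores, so "com.example.User"
    // becomes "com_example_User".
    static String normalizeHiveTableName(String name) {
        return name.replaceAll("[\\. ]", "_");
    }

    /**
     * @param tableNameProperty evaluated value of the Hive Table Name property, if set
     * @param recordName        name of the top-level record (e.g. an Avro full name), if known
     */
    static String resolve(Optional<String> tableNameProperty, Optional<String> recordName) {
        if (tableNameProperty.isPresent()) {
            // The property is used as-is, matching the "isSet()" branch in the diff.
            return tableNameProperty.get();
        }
        // Normalize the record name; fail loudly if there is nothing to fall back to.
        return recordName
                .map(TableNameResolver::normalizeHiveTableName)
                .orElseThrow(() -> new IllegalStateException(
                        "Hive Table Name is not set and the schema has no record name"));
    }

    public static void main(String[] args) {
        System.out.println(resolve(Optional.empty(), Optional.of("com.example.User"))); // prints com_example_User
        System.out.println(resolve(Optional.of("users"), Optional.of("com.example.User"))); // prints users
    }
}
```

Throwing when neither a property value nor a record name is available is a choice made for this sketch; a processor could instead fall back to a configurable default table name.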

