[ https://issues.apache.org/jira/browse/NIFI-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685210#comment-16685210 ]
ASF GitHub Bot commented on NIFI-5791:
--------------------------------------

Github user stevedlawrence commented on the issue:

https://github.com/apache/nifi/pull/3130

After some research and reading the [Avro specification](https://avro.apache.org/docs/1.8.1/spec.html), I'd agree that the DFDL infoset does seem somewhat similar to a Record. DFDL does support all the primitive types (null, boolean, int, long, float, double, bytes, string) and logical types (date, time, decimal), plus a few others (integer, byte, short, signed/unsigned). But as far as complex types go, it only really supports "records" and arrays. Below is the list of things in Avro that the DFDL infoset does not support or only partially supports:

* Enums. It sort of supports enums, but only in the sense that it can validate that a primitive type is one of the valid enum values via the xsd:restriction.
* Maps. In DFDL, a map would be implemented as a sequence of key/value pairs, so there wouldn't be any enforcement of unique keys.
* Unions. Each element in the infoset must have an explicit primitive type. Each element can be optional or nulled, but cannot be a union of multiple primitive types. In DFDL, that is handled by an xs:choice of two different elements with different types and some method (often a discriminator) to determine which branch of the choice to take.
* Namespaces are slightly different, but probably similar enough. DFDL uses XML namespacing.
* Aliases are not supported.
* Sort order. DFDL outputs infoset elements in the order in which they appear in the schema.
* The DFDL infoset does not contain the schema. One must keep track of the associated schema outside of the data.
* There isn't really a concept of different serializations like Avro looks to have. Instead, the DFDL schema defines the physical data format via DFDL annotations, which are used to determine how to serialize/deserialize data.
Theoretically, one could have different schemas with the same logical format but with different DFDL annotations to describe physical formats, but that isn't a common use case we've come across.

The Daffodil devs would be happy to discuss the possibility of integrating Daffodil/DFDL as an alternative to Avro. Not sure if any of the above limitations are blockers. The cores of Avro and DFDL definitely do seem to have some overlap.

> Add Apache Daffodil parse/unparse processor
> -------------------------------------------
>
>                 Key: NIFI-5791
>                 URL: https://issues.apache.org/jira/browse/NIFI-5791
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Steve Lawrence
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)