[ 
https://issues.apache.org/jira/browse/NIFI-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685210#comment-16685210
 ] 

ASF GitHub Bot commented on NIFI-5791:
--------------------------------------

Github user stevedlawrence commented on the issue:

    https://github.com/apache/nifi/pull/3130
  
    After some research and reading the [Avro 
specification](https://avro.apache.org/docs/1.8.1/spec.html), I'd agree that 
the DFDL infoset does seem somewhat similar to a Record.
    
    DFDL does support all the primitive types (null, boolean, int, long, float, 
double, bytes, string) and logical types (date, time, decimal), plus a few 
others (integer, byte, short, signed/unsigned). But as far as complex types go, 
it only really supports "records" and arrays. Below is the list of things in 
Avro that the DFDL infoset does not support:
    
    * It sort of supports enums, but only in the sense that it can validate 
that a primitive type is one of the valid enum values via the xsd:restriction.
    * Maps. In DFDL, a map would be implemented as a sequence of key/value 
pairs, so there wouldn't be any enforcement of unique keys.
    * Unions. Each element in the infoset must have an explicit primitive type. 
Each element can be optional or nulled, but cannot be a union of multiple 
primitive types. In DFDL, that is handled by an xs:choice of two different 
elements with different types and some method (often a discriminator) to 
determine which branch of the choice.
    * Namespaces are slightly different, but probably similar enough. DFDL uses 
XML namespacing.
    * Aliases are not supported.
    * Sort order. DFDL outputs infoset elements in the order in which they 
appear in the schema.
    * The DFDL infoset does not contain the schema. One must keep track of the 
associated schema outside of the data.
    * There isn't really a concept of different serializations like Avro looks 
to have. Instead, the DFDL schema defines the physical data format via DFDL 
annotations, which are used to determine how to serialize/deserialize data. 
Theoretically, one could have different schemas with the same logical format 
but with different DFDL annotations to describe physical formats, but that 
isn't a common use case we've come across.
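
To make the map limitation above concrete, here is a minimal sketch in Python. It is not Daffodil or NiFi API; the `entries` structure and both helper names are hypothetical, chosen only to illustrate an infoset in which a "map" is a repeating key/value record. Because the sequence itself places no constraint on keys, any uniqueness check would have to happen in a step like `to_map` after parsing.

```python
from collections import Counter

# Hypothetical infoset fragment: a "map" modeled as a repeating key/value record.
entries = [
    {"key": "host", "value": "a.example"},
    {"key": "port", "value": "8080"},
    {"key": "host", "value": "b.example"},  # duplicate key, still a valid sequence
]


def duplicate_keys(pairs):
    """Return the keys that occur more than once, sorted."""
    counts = Counter(pair["key"] for pair in pairs)
    return sorted(key for key, n in counts.items() if n > 1)


def to_map(pairs):
    """Convert key/value pairs to a dict, rejecting duplicate keys.

    This is the uniqueness check that the sequence representation
    does not perform on its own.
    """
    dups = duplicate_keys(pairs)
    if dups:
        raise ValueError(f"duplicate keys: {dups}")
    return {pair["key"]: pair["value"] for pair in pairs}
```

Calling `duplicate_keys(entries)` reports `host` as repeated, and `to_map(entries)` raises `ValueError`, whereas a true map type would have made that input unrepresentable in the first place.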
    
    The Daffodil devs would be happy to discuss the possibility of integrating 
Daffodil/DFDL as an alternative to Avro. Not sure if any of the above 
limitations are blockers. The cores of Avro and DFDL definitely do seem to have 
some overlap.



> Add Apache Daffodil parse/unparse processor
> -------------------------------------------
>
>                 Key: NIFI-5791
>                 URL: https://issues.apache.org/jira/browse/NIFI-5791
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>            Reporter: Steve Lawrence
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
