[ 
https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394487#comment-16394487
 ] 

Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:43 PM:
----------------------------------------------------------------

[~alopresto]:
  Started implementing an XML Record Reader. Shall I create a separate ticket 
for this?

Similar to the JSON readers, the XML reader will expect either a single record 
(e.g. <root><field1>content</field1><field2> ... </root>) or an array of 
records (e.g. <root><record><field1>content</field1><field2> ... 
</record><record> ... </root>).

The reader will be aligned with common transformers. "Normal" fields (e.g. 
String, Integer) can be described by simple key-value pairs:

XML definition
{code}
 <root>
   <record>
     <field1>content</field1>
     <field2>123</field2>
   </record>
 </root>
{code}

Schema definition
{code}
{   "name": "testschema",
    "namespace": "nifi",
    "type": "record",
    "fields": [
       { "name": "field1", "type": "string" }, 
       { "name": "field2", "type": "int" } 
    ] 
}
{code}
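
To make the simple case concrete, here is a minimal sketch of the intended mapping. It is written in Python only for brevity (the actual reader would be a Java controller service), and the names read_simple_records and CONVERTERS are hypothetical. Treating missing or empty elements as None is an assumption, not part of the proposal.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from Avro primitive type names to Python converters.
CONVERTERS = {"string": str, "int": int, "long": int, "float": float, "double": float}


def read_simple_records(xml_text, schema_text):
    """Yield one dict per child element of the root, typed by the schema."""
    schema = json.loads(schema_text)
    root = ET.fromstring(xml_text.strip())
    for record in root:
        row = {}
        for field in schema["fields"]:
            child = record.find(field["name"])
            if child is None or child.text is None:
                # Assumption: missing or empty elements become None.
                row[field["name"]] = None
            else:
                row[field["name"]] = CONVERTERS[field["type"]](child.text)
        yield row


XML = """
<root>
  <record>
    <field1>content</field1>
    <field2>123</field2>
  </record>
</root>
"""

SCHEMA = """
{ "name": "testschema", "namespace": "nifi", "type": "record",
  "fields": [
    { "name": "field1", "type": "string" },
    { "name": "field2", "type": "int" }
  ]
}
"""

print(list(read_simple_records(XML, SCHEMA)))
# [{'field1': 'content', 'field2': 123}]
```

The schema drives the conversion: field2 arrives as the text "123" and is returned as an integer because the schema declares it as "int".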

Parsing attributes or nested fields requires the definition of nested records 
and a field name for the content. Optionally, a prefix for attributes can be 
defined:
 Property: CONTENT_FIELD=content_field
 Property: ATTRIBUTE_PREFIX=attr.

XML definition
{code}
 <root>
   <record>
     <field1 attribute="attr123">some text</field1>
     <field2 attribute="attr123">
       <nested1>some nested text</nested1>
       <nested2>some other nested text</nested2>
     </field2>
   </record>
 </root>
{code}

Schema definition
{code}
{
    "name": "testschema",
    "namespace": "nifi",
    "type": "record",
    "fields": [
        {
            "name": "field1",
            "type": {
                "name": "NestedRecord",
                "type": "record",
                "fields": [
                    { "name": "attr.attribute", "type": "string" },
                    { "name": "content_field", "type": "string" }
                ]
            }
        },
        {
            "name": "field2",
            "type": {
                "name": "NestedRecord",
                "type": "record",
                "fields": [
                    { "name": "attr.attribute", "type": "string" },
                    { "name": "nested1", "type": "string" },
                    { "name": "nested2", "type": "string" }
                ]
            }
        }
    ]
}
{code}
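
The attribute and content-field handling could work roughly as in the following sketch. Again this is Python for brevity rather than the real Java implementation. The helper names element_to_value and nested are hypothetical, and only string fields are handled.

```python
import xml.etree.ElementTree as ET

CONTENT_FIELD = "content_field"  # value of the CONTENT_FIELD property
ATTRIBUTE_PREFIX = "attr."       # value of the ATTRIBUTE_PREFIX property


def element_to_value(element, field_type):
    """Return the element text, or a dict when the schema type is a record."""
    if isinstance(field_type, str):
        # Simple field: only "string" is handled in this sketch.
        return element.text
    result = {}
    for field in field_type["fields"]:
        name = field["name"]
        if name.startswith(ATTRIBUTE_PREFIX):
            # Prefixed schema fields are read from XML attributes.
            result[name] = element.get(name[len(ATTRIBUTE_PREFIX):])
        elif name == CONTENT_FIELD:
            # The content field holds the element's own text.
            result[name] = element.text
        else:
            child = element.find(name)
            result[name] = (
                None if child is None else element_to_value(child, field["type"])
            )
    return result


def nested(*names):
    """Build a nested record type whose fields are all strings."""
    return {"name": "NestedRecord", "type": "record",
            "fields": [{"name": n, "type": "string"} for n in names]}


SCHEMA = {
    "name": "testschema", "namespace": "nifi", "type": "record",
    "fields": [
        {"name": "field1", "type": nested("attr.attribute", "content_field")},
        {"name": "field2", "type": nested("attr.attribute", "nested1", "nested2")},
    ],
}

XML = """<root>
  <record>
    <field1 attribute="attr123">some text</field1>
    <field2 attribute="attr123">
      <nested1>some nested text</nested1>
      <nested2>some other nested text</nested2>
    </field2>
  </record>
</root>"""

records = [element_to_value(record, SCHEMA) for record in ET.fromstring(XML)]
print(records)
```

For the XML above this yields one record in which field1 maps to {"attr.attribute": "attr123", "content_field": "some text"} and field2 maps to its attribute plus the two nested strings.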
What do you say?



> Add XML record reader & writer services
> ---------------------------------------
>
>                 Key: NIFI-4185
>                 URL: https://issues.apache.org/jira/browse/NIFI-4185
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>    Affects Versions: 1.3.0
>            Reporter: Andy LoPresto
>            Priority: Major
>              Labels: json, records, xml
>
> With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, 
> XML conversion has not yet been targeted. This will replace the previous 
> ticket for XML to JSON conversion. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
