[jira] [Commented] (NIFI-5224) Add SolrClientService
[ https://issues.apache.org/jira/browse/NIFI-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628391#comment-16628391 ] Johannes Peter commented on NIFI-5224: -- Sorry [~mike.thomsen] for not responding; for some reason the JIRA notifications arrived in my spam folder. I had already started developing this, but haven't done too much, so go ahead. In the meantime, my second kid was born, who now consumes my entire open source time ;) > Add SolrClientService > - > > Key: NIFI-5224 > URL: https://issues.apache.org/jira/browse/NIFI-5224 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Mike Thomsen >Priority: Major > > The Solr CRUD functions that are currently included in SolrUtils should be > moved to a controller service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5224) Add SolrClientService
[ https://issues.apache.org/jira/browse/NIFI-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488556#comment-16488556 ] Johannes Peter commented on NIFI-5224: -- "CRUD" might be a bit misleading in this case. What I actually intended with this ticket is what [~bende] stated. [~mike.thomsen] > Add SolrClientService > - > > Key: NIFI-5224 > URL: https://issues.apache.org/jira/browse/NIFI-5224 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > The Solr CRUD functions that are currently included in SolrUtils should be > moved to a controller service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NIFI-5224) Add SolrClientService
[ https://issues.apache.org/jira/browse/NIFI-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter reassigned NIFI-5224: Assignee: Johannes Peter > Add SolrClientService > - > > Key: NIFI-5224 > URL: https://issues.apache.org/jira/browse/NIFI-5224 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > The Solr CRUD functions that are currently included in SolrUtils should be > moved to a controller service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5224) Add SolrClientService
[ https://issues.apache.org/jira/browse/NIFI-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483177#comment-16483177 ] Johannes Peter commented on NIFI-5224: -- [~mike.thomsen] Related to this discussion: https://github.com/apache/nifi/pull/2517#discussion_r173344378 > Add SolrClientService > - > > Key: NIFI-5224 > URL: https://issues.apache.org/jira/browse/NIFI-5224 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Priority: Major > > The Solr CRUD functions that are currently included in SolrUtils should be > moved to a controller service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5224) Add SolrClientService
Johannes Peter created NIFI-5224: Summary: Add SolrClientService Key: NIFI-5224 URL: https://issues.apache.org/jira/browse/NIFI-5224 Project: Apache NiFi Issue Type: Improvement Reporter: Johannes Peter The Solr CRUD functions that are currently included in SolrUtils should be moved to a controller service. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-5223) Allow the usage of expression language for properties of RecordSetWriters
[ https://issues.apache.org/jira/browse/NIFI-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483169#comment-16483169 ] Johannes Peter commented on NIFI-5223: -- [~mike.thomsen] Related to this discussion: https://github.com/apache/nifi/pull/2675#discussion_r187770744 How could this be considered optional? Where do you discuss such things? On the developers mailing list? > Allow the usage of expression language for properties of RecordSetWriters > - > > Key: NIFI-5223 > URL: https://issues.apache.org/jira/browse/NIFI-5223 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > To allow the usage of expression language for properties of RecordSetWriters, > the method createWriter of the interface RecordSetWriterFactory has to be > enhanced by a parameter to provide a map containing variables of a FlowFile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (NIFI-5223) Allow the usage of expression language for properties of RecordSetWriters
[ https://issues.apache.org/jira/browse/NIFI-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter reassigned NIFI-5223: Assignee: Johannes Peter > Allow the usage of expression language for properties of RecordSetWriters > - > > Key: NIFI-5223 > URL: https://issues.apache.org/jira/browse/NIFI-5223 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > To allow the usage of expression language for properties of RecordSetWriters, > the method createWriter of the interface RecordSetWriterFactory has to be > enhanced by a parameter to provide a map containing variables of a FlowFile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5223) Allow the usage of expression language for properties of RecordSetWriters
Johannes Peter created NIFI-5223: Summary: Allow the usage of expression language for properties of RecordSetWriters Key: NIFI-5223 URL: https://issues.apache.org/jira/browse/NIFI-5223 Project: Apache NiFi Issue Type: Improvement Reporter: Johannes Peter To allow the usage of expression language for properties of RecordSetWriters, the method createWriter of the interface RecordSetWriterFactory has to be enhanced by a parameter to provide a map containing variables of a FlowFile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
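A minimal sketch of the idea in the ticket, with simplified stand-in interfaces: the real RecordSetWriterFactory has a richer createWriter signature, and the resolve helper below is a hypothetical placeholder for NiFi's expression language evaluation, not the actual API.

```java
import java.util.Map;

// Simplified stand-ins for illustration only; NiFi's actual interfaces differ.
interface RecordSetWriter {
    String writeHeader();
}

interface RecordSetWriterFactory {
    // Proposed enhancement: the factory receives the FlowFile's variables so
    // that expression-language-driven properties can be resolved per FlowFile.
    RecordSetWriter createWriter(Map<String, String> variables);
}

public class ElWriterSketch {

    // A toy factory whose "header" property may contain ${...} references,
    // resolved against the supplied FlowFile variables.
    static RecordSetWriterFactory factory(final String headerProperty) {
        return variables -> () -> resolve(headerProperty, variables);
    }

    // Hypothetical, minimal stand-in for NiFi's expression language evaluation.
    static String resolve(String value, Map<String, String> variables) {
        for (Map.Entry<String, String> e : variables.entrySet()) {
            value = value.replace("${" + e.getKey() + "}", e.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        RecordSetWriterFactory factory = factory("report-${filename}");
        System.out.println(factory.createWriter(Map.of("filename", "a.csv")).writeHeader());
        // prints: report-a.csv
    }
}
```

The point of the change is visible in the factory method: without the variables map, a property value like "report-${filename}" cannot be resolved at writer-creation time.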
[jira] [Commented] (NIFI-5189) If a schema is accessed using 'Use 'Schema Text' Property', the name of the schema is not available
[ https://issues.apache.org/jira/browse/NIFI-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474006#comment-16474006 ] Johannes Peter commented on NIFI-5189: -- [~markap14] Will open a PR for this today > If a schema is accessed using 'Use 'Schema Text' Property', the name of the > schema is not available > --- > > Key: NIFI-5189 > URL: https://issues.apache.org/jira/browse/NIFI-5189 > Project: Apache NiFi > Issue Type: Bug >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > If a schema is accessed using 'Use 'Schema Text' Property', the Avro schema > object will be transformed to a RecordSchema using the method > AvroTypeUtil.create(Schema avroSchema). This method returns a RecordSchema > with an empty SchemaIdentifier. Therefore, the name of the schema cannot be > accessed. The method should at least return a RecordSchema with a > SchemaIdentifier containing the name of the schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5189) If a schema is accessed using 'Use 'Schema Text' Property', the name of the schema is not available
Johannes Peter created NIFI-5189: Summary: If a schema is accessed using 'Use 'Schema Text' Property', the name of the schema is not available Key: NIFI-5189 URL: https://issues.apache.org/jira/browse/NIFI-5189 Project: Apache NiFi Issue Type: Bug Reporter: Johannes Peter Assignee: Johannes Peter If a schema is accessed using 'Use 'Schema Text' Property', the Avro schema object will be transformed to a RecordSchema using the method AvroTypeUtil.create(Schema avroSchema). This method returns a RecordSchema with an empty SchemaIdentifier. Therefore, the name of the schema cannot be accessed. The method should at least return a RecordSchema with a SchemaIdentifier containing the name of the schema. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
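The proposed fix can be sketched as follows. SchemaIdentifier and the create method here are simplified stand-ins for NiFi's actual classes, and the regex-based name extraction is purely illustrative (the real AvroTypeUtil works on a parsed Avro Schema object):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the reported problem: when a RecordSchema is built from
// "Schema Text", the resulting SchemaIdentifier should at least carry the
// schema name instead of being empty.
public class SchemaNameSketch {

    // Simplified stand-in for NiFi's SchemaIdentifier.
    static final class SchemaIdentifier {
        final Optional<String> name;
        SchemaIdentifier(Optional<String> name) { this.name = name; }
    }

    // Simplified stand-in for AvroTypeUtil.create(Schema): extract the
    // record's "name" from the Avro schema JSON text and propagate it into
    // the identifier, rather than returning an empty SchemaIdentifier.
    static SchemaIdentifier fromSchemaText(String avroSchemaJson) {
        Matcher m = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"").matcher(avroSchemaJson);
        return new SchemaIdentifier(m.find() ? Optional.of(m.group(1)) : Optional.empty());
    }

    public static void main(String[] args) {
        String schema = "{ \"namespace\": \"nifi\", \"name\": \"PERSON\", \"type\": \"record\", \"fields\": [] }";
        System.out.println(fromSchemaText(schema).name.orElse("<none>"));
        // prints: PERSON
    }
}
```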
[jira] [Comment Edited] (NIFI-5113) Add XML record writer
[ https://issues.apache.org/jira/browse/NIFI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451277#comment-16451277 ] Johannes Peter edited comment on NIFI-5113 at 4/24/18 9:59 PM: --- [~markap14] Hi Mark, I am wondering how we can solve the following issue. Assume we have the following record: {code} MapRecord[{ID=1, NAME=Cleve Butler, AGE=42}] {code} Defining a schema for this is straightforward, as long as all keys shall become tags and all values character content: Schema: {code} { "namespace": "nifi", "name": "PERSON", "type": "record", "fields": [ { "name": "ID", "type": "string" }, { "name": "NAME", "type": "string" }, { "name": "AGE", "type": "int" }, { "name": "COUNTRY", "type": "string" } ] } {code} Result: {code} <PERSON><ID>1</ID><NAME>Cleve Butler</NAME><AGE>42</AGE></PERSON> {code} However, I am wondering how the schema can be defined to write XML with ID as an attribute: {code} <PERSON ID="1"><NAME>Cleve Butler</NAME><AGE>42</AGE></PERSON> {code} One way could be to instruct users to define a prefix for attributes via a property. Let's assume the value of the property is "ATTR_". The schema then has to be defined like this: Schema: {code} { "namespace": "nifi", "name": "PERSON", "type": "record", "fields": [ { "name": "ATTR_ID", "type": "string" }, { "name": "NAME", "type": "string" }, { "name": "AGE", "type": "int" }, { "name": "COUNTRY", "type": "string" } ] } {code} When WriteXMLResult is created, the schema is checked for fields starting with "ATTR_". Matching fields are replaced by fields without the prefix, and the references to these fields are put into a list. When the above record is written to XML, the writer can check for each field whether its reference is contained in the list. If that is the case, the field is written to the XML as an attribute. This is the best workaround I have identified so far. Do you have any other ideas? Are there already any plans to enhance records / schemas with metadata / attributes?
> Add XML record writer > - > > Key: NIFI-5113 > URL: https://issues.apache.org/jira/browse/NIFI-5113 > Project: Apache NiFi > Issue Type: New Feature >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > Corresponding writer for the XML record reader -- This message was sent by Atlassian JIRA (v7.6.3#76005)
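The "ATTR_" prefix workaround discussed above could look roughly like this; the class and method names are illustrative stand-ins, not NiFi's actual WriteXMLResult, and the prefix string is the configurable property value assumed in the comment:

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the prefix workaround: fields whose name starts with the
// configured attribute prefix are written as XML attributes of the record
// element (with the prefix stripped), all other fields as child elements.
public class AttrPrefixSketch {

    static String writeRecord(String recordName, Map<String, Object> fields,
                              String attributePrefix) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
            w.writeStartElement(recordName);
            // First pass: prefixed fields become attributes (attributes must
            // be written before any child content).
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                if (e.getKey().startsWith(attributePrefix)) {
                    w.writeAttribute(e.getKey().substring(attributePrefix.length()),
                            String.valueOf(e.getValue()));
                }
            }
            // Second pass: remaining fields become child elements.
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                if (!e.getKey().startsWith(attributePrefix)) {
                    w.writeStartElement(e.getKey());
                    w.writeCharacters(String.valueOf(e.getValue()));
                    w.writeEndElement();
                }
            }
            w.writeEndElement();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("ATTR_ID", 1);
        record.put("NAME", "Cleve Butler");
        record.put("AGE", 42);
        System.out.println(writeRecord("PERSON", record, "ATTR_"));
        // prints: <PERSON ID="1"><NAME>Cleve Butler</NAME><AGE>42</AGE></PERSON>
    }
}
```

The two-pass split matters because StAX requires all attributes to be written before the start tag is closed by the first child element.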
[jira] [Updated] (NIFI-4516) Add QuerySolr processor
[ https://issues.apache.org/jira/browse/NIFI-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-4516: - Summary: Add QuerySolr processor (was: Add FetchSolr processor) > Add QuerySolr processor > --- > > Key: NIFI-4516 > URL: https://issues.apache.org/jira/browse/NIFI-4516 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > Labels: features > Fix For: 1.7.0 > > > The processor shall be capable > * to query Solr within a workflow, > * to make use of standard functionalities of Solr such as faceting, > highlighting, result grouping, etc., > * to make use of NiFi's expression language to build Solr queries, > * to handle results as records. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NIFI-4516) Add FetchSolr processor
[ https://issues.apache.org/jira/browse/NIFI-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-4516: - Fix Version/s: 1.7.0 > Add FetchSolr processor > --- > > Key: NIFI-4516 > URL: https://issues.apache.org/jira/browse/NIFI-4516 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > Labels: features > Fix For: 1.7.0 > > > The processor shall be capable > * to query Solr within a workflow, > * to make use of standard functionalities of Solr such as faceting, > highlighting, result grouping, etc., > * to make use of NiFi's expression language to build Solr queries, > * to handle results as records. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NIFI-5113) Add XML record writer
[ https://issues.apache.org/jira/browse/NIFI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-5113: - External issue URL: (was: https://issues.apache.org/jira/browse/NIFI-4185) > Add XML record writer > - > > Key: NIFI-5113 > URL: https://issues.apache.org/jira/browse/NIFI-5113 > Project: Apache NiFi > Issue Type: New Feature >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > Corresponding writer for the XML record reader -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NIFI-5113) Add XML record writer
[ https://issues.apache.org/jira/browse/NIFI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-5113: - External issue URL: https://issues.apache.org/jira/browse/NIFI-4185 > Add XML record writer > - > > Key: NIFI-5113 > URL: https://issues.apache.org/jira/browse/NIFI-5113 > Project: Apache NiFi > Issue Type: New Feature >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > Corresponding writer for the XML record reader -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5113) Add XML record writer
Johannes Peter created NIFI-5113: Summary: Add XML record writer Key: NIFI-5113 URL: https://issues.apache.org/jira/browse/NIFI-5113 Project: Apache NiFi Issue Type: New Feature Reporter: Johannes Peter Assignee: Johannes Peter Corresponding writer for the XML record reader -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NIFI-5113) Add XML record writer
[ https://issues.apache.org/jira/browse/NIFI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-5113: - Description: Corresponding writer for the XML record reader was: Corresponding writer for the XML record reader Related issue: https://issues.apache.org/jira/browse/NIFI-4185 > Add XML record writer > - > > Key: NIFI-5113 > URL: https://issues.apache.org/jira/browse/NIFI-5113 > Project: Apache NiFi > Issue Type: New Feature >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > Corresponding writer for the XML record reader -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (NIFI-5113) Add XML record writer
[ https://issues.apache.org/jira/browse/NIFI-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter updated NIFI-5113: - Description: Corresponding writer for the XML record reader Related issue: https://issues.apache.org/jira/browse/NIFI-4185 was:Corresponding writer for the XML record reader > Add XML record writer > - > > Key: NIFI-5113 > URL: https://issues.apache.org/jira/browse/NIFI-5113 > Project: Apache NiFi > Issue Type: New Feature >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > > Corresponding writer for the XML record reader > Related issue: https://issues.apache.org/jira/browse/NIFI-4185 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (NIFI-5106) Add provenance reporting to GetSolr
[ https://issues.apache.org/jira/browse/NIFI-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter resolved NIFI-5106. -- Resolution: Fixed > Add provenance reporting to GetSolr > --- > > Key: NIFI-5106 > URL: https://issues.apache.org/jira/browse/NIFI-5106 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (NIFI-5106) Add provenance reporting to GetSolr
Johannes Peter created NIFI-5106: Summary: Add provenance reporting to GetSolr Key: NIFI-5106 URL: https://issues.apache.org/jira/browse/NIFI-5106 Project: Apache NiFi Issue Type: Improvement Reporter: Johannes Peter Assignee: Johannes Peter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (NIFI-4516) Add FetchSolr processor
[ https://issues.apache.org/jira/browse/NIFI-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter resolved NIFI-4516. -- Resolution: Fixed > Add FetchSolr processor > --- > > Key: NIFI-4516 > URL: https://issues.apache.org/jira/browse/NIFI-4516 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > Labels: features > > The processor shall be capable > * to query Solr within a workflow, > * to make use of standard functionalities of Solr such as faceting, > highlighting, result grouping, etc., > * to make use of NiFi's expression language to build Solr queries, > * to handle results as records. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403399#comment-16403399 ] Johannes Peter edited comment on NIFI-4185 at 3/17/18 12:07 PM: Hi [~pvillard], for this reader I have not planned to require an XSD schema. My intention is that it can be configured in the same way as readers of other formats. I therefore translate Avro definitions to XML structures that are expected by the reader. Generally, the reader expects an array containing zero, one or more records. I use StAX, as its pull-parsing model suits the record-lookup requirement well. BTW: Do you have an idea which XML structure the reader could expect when users define a map in their schema? Maybe something like this (markup stripped by the mail archiver; element names illustrative)?
{code}
<map>
  <key1>content</key1>
  <key2>{content or object}</key2>
  ...
</map>
{code}
> Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Assignee: Johannes Peter >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
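[Editor's note: the StAX pull-parsing approach described in the comment above (stream the document, emit one record per child element of the root array) can be sketched in Python with `xml.etree.ElementTree.iterparse`, which offers a comparable event-driven pull loop. This is an analogue only, not the NiFi implementation, and the element names are illustrative:]

```python
# Rough Python analogue of a StAX-style record reader: pull events from the
# stream and emit each <record> as soon as its closing tag is seen, without
# loading the whole document into memory.
import io
import xml.etree.ElementTree as ET

xml_data = b"""<records>
  <record><field1>content</field1><field2>123</field2></record>
  <record><field1>other</field1><field2>456</field2></record>
</records>"""

def iter_records(stream):
    # iterparse fires an 'end' event as each element closes, mirroring a
    # StAX pull loop over the children of the root array element.
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "record":
            yield {child.tag: child.text for child in elem}
            elem.clear()  # discard the subtree already consumed

records = list(iter_records(io.BytesIO(xml_data)))
```

The streaming shape is what makes record lookup cheap: the loop can stop after the first matching record instead of parsing the rest of the document.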
[jira] [Assigned] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Peter reassigned NIFI-4185: Assignee: Johannes Peter > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Assignee: Johannes Peter >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:47 PM: [~alopresto]: Started implementing an XML record reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record or an array of records. The reader will be aligned with common transformation tools. "Normal" fields (e.g. string, integer) can be described by simple key-value pairs (XML markup below stripped by the mail archiver; element names and attribute values reconstructed from the schemas and are illustrative):

XML definition
{code}
<record>
  <field1>content</field1>
  <field2>123</field2>
</record>
{code}
Schema definition
{code}
{
  "name": "testschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    { "name": "field1", "type": "string" },
    { "name": "field2", "type": "int" }
  ]
}
{code}
Parsing of attributes or nested fields requires the definition of nested records and a field name for the content (optionally, a prefix for attributes can be defined):

Property: CONTENT_FIELD=content_field
Property: ATTRIBUTE_PREFIX=attr.

XML definition
{code}
<record>
  <field1 attribute="value">some text</field1>
  <field2 attribute="value">
    <nested1>some nested text</nested1>
    <nested2>some other nested text</nested2>
  </field2>
</record>
{code}
Schema definition
{code}
{
  "name": "testschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {
      "name": "field1",
      "type": {
        "name": "NestedRecord",
        "type": "record",
        "fields": [
          { "name": "attr.attribute", "type": "string" },
          { "name": "content_field", "type": "string" }
        ]
      }
    },
    {
      "name": "field2",
      "type": {
        "name": "NestedRecord",
        "type": "record",
        "fields": [
          { "name": "attr.attribute", "type": "string" },
          { "name": "nested1", "type": "string" },
          { "name": "nested2", "type": "string" }
        ]
      }
    }
  ]
}
{code}
What do you say?
> Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
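[Editor's note: the mapping convention proposed in the comment above (attributes get the ATTRIBUTE_PREFIX, element text goes into the CONTENT_FIELD, child elements become nested fields) can be sketched as follows. This is an illustration of the proposal, not NiFi code; the element and attribute names are hypothetical:]

```python
# Sketch of the proposed mapping rules. CONTENT_FIELD and ATTRIBUTE_PREFIX
# are the two properties named in the comment; everything else is illustrative.
import xml.etree.ElementTree as ET

CONTENT_FIELD = "content_field"
ATTRIBUTE_PREFIX = "attr."

def element_to_record(elem):
    """Map one XML element to a flat dict per the proposed convention."""
    record = {}
    for name, value in elem.attrib.items():
        record[ATTRIBUTE_PREFIX + name] = value       # attributes get the prefix
    children = list(elem)
    if children:
        for child in children:                        # nested simple fields
            record[child.tag] = child.text
    elif elem.text and elem.text.strip():
        record[CONTENT_FIELD] = elem.text.strip()     # element content
    return record

root = ET.fromstring(
    '<record>'
    '<field1 attribute="a1">some text</field1>'
    '<field2 attribute="a2"><nested1>some nested text</nested1>'
    '<nested2>some other nested text</nested2></field2>'
    '</record>'
)
record = {child.tag: element_to_record(child) for child in root}
```

Run against the example above, `field1` yields both the prefixed attribute and the content field, which is exactly the shape the nested-record schema in the comment declares.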
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:45 PM: [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition {code}    some text      some nested text    some other nested text    {code} Schema definition {code} { "name": "testschema", "namespace": "nifi", "type": "record", "fields": [ { "name": "field1", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "content_field", "type": "string"} ] } }, { "name": "field2", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "nested1", "type": "string"}, {"name": "nested2", "type": "string"} ] } } ] } {code} What do you say? was (Author: jope): [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... 
) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition {code}    some text      some nested text    some other nested text    {code} Schema definition {code} { "name": "testschema", "namespace": "nifi", "type": "record", "fields": [ { "name": "field1", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "content_field", "type": "string"} ] } }, { "name": "field2", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "nested1", "type": "string"}, {"name": "nested2", "type": "string"} ] } } ] } {code} What do you say? > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:43 PM: [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition {code}    some text      some nested text    some other nested text    {code} Schema definition {code} { "name": "testschema", "namespace": "nifi", "type": "record", "fields": [ { "name": "field1", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "content_field", "type": "string"} ] } }, { "name": "field2", "type": { "name": "NestedRecord", "type": "record", "fields" : [ {"name": "attr.attribute", "type": "string"}, {"name": "nested1", "type": "string"}, {"name": "nested2", "type": "string"} ] } } ] } {code} What do you say? was (Author: jope): [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... 
) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition {code}    some text      some nested text    some other nested text    {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [ { "name": "attr.attribute", "type": "string" },       { "name": "content_field", "type": "string" }      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       { "name": "attr.attribute", "type": "string" },       { "name": "nested1", "type": "string" },       { "name": "nested2", "type": "string" }     ]    }   }  ] } {code} What do you say? > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:42 PM: [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition {code}    some text      some nested text    some other nested text    {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [ { "name": "attr.attribute", "type": "string" },       { "name": "content_field", "type": "string" }      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       { "name": "attr.attribute", "type": "string" },       { "name": "nested1", "type": "string" },       { "name": "nested2", "type": "string" }     ]    }   }  ] } {code} What do you say? was (Author: jope): [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. 
content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition    some text      some nested text    some other nested text    Schema definition {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "content_field", "type": "string"}      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "nested1", "type": "string"},       \{"name": "nested2", "type": "string"}     ]    }   }  ] } What do you say? > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:39 PM: [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition    some text      some nested text    some other nested text    Schema definition {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "content_field", "type": "string"}      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "nested1", "type": "string"},       \{"name": "nested2", "type": "string"}     ]    }   }  ] } What do you say? was (Author: jope): [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... 
) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition    some text      some nested text    some other nested text    Schema definition {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "content_field", "type": "string"}      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "nested1", "type": "string"},       \{"name": "nested2", "type": "string"}     ]    }   }  ] } What do you say? > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:38 PM: [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... ) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition {code}    content   123  {code} Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ { "name": "field1", "type": "string" }, { "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition    some text      some nested text    some other nested text    Schema definition {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "content_field", "type": "string"}      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "nested1", "type": "string"},       \{"name": "nested2", "type": "string"}     ]    }   }  ] } What do you say? was (Author: jope): [~alopresto]:  Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record (e. g. content ... 
) or an array of records (e. g. content ... ... ) The reader will be aligned with common transformators. "Normal" fields (e. g. String, Integer) can be described by simple key-value pairs: XML definition    content   123  Schema definition {code} {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [ \{ "name": "field1", "type": "string" }, \{ "name": "field2", "type": "int" } ] } {code} Parsing of attributes or nested fields require the definition of nested records and a field name for the content (optional, a prefix for attributes can be defined): Property: CONTENT_FIELD=content_field Property: ATTRIBUTE_PREFIX=attr. XML definition    some text      some nested text    some other nested text    Schema definition {  "name": "testschema",  "namespace": "nifi",  "type": "record",  "fields": [   {    "name": "field1",    "type": {      "name": "NestedRecord",      "type": "record",      "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "content_field", "type": "string"}      ]    }  },  {   "name": "field2",   "type": {    "name": "NestedRecord",    "type": "record",    "fields" : [       \\{"name": "attr.attribute", "type": "string"} ,       \{"name": "nested1", "type": "string"},       \{"name": "nested2", "type": "string"}     ]    }   }  ] } What do you say? > Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (NIFI-4185) Add XML record reader & writer services
[ https://issues.apache.org/jira/browse/NIFI-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394487#comment-16394487 ] Johannes Peter edited comment on NIFI-4185 at 3/11/18 12:38 PM: [~alopresto]: Started implementing an XML Record Reader. Shall I create a separate ticket for this? Similar to the JSON readers, the XML reader will expect either a single record or an array of records. The reader will be aligned with common transformers. "Normal" fields (e.g. String, Integer) can be described by simple key-value pairs:
Schema definition
{code}
{
  "name": "testschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    { "name": "field1", "type": "string" },
    { "name": "field2", "type": "int" }
  ]
}
{code}
Parsing attributes or nested fields requires the definition of nested records and a field name for the element content (optionally, a prefix for attributes can be defined):
Property: CONTENT_FIELD=content_field
Property: ATTRIBUTE_PREFIX=attr.
Schema definition
{code}
{
  "name": "testschema",
  "namespace": "nifi",
  "type": "record",
  "fields": [
    {
      "name": "field1",
      "type": {
        "name": "NestedRecord1",
        "type": "record",
        "fields": [
          { "name": "attr.attribute", "type": "string" },
          { "name": "content_field", "type": "string" }
        ]
      }
    },
    {
      "name": "field2",
      "type": {
        "name": "NestedRecord2",
        "type": "record",
        "fields": [
          { "name": "attr.attribute", "type": "string" },
          { "name": "nested1", "type": "string" },
          { "name": "nested2", "type": "string" }
        ]
      }
    }
  ]
}
{code}
What do you say?
> Add XML record reader & writer services > --- > > Key: NIFI-4185 > URL: https://issues.apache.org/jira/browse/NIFI-4185 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Affects Versions: 1.3.0 >Reporter: Andy LoPresto >Priority: Major > Labels: json, records, xml > > With the addition of the {{RecordReader}} and {{RecordSetWriter}} paradigm, > XML conversion has not yet been targeted. This will replace the previous > ticket for XML to JSON conversion. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
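The attribute-prefix and content-field conventions proposed in the comment can be sketched in a few lines. This is an illustrative stand-in, not NiFi's actual reader; the property values CONTENT_FIELD=content_field and ATTRIBUTE_PREFIX=attr. are the ones suggested above, and the sample XML is reconstructed from the nested schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical property values taken from the proposal above:
CONTENT_FIELD = "content_field"
ATTRIBUTE_PREFIX = "attr."

def element_to_record(elem):
    """Map one XML element to a record value: attributes get the prefix,
    element text goes into the content field, and child elements become
    nested records or plain key-value pairs."""
    record = {ATTRIBUTE_PREFIX + k: v for k, v in elem.attrib.items()}
    children = list(elem)
    if children:
        for child in children:
            record[child.tag] = element_to_record(child)
        return record
    text = (elem.text or "").strip()
    if record:                  # attributes present: nested record with content field
        if text:
            record[CONTENT_FIELD] = text
        return record
    return text                 # simple field: plain key-value pair

xml = ('<record>'
       '<field1 attribute="x">some text</field1>'
       '<field2 attribute="y"><nested1>some nested text</nested1>'
       '<nested2>some other nested text</nested2></field2>'
       '</record>')
rec = {child.tag: element_to_record(child) for child in ET.fromstring(xml)}
# rec["field1"] == {"attr.attribute": "x", "content_field": "some text"}
```

A field with attributes thus always surfaces as a nested record matching the NestedRecord schemas above, while a plain element collapses to a simple value.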
[jira] [Commented] (NIFI-4516) Add FetchSolr processor
[ https://issues.apache.org/jira/browse/NIFI-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387726#comment-16387726 ] Johannes Peter commented on NIFI-4516: -- Hi [~abhi.rohatgi], thank you for your offer, but I am almost done with this. > Add FetchSolr processor > --- > > Key: NIFI-4516 > URL: https://issues.apache.org/jira/browse/NIFI-4516 > Project: Apache NiFi > Issue Type: Improvement > Components: Extensions >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Major > Labels: features > > The processor shall be capable > * to query Solr within a workflow, > * to make use of standard functionalities of Solr such as faceting, > highlighting, result grouping, etc., > * to make use of NiFi's expression language to build Solr queries, > * to handle results as records. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
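The Solr features the ticket lists (faceting, highlighting, result grouping) are all driven by standard request parameters, so a processor like FetchSolr mostly assembles a parameter map. A minimal sketch; the parameter names are standard Solr ones, but the function and its arguments are hypothetical and no request is actually sent:

```python
# Sketch of the kind of query-parameter map a FetchSolr-style processor would
# build. Parameter names are Solr's standard ones (q, facet.field, hl.fl,
# group.field); the helper itself is illustrative, not the processor's API.
def build_solr_params(query, facet_fields=(), highlight_fields=(), group_field=None):
    params = {"q": query, "wt": "json"}
    if facet_fields:
        params["facet"] = "true"
        params["facet.field"] = list(facet_fields)   # repeated param in Solr
    if highlight_fields:
        params["hl"] = "true"
        params["hl.fl"] = ",".join(highlight_fields)
    if group_field:
        params["group"] = "true"
        params["group.field"] = group_field
    return params

params = build_solr_params("title:nifi",
                           facet_fields=["category"],
                           highlight_fields=["title"])
```

NiFi expression language would typically feed the `query` argument, so the same flow can issue different queries per FlowFile.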
[jira] [Commented] (NIFI-4583) Restructure package nifi-solr-processors
[ https://issues.apache.org/jira/browse/NIFI-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16244399#comment-16244399 ] Johannes Peter commented on NIFI-4583: -- [~ijokarumawak] [~bbende] Do you agree? > Restructure package nifi-solr-processors > > > Key: NIFI-4583 > URL: https://issues.apache.org/jira/browse/NIFI-4583 > Project: Apache NiFi > Issue Type: Improvement >Reporter: Johannes Peter >Assignee: Johannes Peter >Priority: Minor > > Several functionalities currently implemented e. g. in GetSolr or > SolrProcessor should be made available for other processors or controller > services. A class SolrUtils should be created containing several static > methods. This includes the methods > - getRequestParams (PutSolrContentStream) > - solrDocumentsToRecordSet (GetSolr) > - createSolrClient (SolrProcessor) > and the inner class QueryResponseOutputStreamCallback (GetSolr) > Some unit tests might be affected. > The method declaration > protected SolrClient createSolrClient(final ProcessContext context, final > String solrLocation) > should be changed to > public static SolrClient createSolrClient(final PropertyContext context, > final String solrLocation) > to be suitable also for controller services. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (NIFI-4583) Restructure package nifi-solr-processors
Johannes Peter created NIFI-4583: Summary: Restructure package nifi-solr-processors Key: NIFI-4583 URL: https://issues.apache.org/jira/browse/NIFI-4583 Project: Apache NiFi Issue Type: Improvement Reporter: Johannes Peter Assignee: Johannes Peter Priority: Minor Several functionalities currently implemented e. g. in GetSolr or SolrProcessor should be made available for other processors or controller services. A class SolrUtils should be created containing several static methods. This includes the methods - getRequestParams (PutSolrContentStream) - solrDocumentsToRecordSet (GetSolr) - createSolrClient (SolrProcessor) and the inner class QueryResponseOutputStreamCallback (GetSolr) Some unit tests might be affected. The method declaration protected SolrClient createSolrClient(final ProcessContext context, final String solrLocation) should be changed to public static SolrClient createSolrClient(final PropertyContext context, final String solrLocation) to be suitable also for controller services. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
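The proposed change, from a protected instance method on SolrProcessor to a public static utility that only needs a PropertyContext, can be sketched with stand-in types. Everything here is illustrative: the class and property name are invented, and the returned dict stands in for a real SolrClient.

```python
# Sketch of the refactoring idea in NIFI-4583: the helper no longer lives on a
# processor class but is a free function depending only on the minimal
# property-lookup interface, so controller services can call it too.
class PropertyContext:
    """Stand-in for the narrow interface both ProcessContext and a
    controller service's configuration context can provide."""
    def __init__(self, properties):
        self._properties = properties

    def get_property(self, name):
        return self._properties[name]

def create_solr_client(context, solr_location):
    """Python approximation of the proposed
    public static SolrClient createSolrClient(PropertyContext, String);
    the returned dict stands in for a SolrClient."""
    return {
        "location": solr_location,
        "socket_timeout": int(context.get_property("Socket Timeout")),
    }

# Callable from a "processor" or a "controller service" alike:
ctx = PropertyContext({"Socket Timeout": "10000"})
client = create_solr_client(ctx, "http://localhost:8983/solr/collection1")
```

The design point is simply that the utility depends on the broadest context type that still satisfies it, which is what makes moving it into a shared SolrUtils class worthwhile.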
[jira] [Created] (NIFI-4516) Add FetchSolr processor
Johannes Peter created NIFI-4516: Summary: Add FetchSolr processor Key: NIFI-4516 URL: https://issues.apache.org/jira/browse/NIFI-4516 Project: Apache NiFi Issue Type: Improvement Components: Extensions Reporter: Johannes Peter Assignee: Johannes Peter The processor shall be capable * to query Solr within a workflow, * to make use of standard functionalities of Solr such as faceting, highlighting, result grouping, etc., * to make use of NiFi's expression language to build Solr queries, * to handle results as records. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172249#comment-16172249 ] Johannes Peter commented on NIFI-3248: -- Update: I am almost done with the new processor implementation. (1) Meanwhile I had a little conversation with Cassandra Targett (Solr PMC), and she helped me clarify some things about the field \_version\_. Unfortunately, it is not possible to convert a value of this field into a valid timestamp. The values of this field are monotonically increasing depending on indexing time, but only at shard level, not at collection level. I am sorry for the confusion. The processor therefore iterates over shards if (a) Solr runs in cloud mode and (b) \_version\_ is used to track document retrieval instead of a dedicated date field. Although this approach might require more queries and therefore be slower for collections comprising many shards, I implemented it to make the processor suitable for many more collections. The shard names currently have to be specified by property, as I have not yet found a reliable way to figure them out automatically (shard names != core names). (2) I implemented an option to make the use of filter query caches configurable. (3) The processor now makes use of the StateManager. (4) I will add an option to convert results into records. > GetSolr can miss recently updated documents > --- > > Key: NIFI-3248 > URL: https://issues.apache.org/jira/browse/NIFI-3248 > Project: Apache NiFi > Issue Type: Bug > Components: Extensions >Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, > 1.0.1 >Reporter: Koji Kawamura >Assignee: Johannes Peter > Attachments: nifi-flow.png, query-result-with-curly-bracket.png, > query-result-with-square-bracket.png > > > GetSolr holds the last query timestamp so that it only fetches documents > that have been added or updated since the last query. 
> However, GetSolr misses some of those updated documents, and once a > document's date field value becomes older than the last query timestamp, the > document can no longer be queried by GetSolr. > This JIRA tracks the investigation of this behavior and the discussion around it. > Here are things that can cause this behavior: > |#|Short description|Should we address it?| > |1|Timestamp range filter, curly or square bracket?|No| > |2|Timezone difference between update and query|Additional docs might be > helpful| > |3|Lag comes from the Near Real Time nature of Solr|Should be documented at least, > add 'commit lag-time'?| > h2. 1. Timestamp range filter, curly or square bracket? > At first glance, using curly and square brackets in mix looked strange > ([source > code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]). > But the difference is meaningful. > The square bracket on the range query is inclusive and the curly bracket is > exclusive. If we used inclusive on both sides and a document had a timestamp > exactly on the boundary, it could be returned in two consecutive > executions, and we only want it in one. > This is intentional, and it should stay as it is. > h2. 2. Timezone difference between update and query > Solr treats date fields as a [UTC > representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates]. > If the date field String value of an updated document represents time without a > timezone, and NiFi is running in an environment using a timezone other than > UTC, GetSolr can't perform date range queries as users expect. > Let's say NiFi is running with JST (UTC+9). A process added a document to Solr > at 15:00 JST. But the date field doesn't have a timezone, so Solr indexed it > as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any > documents updated from 15:00 to 15:10 JST. GetSolr formats dates using UTC, > i.e. 6:00 to 6:10 UTC. The updated document won't match the date > range filter. > To avoid this, updated documents must have a proper timezone in the date field's > string representation. > If one uses the NiFi expression language to set the current timestamp on that date > field, the following expression can be used: > {code} > ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")} > {code} > It will produce a result like: > {code} > 2016-12-27T15:30:04.895+0900 > {code} > Then it will be indexed in Solr with UTC and will be queried by GetSolr as > expected. > h2. 3. Lag comes from the Near Real Time nature of Solr > Solr provides Near Real Time search capability, meaning recently > updated documents can be queried in near real time, but not in real time. > This latency can be
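The timezone pitfall quoted above can be reproduced with plain datetime arithmetic. A sketch in Python standing in for the NiFi expression language; the offset format matches the SSSZ pattern from the comment:

```python
from datetime import datetime, timedelta, timezone

# A document written at 15:30 JST whose date string carries no timezone is
# treated by Solr as 15:30 UTC, nine hours ahead of the intended instant.
jst = timezone(timedelta(hours=9))
local = datetime(2016, 12, 27, 15, 30, 4, 895000, tzinfo=jst)

# Equivalent of ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}: millisecond
# precision plus a numeric UTC offset, so Solr can normalize the value.
stamp = (local.strftime("%Y-%m-%dT%H:%M:%S.")
         + f"{local.microsecond // 1000:03d}"
         + local.strftime("%z"))
# stamp == "2016-12-27T15:30:04.895+0900"

# The same instant expressed in UTC, which is how Solr stores and compares dates:
utc = local.astimezone(timezone.utc)   # 06:30 UTC, not 15:30
```

With the explicit `+0900` offset, a range query issued at 15:10 JST (06:10 UTC) and a document stamped 15:00 JST (06:00 UTC) land in the same timeline, so the filter matches as expected.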
[jira] [Comment Edited] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152432#comment-16152432 ] Johannes Peter edited comment on NIFI-3248 at 9/4/17 11:26 AM: --- [~ijokarumawak] (1) Sorting by ID ensures that each document is retrieved only once, even if the document is updated. Sorting by \_version\_ asc ensures that each version of a document is retrieved once, as updated documents are "appended" at the end. I personally expect that someone who uses Solr as a source wants updated Solr documents to replace old ones in the target system. However, we could make this configurable. (2) The parameter fq provides the same query capabilities as q and can be used in the same way. The essential difference is that q is primarily used to calculate relevancy, whereas fq is primarily used to filter and to improve performance. In this case, we don't need relevancy as we sort by indexing time. Nevertheless, I see the point that users expect a property where they can configure the main query. (3) \_version\_ behaves like a timestamp, so there should be little chance that two documents within a collection have the same value (in a cluster). I know that there is a way to convert it into a timestamp, but I first have to figure out how to do this exactly. Sorting by "\_version\_ asc" and using cursor marks should make the retrieval reliable to a very high degree. (4) I want to emphasize again that the logic and purpose of GetSolr don't cover the capabilities of Solr sufficiently. There should be an additional processor to use Solr not only as a source, but also as a query layer. Features like faceting, grouping, pivots (e.g. for analytical purposes), spellchecking (e.g. for OCR or NLP), etc. are not covered by GetSolr (and shouldn't be included, as the processor should focus on reliable retrieval). However, there should be a more flexible option to query Solr within workflows. 
> GetSolr can miss recently updated documents > --- > > Key: NIFI-3248 > URL: https://issues.apache.org/jira/browse/NIFI-3248 > Project: Apache NiFi > Issue Type: Bug > Components: Extensions > Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 1.0.1 > Reporter: Koji Kawamura > Assignee: Johannes Peter > Attachments: nifi-flow.png, query-result-with-curly-bracket.png, query-result-with-square-bracket.png > > > GetSolr holds the last query timestamp so that it only fetches documents that have been added or updated since the last query. > However, GetSolr misses some of those updated documents, and once a document's date field value becomes older than the last query timestamp, the document can't be queried by GetSolr any more. > This JIRA tracks the investigation of this behavior and the discussion around it. > Here are things that can cause this behavior: > |#|Short description|Should we address it?| > |1|Timestamp range filter, curly or square bracket?|No| > |2|Timezone difference between update and query|Additional docs might be helpful| > |3|Lag comes from NearRealTime nature of Solr|Should be documented at least, add 'commit lag-time'?| > h2. 1. Timestamp range filter, curly or square bracket? > At first glance, using curly and square brackets in mix looked strange ([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]), but the difference has a meaning. > The square bracket on the range query is inclusive and the curly bracket is exclusive. If we used inclusive on both sides and a document had a timestamp exactly on the boundary, it could be returned in two consecutive executions, and we only want it in one. > This is intentional, and it should stay as it is. > h2. 2. Timezone difference between update and query > Solr treats date fields as [UTC representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates]. > If the date field String value of an updated document represents time without a timezone, and NiFi is running in an environment using a timezone other than UTC, GetSolr can't perform the date range query as users expect. > Let's say NiFi is running with JST (UTC+9). A process added a document to Solr at 15:00 JST, but the date field doesn't have a timezone, so Solr indexed it as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any documents updated from 15:00 to 15:10 JST. GetSolr formatted the dates using UTC, i.e. 6:00 to 6:10 UTC. The updated document won't match the date range filter. > To avoid this, updated documents must have a proper timezone in the date field's string representation. > If one uses NiFi expression language to set the current timestamp on that date field, the following expression can be used: > {code} > ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")} > {code} > It will produce a result like: > {code} > 2016-12-27T15:30:04.895+0900 > {code} > Then it will be indexed in Solr as UTC and will be queried by GetSolr as expected. > h2. 3. Lag comes from the NearRealTime nature of Solr > Solr provides Near Real Time search capability; that means recently updated documents can be queried in near real time, but not in real time. > This latency can be controlled either on the client side, by specifying the "commitWithin" parameter on the update request, or on the Solr server side via "autoCommit" and "autoSoftCommit" in [solrconfig.xml|https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-Commits]. > Since commits and index updates can be costly, it's recommended to set this interval as long as the maximum tolerable latency allows. > However, this can be problematic with GetSolr. For instance, as shown in the simple NiFi flow below, GetSolr can miss updated documents: > {code} > t1: GetSolr queried > t2: GenerateFlowFile set date = t2 > t3: PutSolrContentStream stored new doc > t4: GetSolr queried again, from t1 to t4, but the new doc hasn't been indexed > t5: Solr completed index > t6: GetSolr queried again, from t4 to t6, the doc didn't match the query > {code} > This behavior should be at least documented. > Plus, it would be helpful to add a new configuration property to GetSolr to specify a commit lag-time so that GetSolr targets an older timestamp range when querying documents.
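One pitfall tracked in this issue is the timezone mismatch between update and query; it can be reproduced outside NiFi. A minimal Python sketch (the JST timestamp is the example value from the issue discussion, the variable names are mine):

```python
from datetime import datetime, timezone, timedelta

jst = timezone(timedelta(hours=9))
indexed_at = datetime(2016, 12, 27, 15, 30, 4, 895000, tzinfo=jst)

# Without an offset suffix, Solr parses the string as UTC, shifting the document 9 hours.
naive_repr = indexed_at.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
# With the offset, Solr converts correctly: 15:30 JST is 06:30 UTC.
aware_repr = naive_repr + indexed_at.strftime("%z")
as_utc = indexed_at.astimezone(timezone.utc)
```

A date range computed in UTC then matches the offset-aware representation but not the naive one, which is the miss described in the issue.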
[jira] [Comment Edited] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151784#comment-16151784 ] Johannes Peter edited comment on NIFI-3248 at 9/3/17 12:17 PM: --- [~ijokarumawak], [~bbende] I examined the current GetSolr implementation and found several issues I want to discuss: (1) Currently, a date field needs to be included in the index schema and in the Solr documents being indexed. Although this can easily be realized via Solr's TimestampUpdateProcessor, it would be better simply to use Solr's \_version\_ field to filter subsequent retrievals. This field is included in every well-configured Solr index, as it is required for several functionalities. That way, the processor could also be used with indexes that were not created with NiFi interactions in mind. (2) Iterating through a result set is only done the first time the processor runs. This becomes problematic if the number of newly indexed documents within a trigger interval exceeds the configured batch size. (3) Successively increasing the start parameter to retrieve Solr documents in batches brings two problems in this context. First, it performs poorly on large collections. Second, updating the index during the iteration will likely lead to duplicates or lost documents when document positions shift due to newly indexed documents or deletions. Instead of increasing the start parameter, cursor marks should be used, and the sorting should be fixed to ascending order of indexing time (the \_version\_ field). More details can be found here: https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html (4) Using the fq parameter instead of the q parameter should improve performance in some cases, as Solr can cache fq results. The q parameter should be fixed to "\*:\*". 
As a consequence, I suggest redesigning the GetSolr processor so that it mainly focuses on retrieving documents reliably, using cursor marks and the \_version\_ field. Additionally, users should not be able to change the sort and q parameters. The full query capabilities of Solr could be made available through an additional processor, e.g. "FetchSolr". 
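The cursor-mark paging proposed in (3), together with the fixed q/fq split from (4), can be sketched as follows. This is an illustrative Python model, not SolrJ: `solr_query` stands in for a real client call, the fq range filter is a placeholder, and the in-memory stub only mimics the cursorMark contract (the mark stops changing once the result set is drained).

```python
def fetch_all(solr_query, batch_size=2):
    """Drain a result set with cursor marks instead of an increasing start parameter."""
    cursor = "*"
    docs = []
    while True:
        page = solr_query({
            "q": "*:*",                        # relevancy isn't needed, so q stays a match-all
            "fq": "_version_:{LAST TO *]",     # placeholder range filter on _version_
            "sort": "_version_ asc,id asc",    # cursors need a total order incl. the uniqueKey
            "rows": batch_size,
            "cursorMark": cursor,
        })
        docs.extend(page["docs"])
        if page["nextCursorMark"] == cursor:   # unchanged mark => result set drained
            break
        cursor = page["nextCursorMark"]
    return docs

def make_stub(all_docs):
    """In-memory stand-in for a Solr client, for illustration only."""
    def solr_query(params):
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        batch = all_docs[start:start + params["rows"]]
        nxt = str(start + len(batch)) if batch else params["cursorMark"]
        return {"docs": batch, "nextCursorMark": nxt}
    return solr_query
```

Unlike an increasing start parameter, a cursor keeps its position stable even when documents are inserted or deleted ahead of it, which is what makes the retrieval reliable.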
[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149066#comment-16149066 ] Johannes Peter commented on NIFI-3248: -- [~ijokarumawak] Sure. I will start within next week. 
[jira] [Comment Edited] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145306#comment-16145306 ]

Johannes Peter edited comment on NIFI-3248 at 8/29/17 1:46 PM:
---------------------------------------------------------------

Have you considered using the Solr field \_version\_ yet? It can be treated like a timestamp, and it can also be transformed into one. E.g. sorting by "\_version\_ desc" orders documents by their time of indexing.

was (Author: jope):
Have you considered using the Solr field \_version\_ yet? It can be treated like a timestamp, and it can also be transformed into one. E.g. sorting by "_version_ desc" orders documents by their time of indexing.

> GetSolr can miss recently updated documents
> -------------------------------------------
>
>                 Key: NIFI-3248
>                 URL: https://issues.apache.org/jira/browse/NIFI-3248
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 1.0.1
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>         Attachments: nifi-flow.png, query-result-with-curly-bracket.png, query-result-with-square-bracket.png
>
> GetSolr holds the last query timestamp so that it only fetches documents that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents: once a document's date field value becomes older than the last query timestamp, the document can no longer be queried by GetSolr.
> This JIRA tracks the investigation of this behavior and the discussion around it.
> Here are things that can cause this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be helpful|
> |3|Lag coming from the Near Real Time nature of Solr|Should be documented at least; add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At first glance, using curly and square brackets in combination looked strange ([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]). But the difference has a meaning.
> The square bracket in a range query is inclusive and the curly bracket is exclusive. If we used inclusive brackets on both sides and a document had a timestamp exactly on the boundary, it could be returned in two consecutive executions, and we only want it in one.
> This is intentional, and it should stay as it is.
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates].
> If the date field String value of an updated document represents time without a timezone, and NiFi is running in an environment using a timezone other than UTC, GetSolr can't perform the date range query as users expect.
> Let's say NiFi is running with JST (UTC+9). A process added a document to Solr at 15:00 JST, but the date field doesn't have a timezone, so Solr indexed it as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any documents updated from 15:00 to 15:10 JST. GetSolr formats dates using UTC, i.e. 6:00 to 6:10 UTC. The updated document won't match the date range filter.
> To avoid this, updated documents must have a proper timezone in the date field's string representation.
> If one uses the NiFi expression language to set the current timestamp on that date field, the following expression can be used:
> {code}
> ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> It will then be indexed in Solr as UTC and queried by GetSolr as expected.
> h2. 3. Lag coming from the Near Real Time nature of Solr
> Solr provides Near Real Time search capability; that means recently updated documents can be queried in near real time, but not in real time.
> This latency can be controlled either on the client side, by specifying the "commitWithin" parameter on the update request, or on the Solr server side, via "autoCommit" and "autoSoftCommit" in [solrconfig.xml|https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-Commits].
> Since committing and updating the index can be costly, it is recommended to set this interval as long as the maximum tolerable latency allows.
> However, this can be problematic with GetSolr. For instance, as shown in the simple NiFi flow below, GetSolr can miss updated documents:
> {code}
> t1: GetSolr queried
> t2: GenerateFlowFile set date = t2
> t3: PutSolrContentStream stored new doc
> t4: GetSolr queried again, from t1 to t4, but the new doc hadn't been indexed yet
> t5: Solr completed indexing
> t6: GetSolr queried again, from t4 to t6; the doc didn't match the query
> {code}
> This behavior should be at least documented.
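The timezone pitfall described in section 2 above can be reproduced without Solr at all, using only `java.time`. This is an illustrative sketch, not NiFi or Solr code: it shows that a timestamp string written without a zone designator, when later read as UTC (as Solr does), lands nine hours away from the instant a JST writer meant, while the `yyyy-MM-dd'T'HH:mm:ss.SSSZ` pattern preserves the instant.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class SolrDateDemo {
    public static void main(String[] args) {
        // A document written at 15:00 JST, i.e. 06:00 UTC.
        Instant updated = LocalDateTime.of(2016, 12, 27, 15, 0)
                .atZone(ZoneId.of("Asia/Tokyo")).toInstant();

        // Without a zone designator, the string "2016-12-27T15:00:00" is
        // interpreted as 15:00 UTC -- nine hours later than intended.
        Instant misread = LocalDateTime.parse("2016-12-27T15:00:00")
                .toInstant(ZoneOffset.UTC);
        System.out.println(Duration.between(updated, misread)); // PT9H

        // With the zone offset in the pattern, the instant survives the round trip.
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");
        System.out.println(f.format(updated.atZone(ZoneId.of("Asia/Tokyo"))));
        // 2016-12-27T15:00:00.000+0900
    }
}
```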
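The exclusive-lower/inclusive-upper bracket semantics from section 1 can be modeled with a toy filter. This is not GetSolr's actual code, just a sketch of why `{lower TO upper]` guarantees a document stamped exactly on a window boundary is fetched by exactly one of two consecutive executions:

```java
import java.util.List;
import java.util.stream.Collectors;

public class RangeBracketDemo {
    // Lucene range syntax: '{' excludes the bound, '[' and ']' include it.
    // GetSolr queries {lastEndDate TO NOW], so consecutive windows share a
    // boundary without ever returning a boundary document twice.
    static List<Long> window(List<Long> docs, long lowerExclusive, long upperInclusive) {
        return docs.stream()
                .filter(t -> t > lowerExclusive && t <= upperInclusive)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> docs = List.of(5L, 10L, 15L); // doc timestamps; 10 sits on the boundary
        System.out.println(window(docs, 0, 10));  // first run:  [5, 10]
        System.out.println(window(docs, 10, 20)); // second run: [15] -- no double fetch of 10
    }
}
```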
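The server-side knobs mentioned in section 3 live in solrconfig.xml. The fragment below is illustrative only (the element names are standard Solr, but the interval values are made-up examples); the trade-off is that longer intervals cost less but widen the visibility lag GetSolr has to tolerate:

{code}
<!-- solrconfig.xml: bound the lag before new documents become searchable. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit: flush to stable storage -->
    <openSearcher>false</openSearcher>  <!-- don't tie visibility to hard commits -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>             <!-- soft commit: docs searchable within ~5 s -->
  </autoSoftCommit>
</updateHandler>
{code}

Alternatively, a client can pass commitWithin on each update request to cap the latency per document instead of globally.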
[jira] [Commented] (NIFI-3248) GetSolr can miss recently updated documents
[ https://issues.apache.org/jira/browse/NIFI-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145306#comment-16145306 ]

Johannes Peter commented on NIFI-3248:
--------------------------------------

Have you considered using the Solr field "_version_" yet? It can be treated like a timestamp, and it can also be transformed into one. E.g. sorting by "_version_ desc" orders documents by their time of indexing.

> GetSolr can miss recently updated documents
> -------------------------------------------
>
>                 Key: NIFI-3248
>                 URL: https://issues.apache.org/jira/browse/NIFI-3248
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.0.0, 0.5.0, 0.6.0, 0.5.1, 0.7.0, 0.6.1, 1.1.0, 0.7.1, 1.0.1
>            Reporter: Koji Kawamura
>            Assignee: Koji Kawamura
>         Attachments: nifi-flow.png, query-result-with-curly-bracket.png, query-result-with-square-bracket.png
>
> GetSolr holds the last query timestamp so that it only fetches documents that have been added or updated since the last query.
> However, GetSolr misses some of those updated documents: once a document's date field value becomes older than the last query timestamp, the document can no longer be queried by GetSolr.
> This JIRA tracks the investigation of this behavior and the discussion around it.
> Here are things that can cause this behavior:
> |#|Short description|Should we address it?|
> |1|Timestamp range filter, curly or square bracket?|No|
> |2|Timezone difference between update and query|Additional docs might be helpful|
> |3|Lag coming from the Near Real Time nature of Solr|Should be documented at least; add 'commit lag-time'?|
> h2. 1. Timestamp range filter, curly or square bracket?
> At first glance, using curly and square brackets in combination looked strange ([source code|https://github.com/apache/nifi/blob/support/nifi-0.5.x/nifi-nar-bundles/nifi-solr-bundle/nifi-solr-processors/src/main/java/org/apache/nifi/processors/solr/GetSolr.java#L202]). But the difference has a meaning.
> The square bracket in a range query is inclusive and the curly bracket is exclusive. If we used inclusive brackets on both sides and a document had a timestamp exactly on the boundary, it could be returned in two consecutive executions, and we only want it in one.
> This is intentional, and it should stay as it is.
> h2. 2. Timezone difference between update and query
> Solr treats date fields as [UTC representation|https://cwiki.apache.org/confluence/display/solr/Working+with+Dates].
> If the date field String value of an updated document represents time without a timezone, and NiFi is running in an environment using a timezone other than UTC, GetSolr can't perform the date range query as users expect.
> Let's say NiFi is running with JST (UTC+9). A process added a document to Solr at 15:00 JST, but the date field doesn't have a timezone, so Solr indexed it as 15:00 UTC. Then GetSolr performs a range query at 15:10 JST, targeting any documents updated from 15:00 to 15:10 JST. GetSolr formats dates using UTC, i.e. 6:00 to 6:10 UTC. The updated document won't match the date range filter.
> To avoid this, updated documents must have a proper timezone in the date field's string representation.
> If one uses the NiFi expression language to set the current timestamp on that date field, the following expression can be used:
> {code}
> ${now():format("yyyy-MM-dd'T'HH:mm:ss.SSSZ")}
> {code}
> It will produce a result like:
> {code}
> 2016-12-27T15:30:04.895+0900
> {code}
> It will then be indexed in Solr as UTC and queried by GetSolr as expected.
> h2. 3. Lag coming from the Near Real Time nature of Solr
> Solr provides Near Real Time search capability; that means recently updated documents can be queried in near real time, but not in real time.
> This latency can be controlled either on the client side, by specifying the "commitWithin" parameter on the update request, or on the Solr server side, via "autoCommit" and "autoSoftCommit" in [solrconfig.xml|https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-Commits].
> Since committing and updating the index can be costly, it is recommended to set this interval as long as the maximum tolerable latency allows.
> However, this can be problematic with GetSolr. For instance, as shown in the simple NiFi flow below, GetSolr can miss updated documents:
> {code}
> t1: GetSolr queried
> t2: GenerateFlowFile set date = t2
> t3: PutSolrContentStream stored new doc
> t4: GetSolr queried again, from t1 to t4, but the new doc hadn't been indexed yet
> t5: Solr completed indexing
> t6: GetSolr queried again, from t4 to t6; the doc didn't match the query
> {code}
> This behavior should be at least documented.
> Plus, it would be