That sounds great.
> On Jun 17, 2021, at 10:14, Iñigo Angulo <[email protected]> wrote:
>
> Hi Otto,
>
> So we went with the workaround: we re-forked the repo and added the changes
> for the nifi-record feature on a dedicated branch. I closed the previous PR
> and will open the new one in the next few days (we are running some last
> tests, just in case we forgot to copy something...)
>
> I will keep you informed.
>
> regards,
> iñigo
>
> -----------------------------------------
> Iñigo Angulo
>
> ZYLK.net :: consultoría.openSource
> telf.: 747412337
> Ribera de Axpe, 11
> Edificio A, modulo 201-203
> 48950 Erandio (Bizkaia)
> -----------------------------------------
>
> ----- Original Message -----
> From: "ottobackwards" <[email protected]>
> To: "dev" <[email protected]>
> Sent: Thursday, June 10, 2021 16:14:54
> Subject: Re: Nifi integration record oriented processor for reading
>
> Yes, that sounds great.
> Thank you
>
>> On Jun 10, 2021, at 03:43, Iñigo Angulo <[email protected]> wrote:
>>
>> Hi Otto,
>>
>> I have tried to do the rebase of our commits, but I am having difficulties
>> at it...
>>
>> The issue is: we forked the repo in September 2020 and started making tests
>> and commits on our fork (on the 'develop' branch). Now I am trying to do a
>> rebase (using git rebase -i ID), specifying the ID of our first commit. But
>> when the file opens in interactive mode, it lists ~900 commits on the
>> develop branch (belonging to members of the PLC4X community). I think this
>> happens because, before opening the pull request, I did a 'merge upstream'
>> with the actual PLC4X repo to get the updates to the code. So, I understand
>> that in the interactive-mode file I have to leave all commits as 'pick'
>> (the default) and change our commits for the nifi feature to 'squash',
>> except for the first one (which also remains as 'pick'). However, when I
>> tried this, many conflicts appeared, almost one per commit (community
>> members' commits)...
>> I may be doing something wrong (I have never done a rebase before...) and I
>> preferred just to ask, as I don't want to break or cause any conflict in
>> the repo code... If you see anything I'm missing, please let me know.
>>
>> As a workaround, I was thinking we could close the PR, re-do the fork of the
>> PLC4X repo, and add the changes to the code on a dedicated
>> 'feature-nifi-record' branch. Maybe this could make things clearer...
>> What do you think?
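For reference, a sketch of a git workflow that avoids replaying the upstream commits (the remote names 'upstream' for the apache/plc4x repo and 'origin' for the fork are assumptions; adjust to the actual setup):

```shell
# Fetch the latest upstream history first.
git fetch upstream

# Rebasing interactively onto upstream/develop lists only the commits
# that are NOT already in upstream/develop, so the ~900 community
# commits never show up in the interactive todo file.
git rebase -i upstream/develop

# Alternative without a rebase: collapse everything since
# upstream/develop into a single commit.
git reset --soft upstream/develop
git commit -m "Add record-oriented NiFi processor"

# The rewritten history then has to be force-pushed to the fork.
git push --force-with-lease origin develop
```

Passing the ID of the first commit to `git rebase -i` replays every commit after it, including the merged-in upstream ones, which is where the ~900 todo entries and the per-commit conflicts come from.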
>>
>> thank you,
>> iñigo
>>
>>
>> ----- Original Message -----
>> From: "ottobackwards" <[email protected]>
>> To: "dev" <[email protected]>
>> Sent: Thursday, May 27, 2021 16:10:19
>> Subject: Re: Nifi integration record oriented processor for reading
>>
>> Awesome.
>>
>> If you can, can I ask you to:
>>
>> 1. Mark the PR as ready to review in github
>> 2. rebase or squash it to a single commit and force push to your branch to
>> clean it up
>>
>>
>>
>>> On May 27, 2021, at 06:45, Iñigo Angulo <[email protected]> wrote:
>>>
>>> Hi Otto, Chris
>>>
>>> we have finally committed the updates to the Nifi Processor on the pull
>>> request. The changes we have made are the following:
>>> - deducing Avro datatypes from the PlcResponse. Here you may want to check
>>> the method (org.apache.plc4x.nifi.util.Plc4xCommon.createSchema()), to see
>>> if it is the best way to do it.
>>> - we have added an "ignoredDependency" in plc4j/pom.xml for
>>> org.apache.nifi:nifi-standard-nar, as it is used at runtime and was raising
>>> errors during compilation.
>>> - we have changed the onScheduled method in Plc4xSourceRecordProcessor
>>> (compared to BaseProcessor), as we have included the possibility of having
>>> an input connection into the processor and of indicating the target
>>> addressMap through flowfile attributes. The addressMap is now created in
>>> the onTrigger method.
>>> - we have tested the performance with the S7 and Modbus protocols (using a
>>> Siemens S7-1200 and a Schneider M221). We will upload an updated Nifi
>>> template for both protocols. Regarding this, do you have any testing
>>> environment to simulate PLCs? If so, we could prepare the Processors'
>>> configuration to match it (connection strings and addressMaps).
>>>
>>> Please take a look at the code, any suggestion will be very welcome.
>>>
>>> iñigo
>>>
>>>
>>> ----- Original Message -----
>>> From: "Iñigo Angulo" <[email protected]>
>>> To: "dev" <[email protected]>
>>> Sent: Wednesday, May 12, 2021 14:42:22
>>> Subject: Re: Nifi integration record oriented processor for reading
>>>
>>> Hi Otto, Chris,
>>>
>>> we have been working on the processor to include the logic for getting the
>>> values of the variables from the PLC response. We tested it with the
>>> S7-1200 and it seems to work fine; however, we would like to make some
>>> further tests before committing it.
>>>
>>> Regarding the method that derives the datatype from the response object,
>>> we did it in the following way:
>>>
>>> import org.apache.avro.SchemaBuilder;
>>> import org.apache.plc4x.java.api.value.PlcValue;
>>> import org.apache.plc4x.java.spi.values.PlcINT;
>>> import org.apache.plc4x.java.spi.values.PlcREAL;
>>>
>>> // PlcReadResponse readResponse;
>>> // builder comes from SchemaBuilder.record(...).fields() and is used to
>>> // build the Avro schema with the desired datatypes (e.g. intType()).
>>> Map<String, ? extends PlcValue> responseDataStructure =
>>>     readResponse.getAsPlcValue().getStruct();
>>> for (Map.Entry<String, ? extends PlcValue> entry :
>>>         responseDataStructure.entrySet()) {
>>>     String fieldName = entry.getKey();
>>>     if (entry.getValue() instanceof PlcINT) {
>>>         builder.name(fieldName).type().unionOf()
>>>             .nullBuilder().endNull().and().intType().endUnion().noDefault();
>>>     } else if (entry.getValue() instanceof PlcREAL) {
>>>         builder.name(fieldName).type().unionOf()
>>>             .nullBuilder().endNull().and().doubleType().endUnion().noDefault();
>>>     }
>>>     // ... and so on for the rest of the classes in the
>>>     // org.apache.plc4x.java.spi.values.* package.
>>> }
>>>
>>>
>>> Is this the solution you had in mind? If you think there is a better way to
>>> access PlcValues, please let us know and we will update it.
>>>
>>> We will upload the code soon so that you can take a deeper look.
>>>
>>> thank you!
>>> iñigo
>>>
>>>
>>> ----- Original Message -----
>>> From: "ottobackwards" <[email protected]>
>>> To: "dev" <[email protected]>
>>> Sent: Friday, April 30, 2021 14:50:11
>>> Subject: Re: Nifi integration record oriented processor for reading
>>>
>>> Sounds like a good plan!
>>>
>>>> On Apr 30, 2021, at 05:26, Iñigo Angulo <[email protected]> wrote:
>>>>
>>>> Hi Otto, Chris,
>>>>
>>>> we have been reviewing the comments on the pull request, and have started
>>>> to think about the approach of extracting values from the response
>>>> directly. Next week we will work on this and make some updates to the
>>>> code with the suggestions you made. We will keep you informed.
>>>>
>>>> thank you,
>>>>
>>>> iñigo
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Christofer Dutz" <[email protected]>
>>>> To: "dev" <[email protected]>
>>>> Sent: Friday, April 23, 2021 18:10:35
>>>> Subject: Re: Re: Nifi integration record oriented processor for reading
>>>>
>>>> Hi Inigo,
>>>>
>>>> This is especially true if you have a look at the KNX protocol, which
>>>> doesn't define the usual IEC datatypes we tried to use for all normal PLC
>>>> drivers. Here we have hundreds of datatypes that don't match any other
>>>> protocol. I think the PlcValue approach would be the simplest.
>>>> The one thing you have to keep in mind is that you should check whether a
>>>> PlcValue is a list (array type) or a structure (which relates to complex
>>>> types with a more sophisticated internal structure).
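As an illustration of that check, a minimal recursive walker over a decoded value. This is a hypothetical stand-in: plain Lists represent array values, plain Maps represent structures, and anything else a scalar; real code would test the PlcValue itself for its list/struct nature instead.

```java
import java.util.List;
import java.util.Map;

class ValueKindSketch {

    // Describe a decoded value: recurse into arrays and structures,
    // report the runtime type name for scalars.
    static String describe(Object value) {
        if (value instanceof List) {
            // Array type: all elements share one schema, so describing
            // the first element is enough.
            List<?> list = (List<?>) value;
            return "array<" + describe(list.isEmpty() ? null : list.get(0)) + ">";
        }
        if (value instanceof Map) {
            // Structure: recurse into each member.
            StringBuilder sb = new StringBuilder("struct{");
            ((Map<?, ?>) value).forEach((k, v) ->
                    sb.append(k).append(':').append(describe(v)).append(';'));
            return sb.append('}').toString();
        }
        return value == null ? "null" : value.getClass().getSimpleName();
    }
}
```

The point is that a schema generator cannot stop at the top level: a KNX (or any) value may nest arrays inside structures, so the dispatch has to recurse.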
>>>>
>>>> Chris
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Iñigo Angulo <[email protected]>
>>>> Sent: Friday, April 23, 2021 15:34
>>>> To: dev <[email protected]>
>>>> Subject: Re: Re: Nifi integration record oriented processor for reading
>>>>
>>>> Hi Otto, Chris,
>>>>
>>>> Yes, I think the approach you propose will be best. For now, we are
>>>> generating the schema ourselves. We have a record writer that is in
>>>> charge of reading the PLC values. The schema is defined before the values
>>>> are read. We build this schema by getting the protocol from the
>>>> 'connectionString' (S7, Modbus) and the specified variable type from the
>>>> 'PLC resource address String' containing the list of variables to read.
>>>> From this we deduce the expected Avro datatype when reading, for
>>>> instance, a word in S7 or a coil in Modbus.
>>>>
>>>> However, as you mentioned, the other approach would be much clearer and
>>>> more useful: ideally, getting the actual datatype from the PlcValue in
>>>> the response. With this in mind, we tried to keep the previously
>>>> described 'mapping' separate from the rest of the code, so that hopefully
>>>> it can be easily replaced...
>>>>
>>>> We have made the pull request; we hope you can take a look at the code
>>>> and let us know what you think. We will fill in the ICLA document too.
>>>>
>>>> thank you
>>>> iñigo
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Christofer Dutz" <[email protected]>
>>>> To: "dev" <[email protected]>
>>>> Sent: Thursday, April 22, 2021 17:12:49
>>>> Subject: Re: Nifi integration record oriented processor for reading
>>>>
>>>> Hi all,
>>>>
>>>> Well, you get PlcValues from the response that wrap the different
>>>> datatypes. So generally you shouldn't care about the detail type.
>>>>
>>>> However, you can call getObject() which returns the core value the
>>>> plc-value has ... so if it's the PLCValue for a Short, getObject will
>>>> return a short value.
>>>>
>>>> Does that help?
>>>>
>>>> Chris
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Otto Fowler <[email protected]>
>>>> Sent: Thursday, April 22, 2021 15:21
>>>> To: [email protected]
>>>> Subject: Re: Nifi integration record oriented processor for reading
>>>>
>>>> So, you are generating the schema yourself, such that downstream
>>>> consumers that inherit the schema will just get what you generate? And
>>>> you are trying to do that from the connection string? If so, a different
>>>> way I could imagine doing it would be to get the 'types' of the data from
>>>> the responses themselves. This would be more generic. The flow I could
>>>> imagine (in onTrigger):
>>>>
>>>> DO READ
>>>> IF NOT HAS SCHEMA
>>>> GENERATE SCHEMA FROM RESPONSE AND CACHE IN ATOMIC WRITE WITH SCHEMA
>>>>
>>>> Maybe Chris can speak to how to get the types from the responses.
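The flow above could be sketched with a one-time schema generation cached atomically. This is a minimal stand-in: plain Maps play the roles of the PlcReadResponse and the Avro schema, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

class SchemaCacheSketch {

    // Cache shared across onTrigger invocations: built once from the
    // first response, then reused for every subsequent read.
    private final AtomicReference<Map<String, String>> cachedSchema =
            new AtomicReference<>();

    Map<String, String> schemaFor(Map<String, Object> response) {
        Map<String, String> schema = cachedSchema.get();
        if (schema == null) {
            // GENERATE SCHEMA FROM RESPONSE: one field per response
            // entry, typed from the runtime class of the value.
            Map<String, String> generated = new LinkedHashMap<>();
            response.forEach((name, value) ->
                    generated.put(name, value.getClass().getSimpleName()));
            // CACHE IN ATOMIC WRITE: only the first thread wins the race,
            // so all callers see one consistent schema.
            cachedSchema.compareAndSet(null, generated);
            schema = cachedSchema.get();
        }
        return schema;
    }
}
```

After the first response has been seen, later calls skip schema generation entirely, which is what makes the per-read cost acceptable in a processor that triggers continuously.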
>>>>
>>>>
>>>>> On Apr 22, 2021, at 05:48, Iñigo Angulo <[email protected]> wrote:
>>>>>
>>>>> Hi Chris, Otto,
>>>>>
>>>>> Regarding the Record Processor concept, I will try to give an overview.
>>>>> In Nifi, information packages are called Flowfiles, and these are the
>>>>> actual units of information that are exchanged between Processors along
>>>>> the dataflow. Flowfiles have two sections where we can manage data:
>>>>> Attributes and Content. In the "traditional" Nifi approach, you work
>>>>> with both sections, extracting information from the Content to the
>>>>> Attributes and vice versa to perform operations. This approach has one
>>>>> limitation when you are processing batch data (lines from a CSV file,
>>>>> for instance), where you need to split each of the lines into different
>>>>> Flowfiles. Thus, a 1000-line CSV file leads to 1000 Flowfiles to
>>>>> process, each of them containing a single record.
>>>>>
>>>>> In later versions of the product, they introduced the record-oriented
>>>>> approach. This approach allows you to manage multiple records in a
>>>>> single Flowfile's Content, as long as these records all have the same
>>>>> schema. This means that the operations defined by the Processors are
>>>>> applied to the whole content at once. Following the previous example, a
>>>>> 1000-line CSV file can produce a single Flowfile with a content of 1000
>>>>> records.
>>>>>
>>>>> To do this, Nifi uses Avro to serialize the Flowfile's Content. The
>>>>> record-oriented Processors then use Writers and Readers to present this
>>>>> information in the desired format (such as Avro, JSON, CSV, etc.).
>>>>> Basically, with the record-oriented approach, Nifi introduced multiple
>>>>> new Processors, and also included Record versions of many of the "old"
>>>>> ones. Using this Record approach, Nifi performance improves notably,
>>>>> especially when working with large amounts of structured information.
>>>>>
>>>>> The work we did was to create a record-oriented Processor, based on the
>>>>> previously existing Plc4xSourceProcessor, to read values from the
>>>>> devices. We have also included a README in the
>>>>> plc4x/plc4j/integrations/apache-nifi module explaining the Processor
>>>>> configuration and giving an example. Moreover, we added a Nifi template
>>>>> with a dataflow for testing these Processors, in case it is useful.
>>>>>
>>>>> Otto, regarding the idea behind this new Processor, that is right. We
>>>>> added the writer capability to the existing Plc4xSourceProcessor, so
>>>>> that it formats the output to the desired configuration in a record
>>>>> manner. In the current implementation, we did this "protocol adaptation"
>>>>> based on the syntax of the particular properties in the Processor's
>>>>> configuration. For example, from the connection string 's7://IP:PORT' we
>>>>> extract the S7 identifier and map variable datatypes to the actual Avro
>>>>> datatypes to build the record output schema. However, we don't have vast
>>>>> experience with the PLC4X libraries, and there will surely be better
>>>>> ways of doing this.
>>>>> Also, about the Base Processor, we were thinking that maybe the best
>>>>> approach could be to have this Base Processor, and then implement
>>>>> readers for particular protocols as Controller Services. But here too,
>>>>> it would be very helpful to have your opinion.
>>>>>
>>>>> Lastly, regarding the pull request, do you have any documentation on
>>>>> how to do this? I mean, maybe you have defined some naming conventions
>>>>> or an expected structure to facilitate later work. At present, we have a
>>>>> fork of the project where we have been working on these Nifi changes. We
>>>>> updated the content of our fork (fetch/merge upstream) about 2 weeks
>>>>> ago, and committed our changes to the 'develop' branch. Should we create
>>>>> a new branch with our commits? How do you prefer to receive the code?
>>>>> (We are not experts in git, just in case we could cause some
>>>>> problems...)
>>>>>
>>>>> thank you in advance
>>>>>
>>>>> iñigo
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Christofer Dutz" <[email protected]>
>>>>> To: "dev" <[email protected]>
>>>>> Sent: Wednesday, April 21, 2021 20:01:15
>>>>> Subject: Re: Nifi integration record oriented processor for reading
>>>>>
>>>>> The more I think of it,
>>>>>
>>>>> Perhaps we should also think of potentially providing some information on
>>>>> supported configuration options.
>>>>> Wouldn't it be cool if the driver could say: "I generally have these
>>>>> options and they have these datatypes and mean this"
>>>>> Additionally, the transports could too say: "I generally have these
>>>>> options and they have these datatypes and mean this"
>>>>>
>>>>> I bet our StreamPipes friends would love something like that, right?
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Otto Fowler <[email protected]>
>>>>> Sent: Wednesday, April 21, 2021 17:46
>>>>> To: [email protected]
>>>>> Subject: Re: Nifi integration record oriented processor for reading
>>>>>
>>>>> Hi Inigo,
>>>>>
>>>>> I’m a committer on Apache Nifi as well as PLC4X, I would be happy to
>>>>> review your processor.
>>>>> If I understand what you are saying correctly, you have a single
>>>>> processor which supports record writing output?
>>>>>
>>>>> plc4x -> records
>>>>>
>>>>> And that you have, for configuration purposes for that processor created
>>>>> support on a per protocol basis for configuration and validation?
>>>>>
>>>>> If there is per protocol configuration / validation etc, it may be better
>>>>> to have a base processor, and derived processors per protocol to handle
>>>>> those differences.
>>>>>
>>>>> I look forward to seeing the code.
>>>>>
>>>>>
>>>>>> On Apr 21, 2021, at 04:05, Iñigo Angulo <[email protected]> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am writing because we have been working on the Apache Nifi
>>>>>> integration part of the project. We have created a record-oriented
>>>>>> processor for reading PLC data. It is based on the previously existing
>>>>>> SourceProcessor, but works with records, using a Nifi Writer (such as
>>>>>> Avro, JSON, and so on) to write data to the Flowfiles' content.
>>>>>>
>>>>>> We updated the code in our fork from the actual PLC4X git repo about 2
>>>>>> weeks ago, and tested it from Nifi, reading values over S7 from an
>>>>>> S7-1200 CPU. Also, one of our customers has recently started to use it
>>>>>> for validation.
>>>>>>
>>>>>> Currently, it works with S7 and Modbus over TCP. This is because we had
>>>>>> to write some classes to map the connectionString and variableList
>>>>>> properties (syntax) of the processor to the actual protocol, in order
>>>>>> to build the Avro schema for the output flowfile, taking into account
>>>>>> variable datatypes, etc. We only did this for S7 and Modbus. I am sure
>>>>>> there is a better way to do this, so at this point you could maybe take
>>>>>> a look to find the best solution and avoid the need for this mapping.
>>>>>>
>>>>>> If you find this useful, we could make a pull request to the main
>>>>>> PLC4X repo. Let us know what you think.
>>>>>>
>>>>>> best regards,
>>>>>> iñigo
>>>>>>