[ 
https://issues.apache.org/jira/browse/MINIFI-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936609#comment-15936609
 ] 

Andrew Christianson commented on MINIFI-244:
--------------------------------------------

[~joewitt], your inquiry is totally on-base. This is a tricky one. I agonized 
over this quite a bit, and dug into the semantics of Unpack/Merge from NiFi as 
well as the latest notify/wait processors. In general, my aim for any MiNiFi 
work is to remain as consistent with NiFi as possible. That said, if we have a 
good case for doing something differently here, then it would be worthwhile to 
port it to NiFi as well.

As for the purpose, it really boils down to a simple use-case: you take a tar 
(or some other sufficiently "complex" object) into a flow, and you want to 
manipulate one component of the complex object without having to break apart 
the object or perform other contortions (splitting, re-merging, etc.). If we 
can perform the transformation atomically on one data item within one 
processor, it removes a lot of complexity: no merge correlations, no extra 
error handling, and no complex notify/wait configurations.

For a completely concrete example, just consider performing an XSLT on one 
entry in a tar.

The comparison is:

Option A (conventional): run the FlowFile through Unpack, resulting in multiple 
flow files, route based on filename, run the target XML entry through 
TransformXML, then route everything back to Merge. Use the defrag strategy to 
re-create the original structure, while making sure to have the component 
attributes exactly right. Will it all arrive on time? What happens if one 
file, only incidentally extracted (in order to be able to re-create the tar), 
fails? We can set a timeout on the merge, but what if processing is just slow 
that day? Lots of questions arise. Does it work? Yes. A colleague familiar with 
NiFi told me this works "90%" of the time.

Option B (with lens): run the FlowFile through a lens ("focusing" on one part 
of an overall complex object), perform a transformation with a completely 
standard/unmodified processor (TransformXML), then send it back through a lens 
to get it back to its original state ("unfocus" the entry, "focus" the archive).
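
To make Option B concrete, here is a rough sketch of the focus/transform/unfocus 
round trip on an in-memory tar. It is only an illustration in Python using the 
standard tarfile module, not the proposed MiNiFi C++ processor. The names 
focus, unfocus, make_tar and read_tar are hypothetical, and the byte-level 
transformation is a stand-in for whatever an unmodified processor such as 
TransformXML would do to the focused entry.

```python
import io
import tarfile


def make_tar(files):
    """Build an uncompressed tar in memory from {name: bytes} (helper for the demo)."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


def read_tar(archive_bytes):
    """Return {name: bytes} for the regular files in a tar (helper for the demo)."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}


def focus(archive_bytes, entry_name):
    """Expose one entry's content, keeping the rest of the archive as context."""
    members = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        for info in tar.getmembers():
            data = tar.extractfile(info).read() if info.isfile() else None
            members.append((info, data))
    for info, data in members:
        if info.name == entry_name and data is not None:
            return data, (entry_name, members)
    raise KeyError(entry_name)


def unfocus(new_content, context):
    """Rebuild the archive in its original order with the focused entry replaced."""
    entry_name, members = context
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for info, data in members:
            if info.name == entry_name:
                data = new_content
                info.size = len(data)  # header must match the new content length
            tar.addfile(info, io.BytesIO(data) if data is not None else None)
    return out.getvalue()


# Focus on one entry, transform only that entry, then restore the archive.
archive = make_tar({"a.txt": b"alpha", "doc.xml": b"<a/>", "b.txt": b"beta"})
entry, context = focus(archive, "doc.xml")
transformed = entry.replace(b"<a/>", b"<b/>")  # stand-in for the XSLT step
rebuilt = unfocus(transformed, context)
```

Between focus and unfocus there is only ever one data item in flight, so there 
is nothing to correlate or wait for. The other entries ride along untouched in 
the context.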

Ultimately, the aim is to reduce the need for a complex (and error-prone) flow, 
and avoid the temptation/need to write a custom processor every time there is a 
need to manipulate one part of a complex structure, while preserving structure 
(and doing it all atomically if possible).

As for the language and semantics, I agree with you in principle. It should all 
be as user-centric as possible, and the language of the flow should be simple 
even if the inner workings are necessarily complex. So, I'm all ears on how we 
could label this thing. I don't think that the "extract" verb fits, and I think 
the use-cases above should show why. Some other ideas (further suggestions are 
welcome):

- FocusArchive (a little too general as we are technically focusing the entry, 
not the archive)
- FocusArchiveEntry/UnfocusArchiveEntry (feels clunky?)
- ApplyArchiveLens (I kind of like this one; just the right amount of 
generality and flexibility)
- ExposeArchiveElement/UnexposeArchiveElement
- RotateArchive (symmetrical; uses a geometric vs. optical analogy)
- InvertArchive (symmetrical)

I can't make it fit into any sort of extract/unpack verb, because that's just 
not what it is. The basic purpose of the processor is to expose a part of a 
greater whole, while explicitly not extracting or unpacking it. I think we 
probably need a new (to NiFi) verb.

The other thing is, this is all rooted in theory (category theory). The theory 
can be dense, but the blog entry linked in the description is focused more on 
the practical aspect. I believe that the concepts of lenses (and monads) can 
really help simplify some recurring dataflow design problems, even if we don't 
necessarily expose the language to the user. Even though the theory is complex, 
I think there is great value even for the typical NiFi user. This kind of 
transformation task comes up all the time, and I want to make the language and 
semantics as easy to understand as possible, while leveraging the benefits of 
the advanced theory.
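
For anyone who wants the idea without the theory, a lens can be sketched as a 
getter/setter pair plus a few round-trip laws. The snippet below is only an 
illustration in Python. The Lens class and key_lens helper are hypothetical 
names, and the archive is modeled as a plain dict of entry name to bytes, 
which is a deliberate simplification of a real tar.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    getter: Callable[[Any], Any]       # whole -> part
    setter: Callable[[Any, Any], Any]  # (whole, new_part) -> new whole

    def view(self, whole):
        return self.getter(whole)

    def update(self, whole, part):
        return self.setter(whole, part)

    def over(self, whole, fn):
        """Apply fn to the focused part and return the updated whole."""
        return self.update(whole, fn(self.view(whole)))


def key_lens(key):
    """Lens focusing on one key of a dict, leaving the input dict unmodified."""
    return Lens(
        getter=lambda whole: whole[key],
        setter=lambda whole, part: {**whole, key: part},
    )


archive = {"a.txt": b"alpha", "doc.xml": b"<a/>"}
doc = key_lens("doc.xml")

# Transform only the focused entry; the rest of the structure is preserved.
updated = doc.over(archive, lambda xml: xml.replace(b"a", b"b"))

# The usual lens laws, checked on this example:
law_get_set = doc.update(archive, doc.view(archive)) == archive
law_set_get = doc.view(doc.update(archive, b"<c/>")) == b"<c/>"
law_set_set = (doc.update(doc.update(archive, b"<c/>"), b"<d/>")
               == doc.update(archive, b"<d/>"))
```

The laws are what make the "focus, transform, unfocus" round trip predictable: 
putting back what you took out changes nothing, and what you put in is what 
you get back.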

> Create ArchiveLens processor
> ----------------------------
>
>                 Key: MINIFI-244
>                 URL: https://issues.apache.org/jira/browse/MINIFI-244
>             Project: Apache NiFi MiNiFi
>          Issue Type: Task
>          Components: C++, Extensions
>            Reporter: Andrew Christianson
>            Assignee: Andrew Christianson
>            Priority: Minor
>
> Create an ArchiveLens processor. A concise, though informal, definition of a 
> lens is as follows:
> "Essentially, they represent the act of “peering into” or “focusing in on” 
> some particular piece/path of a complex data object such that you can more 
> precisely target particular operations without losing the context or 
> structure of the overall data you’re working with." 
> https://medium.com/@dtipson/functional-lenses-d1aba9e52254#.hdgsvbraq
> Why an ArchiveLens in MiNiFi? Simply put, it will enable us to "focus in on" 
> an entry in the archive, perform processing *in-context* of that entry, then 
> re-focus on the overall archive. This allows for transformation or other 
> processing of an entry in the archive without losing the overall context of 
> the archive.
> Initial format support is tar, due to its simplicity and ubiquity.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
