[ https://issues.apache.org/jira/browse/MINIFI-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936609#comment-15936609 ]
Andrew Christianson commented on MINIFI-244:
--------------------------------------------

[~joewitt], your inquiry is totally on-base. This is a tricky one. I agonized over this quite a bit, and dug into the semantics of Unpack/Merge from NiFi as well as the latest notify/wait processors. In general, my aim for any MiNiFi work is to remain as consistent with NiFi as possible. That said, if we have a good case for doing something differently here, then it would be worthwhile to port it to NiFi as well and do it differently there too.

As for the purpose, it really boils down to a simple use-case: you take a tar (or some other sufficiently "complex" object) into a flow, and you want to manipulate one component of the complex object without having to break apart the object or perform other contortions (splitting, re-merging, etc.). If we can perform the transformation atomically on one data item within one processor, it removes a lot of complexity: no merge correlations, no error-handling complexity, and no complex notify/wait configurations.

For a completely concrete example, just consider performing an XSLT on one entry in a tar. The comparison is:

Option A (conventional): run the FlowFile through Unpack, resulting in multiple flow files, route based on filename, run the target XML entry through TransformXML, then route everything back to Merge. Use the defrag strategy to re-create the original structure, while making sure to have the component attributes exactly right. Will it all arrive on time? What happens if one file, only incidentally extracted (in order to be able to re-create the tar), fails? We can set a timeout on the merge, but what if processing is just slow that day? Lots of questions arise. Does it work? Yes. A colleague familiar with NiFi told me this works "90%" of the time.

Option B (with lens): run the FlowFile through a lens ("focusing" on one part of an overall complex object), perform a transformation with a completely standard/unmodified processor (TransformXML), then send it back through a lens to get it back to its original state ("unfocus" the entry, "focus" the archive).

Ultimately, the aim is to reduce the need for a complex (and error-prone) flow, and to avoid the temptation/need to write a custom processor every time there is a need to manipulate one part of a complex structure while preserving that structure (and doing it all atomically if possible).

As for the language and semantics, I agree with you in principle. It should all be as user-centric as possible, and the language of the flow should be simple even if the inner workings are necessarily complex. So, I'm all ears on how we could label this thing. I don't think that the "extract" verb fits, and I think the use-cases above should show why. Some other ideas (and I'm all ears for other suggestions):

- FocusArchive (a little too general as we are technically focusing the entry, not the archive)
- FocusArchiveEntry/UnfocusArchiveEntry (feels clunky?)
- ApplyArchiveLens (kind of like this one. just the right amount of generality and flexibility)
- ExposeArchiveElement/UnexposeArchiveElement
- RotateArchive (symmetrical; uses a geometric vs. optical analogy)
- InvertArchive (symmetrical)

I can't make it fit into any sort of extract/unpack verb, because that's just not what it is. The basic purpose of the processor is to expose a part of a greater whole, while explicitly not extracting or unpacking it. I think we probably need a new (to NiFi) verb.

The other thing is, this is all rooted in theory (category theory). The theory can be dense, but the blog entry linked in the description is focused more on the practical aspect.
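
To make the focus/unfocus pair from Option B concrete, here is a minimal sketch in Python using only the standard `tarfile` module. It is not the MiNiFi C++ implementation and not a NiFi API: the names `focus`, `unfocus`, and `FocusContext` are hypothetical, the context is simply held in memory, and upper-casing the bytes stands in for whatever unmodified processor (e.g. TransformXML) would run on the focused entry.

```python
import io
import tarfile
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FocusContext:
    """Hypothetical holder for what is needed to restore the archive."""
    archive: bytes    # the original tar bytes, kept intact as context
    entry_name: str   # the entry currently in focus


def focus(archive: bytes, entry_name: str) -> Tuple[bytes, FocusContext]:
    """Expose one entry's content without unpacking the rest of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        member = tar.extractfile(entry_name)  # KeyError if the entry is missing
        if member is None:
            raise ValueError(f"{entry_name!r} is not a regular file")
        return member.read(), FocusContext(archive, entry_name)


def unfocus(content: bytes, ctx: FocusContext) -> bytes:
    """Rebuild the archive with the focused entry replaced by `content`."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(ctx.archive), mode="r:") as src, \
         tarfile.open(fileobj=out, mode="w") as dst:
        for info in src.getmembers():
            if info.name == ctx.entry_name:
                info.size = len(content)  # header must match the new payload
                dst.addfile(info, io.BytesIO(content))
            elif info.isfile():
                dst.addfile(info, src.extractfile(info))  # copied untouched
            else:
                dst.addfile(info)  # directories, links, etc. carry no payload
    return out.getvalue()


def make_tar(entries: Dict[str, bytes]) -> bytes:
    """Demo helper: build an in-memory tar from a name -> bytes mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_tar(archive: bytes) -> Dict[str, bytes]:
    """Demo helper: read every regular file back into a name -> bytes mapping."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}


# Focus on one entry, transform it in isolation, then restore the archive.
original = make_tar({"doc.xml": b"<a/>", "notes.txt": b"keep me"})
entry, ctx = focus(original, "doc.xml")
rebuilt = unfocus(entry.upper(), ctx)  # upper() stands in for the transform
print(read_tar(rebuilt))
```

The point of the sketch is that the transform step only ever sees the bytes of one entry, while the other entries are carried along as context and written back unchanged, so nothing has to be split, correlated, and re-merged.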
I believe that the concepts of lenses (and monads) can really help simplify some recurring dataflow design problems, even if we don't necessarily expose the language to the user. Even though the theory is complex, I think there is great value even for the typical NiFi user. This kind of transformation task comes up all the time, and I want to make the language and semantics as easy to understand as possible, while leveraging the benefits of the advanced theory.

> Create ArchiveLens processor
> ----------------------------
>
>                 Key: MINIFI-244
>                 URL: https://issues.apache.org/jira/browse/MINIFI-244
>             Project: Apache NiFi MiNiFi
>          Issue Type: Task
>          Components: C++, Extensions
>            Reporter: Andrew Christianson
>            Assignee: Andrew Christianson
>            Priority: Minor
>
> Create an ArchiveLens processor. A concise, though informal, definition of a lens is as follows:
> "Essentially, they represent the act of “peering into” or “focusing in on” some particular piece/path of a complex data object such that you can more precisely target particular operations without losing the context or structure of the overall data you’re working with."
> https://medium.com/@dtipson/functional-lenses-d1aba9e52254#.hdgsvbraq
> Why an ArchiveLens in MiNiFi? Simply put, it will enable us to "focus in on" an entry in the archive, perform processing *in-context* of that entry, then re-focus on the overall archive. This allows for transformation or other processing of an entry in the archive without losing the overall context of the archive.
> Initial format support is tar, due to its simplicity and ubiquity.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)