Peter,

First of all, great work! I couldn't find an Apache Jira for this, but I remember seeing something on the dev email list about perhaps having a ControllerService for arbitrary conversions.

I took a look at the commit; first things first, it looks good for the use case, thanks much! A handful of notes:

- As you know, the critical part is your "RowConverter". To be most useful, we should make sure (as Java has done over time with its supported SQL types) that we can refer to structured types in some common language. So "startNewFile()" might be better as "newRecord()", and addRow() might be better expressed as a NiFi-defined interface. ResultSets could have a simple wrapper implementing the generic interface, which would let the code consume them the same as any other object.

- For a common language, perhaps we could refer to structured data as "records" and to individual (perhaps also nested) elements as "fields".

- With a common NiFi-defined API for getting records and fields, we could implement all the converters you mention without any explicit dependency on an implementation NAR/JAR.

- We should avoid explicitly converting the input format to any intermediate format; in other words, the interface should follow the Adapter or Facade pattern and should convert on read.

- I'm sure I've forgotten some of the differences between formats, but I'll list the ones I have in mind for discussion with respect to conversions:

  1) XML
  2) JSON
  3) CSV (and TSV and other text-based delimited files)
  4) SQL ResultSet
  5) CQL ResultSet (Cassandra)
  6) Elasticsearch result set
  7) Avro
  8) Parquet

If we can cover these with a single interface and 8(-ish) implementations, then I think we're in good shape for world domination :) I'm not being sarcastic; I'm saying let's make it happen!

Regards,
Matt

On Mon, Aug 22, 2016 at 9:32 PM, Peter Wicks (pwicks) <[email protected]> wrote:
> I'm working on a change to QueryDatabaseTable (and eventually would apply to
> ExecuteSQL) to allow users to choose the output format, so something besides
> just Avro. I'm planning to put in a ticket soon for my work.
>
> Goals:
>
> - If this update goes out no user should be affected, as defaults
> will work the way they have before.
>
> - Don't want to muck up dependencies of standard processors with
> lots of other libraries to support multiple formats. As such I have
> implemented it using Controller Services to make the converters pluggable.
>
> - Would like to support these output formats to start:
>
> o Avro (This one is already in my commit, as I copy/pasted the code over;
> but the logic has now been moved to an appropriate library, which I like)
>
> o ORC (Would be implemented in the HIVE library)
>
> o JSON (No idea where I would put this processor, unless it's in a new
> module)
>
> o Delimited (No idea where I would put this processor, unless it's in a new
> module)
>
> Here is my commit:
> https://github.com/patricker/nifi/commit/01a79804f60b6be0f86499949712cd9118fb4f7f
>
> I'd appreciate feedback on design/implementation. I have the guts in there
> of how I was planning to implement this.
>
> Thanks,
> Peter
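
P.S. To make the record/field idea from my notes above a bit more concrete, here is a minimal Java sketch of what I have in mind. Every name in it (DataRecord, DelimitedRecord, ResultSetRecord, RecordFormatter) is hypothetical and is not an existing NiFi API; it is only meant to show the Adapter, convert-on-read shape under those assumptions. The delimited adapter is deliberately naive and does not handle quoted or escaped delimiters.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical common interface: a "record" exposes named "fields".
interface DataRecord {
    List<String> fieldNames();
    Object getField(String name);
}

// Adapter over one delimited text line; the line is only split when a field is read.
final class DelimitedRecord implements DataRecord {
    private final List<String> header;
    private final String line;
    private final String delimiter;

    DelimitedRecord(List<String> header, String line, String delimiter) {
        this.header = header;
        this.line = line;
        this.delimiter = delimiter;
    }

    @Override
    public List<String> fieldNames() {
        return header;
    }

    @Override
    public Object getField(String name) {
        int index = header.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        // Naive split: no support for quoted or escaped delimiters.
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        return index < parts.length ? parts[index] : null;
    }
}

// Adapter over the current row of a JDBC ResultSet; nothing is copied up front.
final class ResultSetRecord implements DataRecord {
    private final ResultSet resultSet;

    ResultSetRecord(ResultSet resultSet) {
        this.resultSet = resultSet;
    }

    @Override
    public List<String> fieldNames() {
        try {
            ResultSetMetaData meta = resultSet.getMetaData();
            List<String> names = new ArrayList<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                names.add(meta.getColumnLabel(i));
            }
            return names;
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Object getField(String name) {
        try {
            return resultSet.getObject(name);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}

// A converter written only against DataRecord, so it has no format-specific dependency.
final class RecordFormatter {
    private RecordFormatter() {
    }

    static String toDelimited(DataRecord record, String delimiter) {
        List<String> values = new ArrayList<>();
        for (String name : record.fieldNames()) {
            values.add(String.valueOf(record.getField(name)));
        }
        return String.join(delimiter, values);
    }
}

class RecordSketch {
    public static void main(String[] args) {
        DataRecord record = new DelimitedRecord(Arrays.asList("id", "name"), "1,alice", ",");
        // Re-emit the comma-delimited record as a tab-delimited line.
        System.out.println(RecordFormatter.toDelimited(record, "\t"));
    }
}
```

The point of the sketch is that RecordFormatter never learns where the data came from, so an Avro, ORC, or JSON writer could be written the same way against the one interface, and each input format would only need its own small adapter.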
