Re: NiFi code re-use
Scott, you're very right that there must be a better way. The flow registry with versioned flows is the answer: you can version-control the common logic and reuse it in as many instances as you need. In Java terms, this is like having a flow class from which you can instantiate as many objects as you need. It was definitely a long-missing capability, and it was addressed in NiFi 1.5.0 with the flow registry.

Also, we should just remove the root-group remote-port limitation. It was an implementation choice made long before we had multi-tenant authorization, and it no longer makes sense to force root group only. Still, the scenario above of versioned flows and the flow registry solves the main problem.

Thanks

On Sat, May 12, 2018, 9:22 PM Charlie Meyer <charlie.me...@civitaslearning.com> wrote:
> We do this often by leveraging the variable registry and the expression
> language to make components be more dynamic and reusable
>
> -Charlie
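The class/instance analogy above can be sketched in a few lines. This is purely illustrative Python; FlowRegistry and ProcessGroupInstance are hypothetical names, not the actual NiFi or NiFi Registry APIs:

```python
# Illustrative sketch of "versioned flow as a class, process group as an object".
# None of these names come from the real NiFi/Registry APIs.

class FlowRegistry:
    """Stores versioned flow definitions, keyed by name and version."""
    def __init__(self):
        self._flows = {}  # name -> {version: definition}

    def save(self, name, version, definition):
        self._flows.setdefault(name, {})[version] = definition

    def get(self, name, version):
        return self._flows[name][version]

class ProcessGroupInstance:
    """An instantiation of a versioned flow, like an object of a class."""
    def __init__(self, registry, name, version):
        self.definition = registry.get(name, version)

registry = FlowRegistry()
registry.save("common-enrichment", 1,
              ["ParseJson", "LookupRecord", "RouteOnAttribute"])

# Reuse the same versioned logic in many places; fix it once, upgrade everywhere.
a = ProcessGroupInstance(registry, "common-enrichment", 1)
b = ProcessGroupInstance(registry, "common-enrichment", 1)
```

The point is that both instances share one authoritative definition, so a change only has to be made in one place.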
Re: NiFi code re-use
We do this often by leveraging the variable registry and the expression language to make components more dynamic and reusable.

-Charlie

On Sat, May 12, 2018, 20:01 scott wrote:
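NiFi's Expression Language references variables as ${name}; the reuse pattern Charlie describes can be sketched with a toy resolver (not NiFi's actual EL engine, which also supports functions and much more):

```python
import re

def resolve(expression, variables):
    """Replace ${var} references with values from a variable registry (toy version)."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: str(variables.get(m.group(1), "")),
                  expression)

# One generic component configuration, reused with different variable registries:
config = "http://${host}:${port}/ingest"
dev  = resolve(config, {"host": "dev.example.com",  "port": 8080})
prod = resolve(config, {"host": "prod.example.com", "port": 443})
```

The same component configuration becomes reusable because the environment-specific parts live in the registry, not in the flow.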
NiFi code re-use
Hi Devs,

I've got a question about an observation I've had while working with NiFi. Is there a better way to re-use process groups, similar to how programming languages reference functions, libraries, classes, or pointers? I know about remote process groups and templates, but neither does exactly what I was thinking. RPGs are great, but I think the output goes to the root canvas level, and you have to have connectors all the way back up your flow hierarchy, and that's not practical. Ultimately, I'm looking for an easy way to re-use process groups that contain common logic in many of my flows, so that I reduce the number of places I have to make changes.

Hopefully that made sense. Appreciate your thoughts.

Scott
Re: Graph database support w/ NiFi
Matt,

You have some interesting ideas that I really like. GraphReaders and GraphWriters would be interesting. When I started writing a graph processor with my idea, the concept was not yet implemented in NiFi.

I'm not so keen on GraphML and GraphSON, because they contain e.g. the vertex/edge IDs and, to my knowledge, serve as import and export formats (correct me if I'm wrong). A ConvertRecordToGraph processor is a good approach; the only question is from which format we can convert.

I also think that to make a graph processor somewhat general, we would have to provide a query as input which finds the correct vertex from which the graph should be extended, maybe, as you suggest, a Gremlin query or a small Gremlin script. If a vertex is found, a new edge and a new vertex are added. The question is how we transmit the individual attributes of the edge and vertex, as well as their labels. Possibly with NiFi attributes?

I have some headaches about the complexity. A small example: imagine we have a record from a CSV file. The columns are Set ID, Token1, Token2, Token3...

ID, Token1, Token2, Token3, Token4, Token5
123, Mary, had, a, little, lamp

I want to create a vertex with ID 123 (if it does not exist). Then I want to check for each token whether a vertex exists in the graph database (search for a vertex with label "Token" and attribute "name"="Mary"). If the vertex does not exist, it has to be created. Since I want to store e.g. Wikipedia in my graph, I want to avoid the supernode problem for the token vertices, so I create a few distribution vertices for each vertex that belongs to a token. If there is a vertex for Token1 (Mary), then I don't want to make the edge from that vertex to my vertex with ID 123, but from one of the distribution vertices. If the vertex for the token does not exist, the distribution vertices also have to be created... and so on. Even with this very simple example, it seems to become difficult with a universal processor.
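The distribution-vertex scheme above can be sketched with a toy in-memory graph. Plain Python stands in for a real graph database here; Graph, get_or_create, and the three-slot fan-out are all made up for illustration:

```python
import itertools
import random

class Graph:
    """Minimal in-memory property graph (a stand-in for a real graph database)."""
    def __init__(self):
        self.vertices = {}   # vid -> properties
        self.edges = []      # (src_vid, label, dst_vid)
        self._ids = itertools.count()

    def get_or_create(self, **props):
        """Search for a vertex with exactly these properties; create it if absent."""
        for vid, p in self.vertices.items():
            if p == props:
                return vid, False
        vid = next(self._ids)
        self.vertices[vid] = props
        return vid, True

    def add_edge(self, src, label, dst):
        self.edges.append((src, label, dst))

g = Graph()
row = {"ID": "123", "tokens": ["Mary", "had", "a", "little", "lamp"]}

set_vid, _ = g.get_or_create(label="Set", set_id=row["ID"])
for token in row["tokens"]:
    tok_vid, created = g.get_or_create(label="Token", name=token)
    if created:
        # Fan out through a few distribution vertices to avoid a supernode.
        for i in range(3):
            dist_vid, _ = g.get_or_create(label="TokenDist", token=token, slot=i)
            g.add_edge(dist_vid, "distributes", tok_vid)
    # Attach the set to one distribution vertex instead of the token itself.
    slot = random.randrange(3)
    dist_vid, _ = g.get_or_create(label="TokenDist", token=token, slot=slot)
    g.add_edge(set_vid, "contains", dist_vid)
```

Even this toy shows the problem: the get-or-create checks, the fan-out width, and which vertex the edge attaches to are all domain decisions that a universal processor cannot guess.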
In any case, I think the idea to implement a graph processor in NiFi is a good one. The more we work on it, the more good ideas we get, and maybe I just can't see the forest for the trees.

One question about Titan: to my knowledge, Titan has been dead for a year and a half and JanusGraph is the successor? Titan unofficially became DataStax Enterprise Graph. Supporting Titan could become difficult because, to my knowledge, Titan does not support TinkerPop 3 and is no longer maintained.

I like your idea of a wiki page for more ideas. Otherwise one gets lost in the many mails.

Regards,
Kay-Uwe

On 12.05.2018 at 16:52, Matt Burgess wrote:
Re: Graph database support w/ NiFi
All,

As Joe implied, I'm very happy that we are discussing graph tech in relation to NiFi! NiFi and graph theory/tech/analytics are passions of mine. Mike, the examples you list are great; I would add Titan (and its fork JanusGraph, as Kay-Uwe mentioned) and Azure CosmosDB (these and others are at [1]). I think there are at least four aspects to this:

1) Graph query/traversal: This deals with getting data out of a graph database and into flow file(s) for further processing. Here I agree with Kay-Uwe that we should consider Apache TinkerPop as the main library for graph query/traversal, for a few reasons. The first, as Kay-Uwe said, is that there are many adapters for TinkerPop (TP) to connect to various databases; from Mike's list I believe ArangoDB is the only one that does not yet have a TP adapter. The second is informed by the first: TP is a standard interface and graph traversal engine with a common DSL in Gremlin. A third is that Gremlin is a Groovy-based DSL, Groovy syntax is fairly close to Java 8+ syntax, and you can call Groovy/Gremlin from Java and vice versa. A fourth is that TinkerPop is an Apache TLP with a very active and vibrant community, so we will be able to reap the benefits of all the graph goodness they develop moving forward. I think a QueryGraph processor could be appropriate, perhaps with a GraphDBConnectionPool controller service or something of the like. Apache DBCP can't do the pooling for us, but we could implement something similar for pooling TP connections.

2) Graph ingest: This one IMO is the long pole in the tent. Gremlin is a graph traversal language, and although its API has addVertex() and addEdge() methods and such, it seems like an inefficient solution, akin to using individual INSERTs in an RDBMS rather than a PreparedStatement or a bulk load. Keeping the analogy, bulk loading in RDBMSs is usually specific to that DB, and the same goes for graphs.
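(A quick aside on the GraphDBConnectionPool idea under aspect 1: since Apache DBCP can't pool TP connections for us, a home-grown pool might look roughly like this. This is a generic sketch with hypothetical names, not a real NiFi controller-service API.)

```python
import queue

class GraphConnectionPool:
    """Simple bounded pool for expensive connections (e.g. TinkerPop remotes)."""
    def __init__(self, factory, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def borrow(self, timeout=None):
        # Blocks until a connection is free (or raises queue.Empty on timeout).
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# A stand-in factory; a real one would open a connection to a Gremlin Server.
pool = GraphConnectionPool(lambda: object(), size=2)
conn = pool.borrow()
try:
    pass  # run a traversal with conn here
finally:
    pool.release(conn)
```

A real implementation would also need connection validation and eviction, which is exactly the kind of work DBCP does for JDBC.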
The Titan-based ones have Titan-Hadoop (formerly Faunus), Neo4j has external tools (not sure if there's a Java API or not) and Cypher, OrientDB has an ETL pipeline system, etc. If we have a standard Graph concept, we could have controller services / writers that are system-specific (see aspect #4).

3) Arbitrary data -> Graph: Converting non-graph data into a graph almost always takes domain knowledge, which NiFi itself won't have and which will thus have to be provided by the user. We'd need to make it as simple as possible but also as powerful and flexible as possible in order to get the most value. We can investigate how each of the systems in aspect #2 approaches this, and perhaps come up with a good user experience around it.

4) Organization and implementation: I think we should make sure to keep the capabilities very loosely coupled in terms of which modules/NARs/JARs provide which capabilities, to allow for maximum flexibility and ease of future development. I would prefer an API/libraries module akin to nifi-hadoop-libraries-nar, which would only include Apache TinkerPop and any dependencies needed to do "pure" graph stuff, so probably no TP adapters except TinkerGraph (and/or its faster fork from ShiftLeft [2]). The reason I say that is so NiFi components (and even the framework!) could use graphs in a lightweight manner, without lots of heavy and possibly unnecessary dependencies. Imagine being able to query your own flows using Gremlin or Cypher!

I also envision an API much like the Record API in NiFi but for graphs, so we'd have GraphReaders and GraphWriters; perhaps they could convert from GraphML to GraphSON or Kryo, for example, or, in conjunction with a ConvertRecordToGraph processor, could be used to support the capability in aspect #3 above. I'd also be looking at bringing Gremlin into the scripting processors, or having a Gremlin-based scripting bundle as NiFi's graph capabilities mature.
You might be able to tell I'm excited about this discussion ;) Should we get a wiki page going for ideas, and/or keep it going here, or something else? I'm all ears for thoughts, questions, and ideas (especially the ones that might seem crazy!)

Regards,
Matt

[1] http://tinkerpop.apache.org/providers.html
[2] https://github.com/ShiftLeftSecurity/tinkergraph-gremlin

On Sat, May 12, 2018 at 8:02 AM, u...@moosheimer.com wrote:
Re: Graph database support w/ NiFi
Joe,

Wouldn't it be good to integrate Apache Atlas more closely with NiFi? What I mean is just using something that already exists before doing it in some new way.

Best regards,
Kay-Uwe Moosheimer

On 12.05.2018 at 13:07, Joe Witt wrote:
Re: Graph database support w/ NiFi
Hi Mike,

Graph database support is not quite as easy as it seems. Unlike relational databases, graphs not only have vertices and edges (labeled vertices and edges); they can be directed or undirected and may carry attributes on both the vertices and the edges.

This makes it a bit tricky to design a general interface.

In general, a graph database should always be accessed via TinkerPop 3 (or higher), since every professional graph database supports TinkerPop. TinkerPop is for graph databases what JDBC is for relational databases.

I tried to create a general NiFi processor for graph databases myself and then gave up. Unlike relational databases, graph databases usually involve many dependencies between the data. You do not simply create a record; you search for a particular vertex (which may need to have certain edges) and create further edges and vertices attached to it. And the search for the correct vertex is usually context-dependent. This makes it difficult to do something general for all requirements.

In any case, I am looking forward to your concept and how you want to solve it. It's definitely a good idea, but hard to solve.

Btw.: You forgot the most important graph database, JanusGraph.

Best regards,
Kay-Uwe Moosheimer

On 12.05.2018 at 13:01, Mike Thomsen wrote:
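The search-then-extend pattern described above is essentially an upsert parameterized by a context-specific match predicate. A toy sketch of why that resists a one-size-fits-all processor (all names here are hypothetical):

```python
def extend_graph(vertices, edges, match, new_vertex, edge_label):
    """Find the vertex satisfying `match` (a caller-supplied, context-specific
    predicate), then hang a new vertex off it. The predicate is the part that
    no general-purpose processor can guess for you."""
    anchors = [vid for vid, props in vertices.items() if match(props)]
    if not anchors:
        return None  # context not found; a real processor must decide what to do
    new_id = max(vertices) + 1
    vertices[new_id] = new_vertex
    edges.append((anchors[0], edge_label, new_id))
    return new_id

vertices = {0: {"label": "Token", "name": "Mary"}}
edges = []
nid = extend_graph(
    vertices, edges,
    match=lambda p: p.get("label") == "Token" and p.get("name") == "Mary",
    new_vertex={"label": "Set", "set_id": "123"},
    edge_label="mentioned_in",
)
```

Every use case needs its own match logic, its own not-found policy, and its own choice of attachment point, which is exactly the difficulty described above.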
Re: Graph database support w/ NiFi
Mike,

Do you mean support for sending data to a graph DB?

A really awesome use case would be sending provenance data to one and building queries, etc. around it!

I know mattyb would be all over that.

Thanks

On Sat, May 12, 2018, 7:02 AM Mike Thomsen wrote:
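A toy sketch of the provenance idea Joe raises: treat provenance events as a lineage graph and walk ancestors. The event records below are made up for illustration, not NiFi's actual provenance schema:

```python
from collections import defaultdict

# Toy provenance events: each record says which flowfile(s) a flowfile came from.
events = [
    {"flowfile": "ff2", "parents": ["ff1"]},
    {"flowfile": "ff3", "parents": ["ff1"]},
    {"flowfile": "ff4", "parents": ["ff2", "ff3"]},
]

parents = defaultdict(list)
for e in events:
    parents[e["flowfile"]].extend(e["parents"])

def lineage(flowfile):
    """All ancestors of a flowfile: the kind of query a graph DB answers natively."""
    seen, stack = set(), [flowfile]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen
```

In a graph database this ancestor walk would be a one-line traversal, which is what makes provenance such a natural fit.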
Graph database support w/ NiFi
I was wondering if anyone on the dev list had given much thought to graph database support in NiFi. There are a lot of graph databases out there, and many of them seem to be half-baked or barely supported. Narrowing it down, it looks like the best candidates for a no-fuss, decent-sized graph that we could build up with NiFi processors would be OrientDB, Neo4j and ArangoDB. The first two are particularly attractive because they offer JDBC drivers, which opens up the potential of making them even part of the standard JDBC-based processors.

Anyone have any opinions or insights on this issue? I might have to do OrientDB anyway, but if someone has a good feel for the market and can make recommendations, that would be appreciated.

Thanks,

Mike