How to use delta storage format
We read data from a database via JDBC and want to save the results as a Delta table, then read them back again. How can I realize this with NiFi and a Hive or Glue metastore?
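For context on what "saving as Delta" entails: a Delta table is a directory of Parquet data files plus a `_delta_log` directory of ordered JSON commit files, which is why simply landing Parquet files from NiFi is not enough; whatever writes the files must also write the log (typically Spark with the Delta Lake package, e.g. `df.write.format("delta").saveAsTable(...)`, with the table registered in Hive or Glue). The sketch below shows, in simplified form, what one commit entry contains and how a reader replays it; the file names and field values are illustrative, not taken from a real table:

```python
import json

# A Delta commit file (e.g. _delta_log/00000000000000000000.json) holds one
# JSON "action" per line. Simplified, illustrative first commit:
commit_lines = [
    json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}),
    json.dumps({"metaData": {"id": "example-table-id",
                             "format": {"provider": "parquet"}}}),
    json.dumps({"add": {"path": "part-00000.snappy.parquet",
                        "size": 1024, "dataChange": True}}),
]

def active_files(log_lines):
    """Replay add/remove actions to find the files a reader should scan."""
    files = set()
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

print(sorted(active_files(commit_lines)))  # ['part-00000.snappy.parquet']
```

Engines like Spark, Trino, or the `deltalake` Python package do this replay for you; the point is only that the log, not the directory listing, defines the table, so the writer has to be Delta-aware.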
Re: Nifi zookeeper state migration to different cluster
If you are asking whether this is the correct way to migrate the data from an old ZK to a new one, then yes.

As for the flow being stopped: if you don't stop it, the flow might continue to modify the state in ZK while you are migrating it. Say a state value of 1 gets migrated to the new ZK, but then the old flow runs and updates the value to 2 in the old ZK. When the flow runs in your new cluster it will start from 1, even though the old cluster already processed 2.

On Fri, Mar 27, 2020 at 9:57 AM sanjeet rath wrote:

> Hi,
>
> If someone could help me with my earlier query, it would be really
> helpful. I eagerly await your response.
>
> Regards,
> Sanjeet
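The race described above can be made concrete with a toy model (illustrative only; real NiFi component state is per-component key/value pairs in ZooKeeper, and the key name here is made up):

```python
# Toy model of migrating component state while the old flow keeps running.
old_zk = {"last_processed_offset": 1}

# Export/import while the old flow is still running (zk-migrator -r, then -s):
new_zk = dict(old_zk)                 # new cluster now believes offset 1

old_zk["last_processed_offset"] = 2   # old flow processes one more record

# The new cluster starts from its migrated (stale) state:
start = new_zk["last_processed_offset"] + 1
print(start)  # 2 -> record 2 gets processed twice, once by each cluster
```

Stopping the flow before the export freezes `old_zk`, so the snapshot and the last processed record agree.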
Re: Dashboards for reporting ingest status to users
I do not think provenance data alone will make sense even to you, certainly not to your users. We followed advice from Pierre's blog (https://pierrevillard.com/best-of-nifi/); our solution is a combination of naming processors and having our own custom "AuditLog" processor that logs key events, record counts, statuses, etc. to an external MySQL table in append-only fashion. Users can poll this table for events or build dashboards reporting on specific ingest processes.

We considered using Kafka, as we already use it for other projects, but decided against it because it would be difficult for BI/ETL developers using traditional tools to work with Kafka messages. MySQL was the easiest option. We might switch to a NoSQL DB at some point, but we have no issues with our volumes staying on MySQL.

On Fri, Mar 27, 2020 at 8:31 AM Mike Thomsen wrote:

> Thanks. I've done something similar in the past using Elasticsearch as the
> data store.
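A minimal sketch of the append-only audit-table pattern described above, using Python's built-in sqlite3 standing in for MySQL (the table, column, and event names are made up for illustration):

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        ingest_name   TEXT,
        event         TEXT,     -- e.g. STARTED, RECORDS_WRITTEN, DONE, FAILED
        record_count  INTEGER,
        logged_at     TEXT
    )""")

def audit(ingest_name, event, record_count=0):
    """What a custom 'AuditLog' processor would do: append one row per event."""
    conn.execute("INSERT INTO audit_log VALUES (?, ?, ?, ?)",
                 (ingest_name, event, record_count,
                  datetime.now(timezone.utc).isoformat()))

audit("daily_orders", "STARTED")
audit("daily_orders", "RECORDS_WRITTEN", 1250)
audit("daily_orders", "DONE", 1250)

# A dashboard or BI tool polls for the latest event per ingest:
row = conn.execute("""
    SELECT event, record_count FROM audit_log
    WHERE ingest_name = 'daily_orders'
    ORDER BY rowid DESC LIMIT 1""").fetchone()
print(row)  # ('DONE', 1250)
```

Append-only writes keep the processor cheap and preserve history; the "current status" is just the newest row per ingest, which any SQL-speaking BI tool can query.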
Re: Nifi zookeeper state migration to different cluster
Hi,

If someone could help me with my earlier query, it would be really helpful. I eagerly await your response.

Regards,
Sanjeet

On Thu, Mar 26, 2020 at 3:20 PM sanjeet rath wrote:

> Hi Team,
>
> I am trying to achieve NiFi migration to a different cluster using
> zookeeper state migration (in case of failure of the cluster).
>
> The steps I am doing:
>
> zk-migrator.sh -r -z \
>   sourceHostname:sourceClientPort/sourceRootPath/components \
>   -f /path/to/export/zk-source-data.json
>
> zk-migrator.sh -s -z \
>   destinationHostname:destinationClientPort/destinationRootPath/components \
>   -f /path/to/export/zk-source-data.json
>
> I also moved flow.xml.gz, users.xml, and authorizations.xml from the source
> cluster to the destination cluster.
>
> This is working fine, meaning the destination cluster picks up the
> processor state from the old cluster.
>
> My questions are:
>
> 1. Is the above approach correct for moving to a new cluster with state
>    when there is a failure in the old cluster?
> 2. Per the zk-migration doc, the flow needs to be stopped before running
>    the export script. Is that mandatory?
> 3. Is there any other approach to achieve this? If so, please suggest it.
>
> Thanks,
> Sanjeet

--
Sanjeet Kumar Rath,
mob- +91 8777577470
Re: Dashboards for reporting ingest status to users
Thanks. I've done something similar in the past using Elasticsearch as the data store. I think we might start with that and hope that we don't get more nuanced requirements. I guess we could look at naming the processors after the steps and hope that keeps users happy.

On Fri, Mar 27, 2020 at 8:22 AM Marc Parisi wrote:

> Hey Mike,
>
> I recently did something similar for a personal project. I ingested
> provenance data into a NoSQL store (through a reporting task that also
> indexed the data), primarily querying on the ProvenanceEventType.
Re: Dashboards for reporting ingest status to users
Hey Mike,

I recently did something similar for a personal project. I ingested provenance data into a NoSQL store (through a reporting task that also indexed the data), primarily querying on the ProvenanceEventType.

I tracked a piece of information (in my case the original file name with an identifier) and queried for event types to get an idea of what occurred. For example, I looked for ROUTE and ATTRIBUTES_MODIFIED to determine which path my data took.

It was very easy to monitor the provenance event types for DROP and to check whether data succeeded or failed. I didn't concern myself with diving into why data failed, because I was worried that would be a bit more complex and require a bit more thought.

I originally had an ingest processor perform this notification, but moved to a provenance reporting task as it just worked so well (at least for my purposes).

In my case the dashboard was a simple table that showed what file(s) I uploaded and their state, flashing red if data took more than a configurable period of time to complete (fail or succeed). The table linked to a separate query interface that allowed a deeper dive into the provenance records, so that I could dig into a problem set further if failure or extreme latency occurred. It was super simple.

Hope this helps,
Marc

On Fri, Mar 27, 2020 at 7:51 AM Mike Thomsen wrote:

> Has anyone ever created good dashboards on top of NiFi flows or provenance
> data that will report the status of a flowfile back to the user?
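The approach above, deriving a per-file status from provenance event types, can be sketched as follows (RECEIVE, ROUTE, ATTRIBUTES_MODIFIED, and DROP are real NiFi ProvenanceEventType values; the status labels, tracked ids, and input shape are made up for illustration):

```python
# Each provenance event is (tracked_id, event_type), in the order received
# from the reporting task.
events = [
    ("upload-1", "RECEIVE"),
    ("upload-1", "ATTRIBUTES_MODIFIED"),
    ("upload-1", "ROUTE"),
    ("upload-1", "DROP"),       # flowfile left the flow: terminal event
    ("upload-2", "RECEIVE"),
]

def status_of(tracked_id, events):
    """DROP means the file finished; anything else means still in flight."""
    seen = [etype for tid, etype in events if tid == tracked_id]
    if not seen:
        return "UNKNOWN"
    return "DONE" if "DROP" in seen else "PROCESSING"

print(status_of("upload-1", events))  # DONE
print(status_of("upload-2", events))  # PROCESSING
```

A dashboard table then only needs the latest status per tracked id, with a timer to flag entries stuck in PROCESSING beyond a threshold.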
Dashboards for reporting ingest status to users
Has anyone ever created good dashboards on top of NiFi flows or provenance data that will report the status of a flowfile back to the user? Our client would like to give users the ability to feed NiFi data and then get a basic view of where it is. It can be fairly simplistic, like "Started...", "Processing...", "Done..." for now, but I was wondering if anyone has any good patterns for this before I dive into it myself.

My current thought is to create a new processor bundle that adds a processor called "ProgressGateProcessor", which would let users signal the status of a flowfile to an external application or data store in one step, so you don't have to mix in process groups.

Thanks,

Mike
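In outline, the "ProgressGateProcessor" idea — one drop-in step that reports a flowfile's status to an external store and passes the flowfile through — might look like this (the dict store, attribute names, and status strings are stand-ins; a real processor would POST to an API or write to a database):

```python
status_store = {}  # stand-in for an external status API or table

def progress_gate(flowfile_attrs, status):
    """One-step status signal: record the status keyed by the flowfile's id,
    then pass the attributes through unchanged (pass-through semantics)."""
    status_store[flowfile_attrs["uuid"]] = status
    return flowfile_attrs

ff = {"uuid": "abc-123", "filename": "orders.csv"}
ff = progress_gate(ff, "Started...")
ff = progress_gate(ff, "Processing...")
ff = progress_gate(ff, "Done...")
print(status_store["abc-123"])  # Done...
```

Because the gate is pass-through, it can be dropped between any two existing processors without restructuring the flow into process groups, which is the property the proposal is after.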