How to use delta storage format

2020-03-27 Thread Paul Parker
We read data from a database via JDBC and want to save the results as a
Delta table, then read them back. How can I accomplish this with NiFi and
Hive or the Glue metastore?
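[Editor's note: not answered in the thread, but a common pattern is to land the JDBC results with Spark and write them out in Delta format, registering the table in the Hive or Glue metastore. A minimal PySpark sketch; the JDBC URL, credentials, and table names are placeholders, and it assumes the delta-spark package is available and the session is configured for Delta:]

```python
from pyspark.sql import SparkSession

# Hypothetical session; in practice NiFi would hand off to a Spark job,
# or delta-rs could write Delta files without Spark at all.
spark = (SparkSession.builder
         .appName("jdbc-to-delta")
         .config("spark.sql.extensions",
                 "io.delta.sql.DeltaSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog",
                 "org.apache.spark.sql.delta.catalog.DeltaCatalog")
         .getOrCreate())

# Read from the source database over JDBC (placeholder connection details).
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/source_db")
      .option("dbtable", "public.orders")
      .option("user", "reader")
      .option("password", "secret")
      .load())

# Write as a Delta table registered in the metastore (Hive or Glue,
# depending on how the catalog is configured).
df.write.format("delta").mode("overwrite").saveAsTable("analytics.orders")

# Read it back.
spark.table("analytics.orders").show()
```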


Re: Nifi zookeeper state migration to different cluster

2020-03-27 Thread Bryan Bende
If you are asking if this is the correct way to migrate the data from an
old ZK to a new one, then yes.

As for stopping the flow: if you don't stop it, the flow might continue to
modify the state in ZK while you are migrating it. For example, suppose some
value of 1 gets migrated to the new ZK, but then the old flow runs and
updates the value to 2 in your old ZK. When the flow runs in your new
cluster it will start from 1 even though the old cluster already
processed 2.

On Fri, Mar 27, 2020 at 9:57 AM sanjeet rath  wrote:



Re: Dashboards for reporting ingest status to users

2020-03-27 Thread Boris Tyukin
I do not think provenance data alone will make any sense even to you, and
certainly not to your users. We followed the advice from Pierre's blog
https://pierrevillard.com/best-of-nifi/ and use a combination of naming
processors and our own custom "AuditLog" processor that logs key events,
record counts, statuses, etc. to an external MySQL table in append-only
fashion.

Users can poll this table for events or build dashboards to report on
specific ingest processes. We considered Kafka, since we already use it for
other projects, but decided against it because it would be difficult for
BI/ETL developers using traditional tools to work with Kafka messages.
MySQL was the easiest option. We might switch to a NoSQL DB at some point,
but we have no issues with our volumes staying on MySQL.
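[Editor's note: the append-only audit table described above can be sketched in a few lines. The thread uses MySQL; sqlite3 from the Python standard library is used here so the example is self-contained, and the table and column names are illustrative, not from the thread:]

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_log (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        flow_name    TEXT NOT NULL,
        event        TEXT NOT NULL,   -- e.g. STARTED, PROCESSING, DONE, FAILED
        record_count INTEGER,
        logged_at    TEXT NOT NULL
    )
""")

def audit(flow_name, event, record_count=None):
    """Append one event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO audit_log (flow_name, event, record_count, logged_at) "
        "VALUES (?, ?, ?, ?)",
        (flow_name, event, record_count,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()

audit("daily_ingest", "STARTED")
audit("daily_ingest", "DONE", record_count=1250)

# Dashboards poll with plain SELECTs, e.g. latest status per flow:
latest = conn.execute(
    "SELECT event, record_count FROM audit_log "
    "WHERE flow_name = ? ORDER BY id DESC LIMIT 1",
    ("daily_ingest",),
).fetchone()
print(latest)  # ('DONE', 1250)
```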

On Fri, Mar 27, 2020 at 8:31 AM Mike Thomsen  wrote:



Re: Nifi zookeeper state migration to different cluster

2020-03-27 Thread sanjeet rath
Hi,

If someone could help me with my query below, it would be really helpful.
I eagerly await your response.

Regards,
Sanjeet

On Thu, Mar 26, 2020 at 3:20 PM sanjeet rath  wrote:

>
> Hi Team,
>
> I am trying to migrate NiFi to a different cluster using zookeeper state
> migration (in case the cluster fails).
>
> Steps I am following:
>
> zk-migrator.sh -r -z
> sourceHostname:sourceClientPort/sourceRootPath/components -f
> /path/to/export/zk-source-data.json
>
> zk-migrator.sh -s -z
> destinationHostname:destinationClientPort/destinationRootPath/components -f
> /path/to/export/zk-source-data.json
>
> I also moved flow.xml.gz, users.xml, and authorizations.xml from the
> source cluster to the destination cluster.
>
> This works fine; it picks up the processor state IDs from the old cluster.
>
> But my questions are:
>
> Is the above approach correct for moving to a new cluster with state when
> the old cluster fails?
> Also, per the ZK migration doc, the flow needs to be stopped before
> running the export script. Is that mandatory?
>
> If there is another approach to achieve this, please suggest it.
>
> Thanks,
> Sanjeet
>
> --
> Sanjeet Kumar Rath,
> mob- +91 8777577470
>
>

-- 
Sanjeet Kumar Rath,
mob- +91 8777577470


Re: Dashboards for reporting ingest status to users

2020-03-27 Thread Mike Thomsen
Thanks. I've done something similar in the past using Elasticsearch as the
data store. I think we might start with that and hope that we don't get
more nuanced requirements. I guess we could look at naming the processors
after the steps and hope that works well enough to keep users happy.

On Fri, Mar 27, 2020 at 8:22 AM Marc Parisi  wrote:



Re: Dashboards for reporting ingest status to users

2020-03-27 Thread Marc Parisi
Hey Mike,

I recently did something similar for a personal project. I ingested
provenance data into a NoSQL store (through a reporting task that also
indexed the data), primarily querying on the ProvenanceEventType.

I tracked one piece of information (in my case, the original file name
with an identifier) and queried for event types to get an idea of what
occurred; for example, I looked for ROUTE and ATTRIBUTES_MODIFIED to
determine which path my data took.

It was very easy to monitor the provenance event types for DROP and to
check whether data succeeded or failed. I didn't concern myself with diving
into why data failed, because I was worried that would be more complex and
would require more thought.

I originally had an ingest processor perform this notification, but moved to
a provenance reporting task, as it worked very well (at least for my
purposes).

In my case the dashboard was a simple table that showed which file(s) I
uploaded and their state, flashing red if data took more than a
configurable period of time to complete (fail or success). The table
linked to a separate query interface that allowed a deeper dive into the
provenance records, so that I could dig further into a problem set if
failure or extreme latency occurred. It was super simple.
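[Editor's note: the event-type checks described above can be sketched in a few lines of Python. The record structure here is hypothetical (real NiFi provenance events carry many more fields); it only illustrates filtering indexed events by type:]

```python
# Simulated provenance records as they might look after indexing into a
# NoSQL store; field names are illustrative, not NiFi's exact schema.
events = [
    {"eventType": "RECEIVE",             "filename": "orders.csv", "uuid": "f-1"},
    {"eventType": "ROUTE",               "filename": "orders.csv", "uuid": "f-1"},
    {"eventType": "ATTRIBUTES_MODIFIED", "filename": "orders.csv", "uuid": "f-1"},
    {"eventType": "DROP",                "filename": "orders.csv", "uuid": "f-1"},
]

def path_taken(events, uuid):
    """Routing-related events a flowfile saw (ROUTE / ATTRIBUTES_MODIFIED)."""
    return [e["eventType"] for e in events
            if e["uuid"] == uuid
            and e["eventType"] in ("ROUTE", "ATTRIBUTES_MODIFIED")]

def is_complete(events, uuid):
    """A DROP event means the flowfile finished (succeeded or failed)."""
    return any(e["uuid"] == uuid and e["eventType"] == "DROP" for e in events)

print(path_taken(events, "f-1"))   # ['ROUTE', 'ATTRIBUTES_MODIFIED']
print(is_complete(events, "f-1"))  # True
```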

Hope this helps,
Marc

On Fri, Mar 27, 2020 at 7:51 AM Mike Thomsen  wrote:



Dashboards for reporting ingest status to users

2020-03-27 Thread Mike Thomsen
Has anyone ever created good dashboards on top of NiFi flows or provenance
data that will report the status of a flowfile back to the user? Our client
would like to give users the ability to feed NiFi data and then get a basic
view of where it is. It can be fairly simplistic, like "Started..."
"Processing..." "Done..." for now, but I was wondering if anyone has any
good patterns for this before I dive into it myself.

My current thought here is to create a new processor bundle that would add
a new processor called "ProgressGateProcessor" that would allow users in
one step to signal to an external application or data store the status of a
flowfile, so you don't have to mix in process groups.

Thanks,

Mike