Unable to get public no-arg constructor

2024-03-08 Thread u...@moosheimer.com

I use NiFi 1.24 as a standalone instance.
Today I installed Apache Drill and Apache Zookeeper as daemons.

Of course, I immediately tested whether everything still works after a
reboot.
And lo and behold: of course not.

I suddenly get the following error message in nifi-bootstrap.log:

2024-03-08 16:10:05,430 INFO [main] org.apache.nifi.bootstrap.Command
Launched Apache NiFi with Process ID 8804
2024-03-08 16:10:05,461 INFO [NiFi logging handler]
org.apache.nifi.StdOut Listening for transport dt_socket at address: 8000
2024-03-08 16:10:05,785 INFO [NiFi Bootstrap Command Listener]
org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening
for Bootstrap requests on port 41543
2024-03-08 16:10:09,567 ERROR [NiFi logging handler]
org.apache.nifi.StdErr Failed to start web server:
javax.servlet.ServletContainerInitializer:
com.sun.jersey.server.impl.container.servlet.JerseyServletContainerInitializer
Unable to get public no-arg constructor
2024-03-08 16:10:09,567 ERROR [NiFi logging handler]
org.apache.nifi.StdErr Shutting down...
2024-03-08 16:10:10,432 INFO [main] org.apache.nifi.bootstrap.RunNiFi
NiFi never started. Will not restart NiFi

Can anyone tell me what this means?
I can't see that any port NiFi needs is already in use.
And the statement "Unable to get public no-arg constructor" surprises me,
since NiFi was already working before.

Still, I have clearly broken something.
But what?

-- Kay-Uwe



Re: UpdateAttribute Failure Relationship

2024-02-09 Thread u...@moosheimer.com
Yes, that certainly involves a lot of effort.
I wonder whether it's a good idea to fix a possible "design flaw" with a 
construct that is neither consistent nor easy to handle.

I also think the argument that you have to adjust a lot in the flow is 
questionable. 
With Release 2.0, so much has to be adapted (cron jobs, database schema 
controller etc.) that it hardly matters whether you adapt the UpdateAttribute 
processors or not.

The effort to adapt and test everything is already present with the upgrade to 
2.0.
Doesn't Release 2.0 actually offer a really good opportunity to adapt 
everything that has been put off for a long time?

Who knows when the next opportunity will come? With Release 3.0? Or not at all, 
because NiFi is becoming more and more widespread and the customer lobby is 
getting bigger and bigger? Cloudera will find it increasingly difficult to pack 
"large-scale" customizations into a release.

Whether we wait 3-4 months or 5-6 months for the 2.0 release doesn't really 
matter, does it?
But what do I know ... the NiFi experts and full-time maintainers will 
certainly do the right thing.

-- Uwe

> Am 09.02.2024 um 22:04 schrieb Michael Moser :
> 
> These are great ideas!  I do love Adam's dynamic relationship idea over
> creating a failure relationship that is auto-terminated. This would make
> flow migrations in Registry and NiFi easier.
> 
> After some more pondering, I (slowly) realized that this problem affects
> more than UpdateAttribute, though.  You can easily get expression language
> (explang) exceptions anywhere that explang is used.  I'm sure all of us can
> create a RouteOnAttribute configuration that causes an explang exception
> which rolls back the flowfile.  If we spend so much effort on a solution it
> would be a shame for that to only apply to UpdateAttribute.
> 
> For this reason I would favor Matt's idea of a try() or trycatch() explang
> method.  How it might work isn't intuitive, though.  Would we have to make
> it aware of a data type to return?  RouteOnAttribute expects boolean but
> UpdateAttribute expects string (or convertible to string).
> 
> I can see why the rollback/admin yield solution has been status quo for so
> long.
> 
> -- Mike
> 
> 
> 
> 
>> On Fri, Feb 9, 2024 at 10:03 AM u...@moosheimer.com 
>> wrote:
>> 
>> 
>> --Uwe
>> 
>>>> Am 09.02.2024 um 13:50 schrieb Mike Thomsen :
>>> 
>>> How about a third option which is to provide three options:
>>> 
>>> 1) Default - status quo, exceptions cause it to yield
>>> 2) Exception = moves forward to success w/ an error attribute, an error
>> log
>>> statement that triggers a bulletin, etc. to let data managers know what's
>>> happening.
>>> 3) Exception = moves to a failure relationship that is otherwise
>>> autoterminated
>>> 
>>>> On Thu, Feb 8, 2024 at 7:12 PM Matt Burgess 
>> wrote:
>>>> 
>>>> Mike's option #2 seems solid but would take a lot of work and there will
>>>> always be inputs we don't account for. I support that work but in code
>>>> sometimes we just do a "catch(Throwable)" just so it doesn't blow up.
>> What
>>>> about a subjectless "try" or "trycatch" function you can wrap around
>> your
>>>> whole expression? If no exception is thrown, the evaluated value will be
>>>> returned but if one is thrown, you can provide some alternate value that
>>>> you can check downstream. As this is optional it would retain the
>> current
>>>> behavior unless you use it, and then it takes the place of all those
>>>> ifElse(isXYZValid()) calls we'd need throughout the expression.
>>>> 
>>>> Regards,
>>>> Matt
>>>> 
>>>> 
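To make the semantics of that trycatch() idea concrete, here is a minimal plain-Java sketch of what Matt describes: evaluate the wrapped expression and, if anything is thrown, return a caller-supplied fallback value that can be checked downstream. No such Expression Language function exists today; the class, method and fallback names below are made up purely for illustration.

import java.util.function.Supplier;

public final class TryCatchSketch {

    // evaluate the "expression"; on any failure return the fallback instead of failing the flowfile
    static String tryCatch(Supplier<String> expression, String fallback) {
        try {
            return expression.get();
        } catch (RuntimeException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // succeeds -> "42"
        System.out.println(tryCatch(() -> String.valueOf(Integer.parseInt("42")), "INVALID"));
        // throws NumberFormatException -> "INVALID", which a downstream check can route on
        System.out.println(tryCatch(() -> String.valueOf(Integer.parseInt("rec1")), "INVALID"));
    }
}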
>>>> On Wed, Feb 7, 2024 at 8:11 PM Phillip Lord 
>>>> wrote:
>>>> 
>>>>> IMO... UpdateAttribute has been around since the beginning of time, I
>>>> can't
>>>>> see adding a failure relationship. At the same time I understand the
>> want
>>>>> for such exceptions to be handled more gracefully rather than rolling
>>>> back
>>>>> indefinitely.
>>>>> I'd vote in favor of considering Moser's option #2... and being able to
>>>>> implement an "if this then that" logic within your flow.
>>>>> 
>>>>> Also just thinking... for every UA failure you have to consider a good
>>>>> failure-management strategy, which MIGHT add a lot of noise to the
>> flow.
>>>>> Something that might otherwi

Re: UpdateAttribute Failure Relationship

2024-02-09 Thread u...@moosheimer.com

--Uwe

> Am 09.02.2024 um 13:50 schrieb Mike Thomsen :
> 
> How about a third option which is to provide three options:
> 
> 1) Default - status quo, exceptions cause it to yield
> 2) Exception = moves forward to success w/ an error attribute, an error log
> statement that triggers a bulletin, etc. to let data managers know what's
> happening.
> 3) Exception = moves to a failure relationship that is otherwise
> autoterminated
> 
>> On Thu, Feb 8, 2024 at 7:12 PM Matt Burgess  wrote:
>> 
>> Mike's option #2 seems solid but would take a lot of work and there will
>> always be inputs we don't account for. I support that work but in code
>> sometimes we just do a "catch(Throwable)" just so it doesn't blow up. What
>> about a subjectless "try" or "trycatch" function you can wrap around your
>> whole expression? If no exception is thrown, the evaluated value will be
>> returned but if one is thrown, you can provide some alternate value that
>> you can check downstream. As this is optional it would retain the current
>> behavior unless you use it, and then it takes the place of all those
>> ifElse(isXYZValid()) calls we'd need throughout the expression.
>> 
>> Regards,
>> Matt
>> 
>> 
>> On Wed, Feb 7, 2024 at 8:11 PM Phillip Lord 
>> wrote:
>> 
>>> IMO... UpdateAttribute has been around since the beginning of time, I
>> can't
>>> see adding a failure relationship. At the same time I understand the want
>>> for such exceptions to be handled more gracefully rather than rolling
>> back
>>> indefinitely.
>>> I'd vote in favor of considering Moser's option #2... and being able to
>>> implement an "if this then that" logic within your flow.
>>> 
>>> Also just thinking... for every UA failure you have to consider a good
>>> failure-management strategy, which MIGHT add a lot of noise to the flow.
>>> Something that might otherwise easily be identified in a downstream
>>> component and/or database/etc.
>>> 
>>> My 2 cents **
>>> Phil
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Wed, Feb 7, 2024 at 5:18 PM Adam Taft  wrote:
>>> 
>>>> Or better, the failure relationship just doesn't even exist until the
>>>> property "Has Failure Relationship" is set to True.  This involves
>>> updating
>>>> UpdateAttribute to have dynamic relationships (the failure
>> relationships
>>>> appearing on true), which isn't hard to do in processor code.
>>>> 
>>>> This has the advantage of being backwards compatible for existing users
>>> and
>>>> allows the failure relationship to exist for new configurations.
>>> Obviously
>>>> the processor would need an update to catch Expression Language
>>> exceptions
>>>> and then route conditionally to failure.
>>>> 
>>>> Just thinking out loud.
>>>> /Adam
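Adam's dynamic-relationship idea can be sketched against the standard processor extension points roughly as below. This is not the real UpdateAttribute code: the property name is taken from his mail, the relationship handling is the only part of interest, and the onTrigger body is deliberately left empty.

import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;

public class DynamicFailureRelationshipSketch extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success").description("Updated flowfiles").build();
    static final Relationship REL_FAILURE = new Relationship.Builder()
            .name("failure").description("Flowfiles whose expressions could not be evaluated").build();

    static final PropertyDescriptor HAS_FAILURE_RELATIONSHIP = new PropertyDescriptor.Builder()
            .name("Has Failure Relationship")
            .allowableValues("true", "false")
            .defaultValue("false")
            .required(true)
            .build();

    private final AtomicReference<Set<Relationship>> relationships =
            new AtomicReference<>(Collections.singleton(REL_SUCCESS));

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Collections.singletonList(HAS_FAILURE_RELATIONSHIP);
    }

    @Override
    public void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) {
        if (HAS_FAILURE_RELATIONSHIP.equals(descriptor)) {
            Set<Relationship> updated = new LinkedHashSet<>();
            updated.add(REL_SUCCESS);
            if (Boolean.parseBoolean(newValue)) {
                updated.add(REL_FAILURE);   // "failure" only appears once the property is set to true
            }
            relationships.set(updated);     // picked up by the framework via getRelationships()
        }
    }

    @Override
    public Set<Relationship> getRelationships() {
        return relationships.get();
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        // the real processor would evaluate its expressions here and, when the failure
        // relationship is enabled, catch Expression Language exceptions and route there
    }
}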
>>>> 
>>>> 
>>>> 
>>>> On Wed, Feb 7, 2024 at 1:48 PM u...@moosheimer.com 
>>>> wrote:
>>>> 
>>>>> Hi Mike,
>>>>> 
>>>>> How about the option of introducing a new property that decides
>> whether
>>>> to
>>>>> route to the 'failure' relationship in the event of an error?
>>>>> If this property is set to false, then the 'failure' relationship is
>>>>> automatically set to 'terminate' (since nothing is routed there
>>> anyway).
>>>>> 
>>>>> Then everyone can decide whether and where they want to use this new
>>>>> feature or not.
>>>>> All other options would still be possible with such a solution.
>>>>> 
>>>>> -- Uwe
>>>>> 
>>>>>> Am 07.02.2024 um 22:15 schrieb Michael Moser :
>>>>>> 
>>>>>> Hi Dan,
>>>>>> 
>>>>>> This has been discussed in the past, as you found with those two
>> Jira
>>>>>> tickets.  Personally, I'm still not sure whether a new failure
>>>>> relationship
>>>>>> on UpdateAttribute in 2.0 is a good approach.  I have heard from
>> some
>>>>>> dataflow managers that would not want to go through their entire
>>> graph
>>>>> when
>>>>>> upgrading to 2.0 and update every UpdateAttribute configuration.
>>>>>> 
>>>>>> I have heard some 

Re: UpdateAttribute Failure Relationship

2024-02-07 Thread u...@moosheimer.com
Hi Mike,

How about the option of introducing a new property that decides whether to 
route to the 'failure' relationship in the event of an error?
If this property is set to false, then the 'failure' relationship is 
automatically set to 'terminate' (since nothing is routed there anyway).

Then everyone can decide whether and where they want to use this new feature or 
not.
All other options would still be possible with such a solution.

-- Uwe

> Am 07.02.2024 um 22:15 schrieb Michael Moser :
> 
> Hi Dan,
> 
> This has been discussed in the past, as you found with those two Jira
> tickets.  Personally, I'm still not sure whether a new failure relationship
> on UpdateAttribute in 2.0 is a good approach.  I have heard from some
> dataflow managers that would not want to go through their entire graph when
> upgrading to 2.0 and update every UpdateAttribute configuration.
> 
> I have heard some alternatives to a 'failure' relationship that I would
> like to share as options.
> 
> 1) Add a new property to UpdateAttribute that controls whether a flowfile
> that causes an expression language exception either yields and rolls back,
> or silently fails to update the attribute and sends the flowfile to
> success.  I personally don't like this, because the use case for "silent
> failure" seems really like a rarely needed edge case.
> 
> 2) Identify all expression language methods that can throw an exception and
> document that fact in the Expression Language Guide (some methods already
> mention they can throw an "exception bulletin").  Then implement new
> expression methods to check if an expression could fail, and use that in
> UpdateAttribute advanced rules.  For example, if the format() and
> formatInstant() methods can fail on a negative number, we create a new
> method such as isValidMilliseconds().  This already exists for some cases,
> such as isJson() which can do a quick check of some value before calling
> jsonPathDelete() on it.
> 
> I'm curious to hear more thoughts on this.
> 
> -- Mike
> 
> 
> 
>> On Wed, Jan 31, 2024 at 11:02 AM Dan S  wrote:
>> 
>> My team is requesting a failure relationship for UpdateAttribute as seen in
>> NIFI-5448  and NIFI-6344
>>  as we are
>> experiencing the same problem where a NiFi Expression Language expression is throwing
>> an exception. In the PR for NIFI-5448 it was mentioned this feature would
>> have to wait until NIFI 2.0.0. I wanted to know if there is any active work
>> regarding this and whether eventually there will be a failure relationship
>> added to UpdateAttribute?
>> 



Re: PGVector and Database Driver

2023-09-01 Thread u...@moosheimer.com

I have found the problem.
To have a "clean" installation, I have the following directory structure

/opt/nifi/nifi-1.23.2
/opt/nifi/current (symbolic link to the current version)
/opt/nifi/driver
/opt/nifi/extensions

I defined the ../extensions and ../driver directories separately, so that 
when I upgrade NiFi, I only have to adjust the directories in the config.


Then in the DBCPConnectionPool I specify where the driver is under 
"Database Driver Location(s)" -> "/opt/nifi/driver/postgresql-42.6.0.jar".

This works fine, but PGVector doesn't like it.

If I copy the driver to /opt/nifi/current/lib and leave "Database Driver 
Location(s)" empty, everything works fine.

addVectorType(con) is also not needed.

Interesting. I don't really understand why, but I accept that some 
things remain a mystery to me :)


Maybe my explanation will help if other developers have similar problems.
And maybe there should be a "don't do" chapter in the documentation, 
pointing out that the JDBC driver should be placed in the ../lib directory.
If someone can tell me why the behavior is like this, I would be happy 
to learn something.


Thanks again for your help.

Regards,
Uwe

On 01.09.23 05:59, Matt Burgess wrote:

Maybe this [1]? Perhaps you have to call unwrap() yourself in this
case. IIRC you don't have access to the DataSource but you can check
it directly on the connection.

Regards,
Matt

[1] 
https://stackoverflow.com/questions/36986653/cast-java-sql-connection-to-pgconnection
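Done by hand against the pooled connection, the unwrap Matt suggests would look roughly like the sketch below: unwrap to org.postgresql.PGConnection and register the vector type on it before binding a PGvector parameter. The table and column names ("items", "embedding") and the parameter index are made up for illustration, and, as the exception further down in this thread shows, the unwrap itself can fail when the driver and the pgvector classes live in different classloaders.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.postgresql.PGConnection;
import com.pgvector.PGvector;

public class PgVectorUnwrapSketch {

    static void insertEmbedding(Connection con, float[] embeddingArray) throws SQLException {
        // register the "vector" type on the underlying PostgreSQL connection
        PGConnection pgConn = con.unwrap(PGConnection.class);
        pgConn.addDataType("vector", PGvector.class);

        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO items (embedding) VALUES (?)")) {
            ps.setObject(1, new PGvector(embeddingArray));   // two-argument setObject, as in the thread
            ps.executeUpdate();
        }
    }
}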

On Thu, Aug 31, 2023 at 8:15 PM u...@moosheimer.com  wrote:

Mark & Matt,

Thanks for the quick help. I really appreciate it.

PGvector.addVectorType(con) returns the following:
*java.sql.SQLException: Cannot unwrap to org.postgresql.PGConnection*

Could this be a connection pool issue?

Interestingly, I didn't call addVectorType() at all in my test java code
and it still works?!
I'll have to check again ... maybe I'm not seeing it correctly anymore.
It is already 2:05 a.m. here.


Regards,
Uwe


java.sql.SQLException: Cannot unwrap to org.postgresql.PGConnection

On 31.08.23 18:53, Matt Burgess wrote:

This means the JDBC driver you're using does not support the use of
the two-argument setObject() call when the object is a PGVector. Did
you register the Vector type by calling:

PGvector.addVectorType(conn);

The documentation [1] says that the two-argument setObject() should
work if you have registered the Vector type.

Regards,
Matt

[1]https://github.com/pgvector/pgvector-java

On Thu, Aug 31, 2023 at 12:01 PM Mark Payne  wrote:

Hey Uwe,

The DBCPConnectionPool returns a java.sql.Connection. From that you’d create a 
Statement. So I’m a little confused when you say that you’ve got it working in 
Pure JDBC but not with NiFi, as the class returned IS pure JDBC. Perhaps you 
can share a code snippet of what you’re doing in the “Pure JDBC” route that is 
working versus what you’re doing in the NiFi processor that’s not working?

Thanks
-Mark



On Aug 31, 2023, at 10:58 AM,u...@moosheimer.com  wrote:

Hi,

I am currently writing a processor to write OpenAI embeddings to Postgres.
I am using DBCPConnectionPool for this.
I use Maven to integrate PGVector (https://github.com/pgvector/pgvector).

With pure JDBC this works fine. With the database classes from NiFi I get the 
error:
*Cannot infer the SQL type to use for an instance of com.pgvector.PGvector. Use 
setObject() with an explicit Types value to specify the type to use.*

I use -> setObject(5, new PGvector(embeddingArray)).
embeddingArray is defined as: float[] embeddingArray

Of course I know why I get the error from NiFi and not from the JDBC driver, 
but unfortunately this knowledge does not help me.

Can anyone tell me what SQLType I need to specify for this?
I have searched the internet and the NiFi sources on GitHub for several hours 
now and have found nothing.

One option would be to use native JDBC and ignore the ConnectionPool. But that 
would be a very bad style in my opinion.
Perhaps there is a better solution?

Any help, especially from Matt B., is appreciated as I'm at a loss.
Thanks guys.


Re: PGVector and Database Driver

2023-08-31 Thread u...@moosheimer.com

Mark & Matt,

Thanks for the quick help. I really appreciate it.

PGvector.addVectorType(con) returns the following:
*java.sql.SQLException: Cannot unwrap to org.postgresql.PGConnection*

Could this be a connection pool issue?

Interestingly, I didn't call addVectorType() at all in my test java code 
and it still works?!
I'll have to check again ... maybe I'm not seeing it correctly anymore. 
It is already 2:05 a.m. here.



Regards,
Uwe


java.sql.SQLException: Cannot unwrap to org.postgresql.PGConnection

On 31.08.23 18:53, Matt Burgess wrote:

This means the JDBC driver you're using does not support the use of
the two-argument setObject() call when the object is a PGVector. Did
you register the Vector type by calling:

PGvector.addVectorType(conn);

The documentation [1] says that the two-argument setObject() should
work if you have registered the Vector type.

Regards,
Matt

[1]https://github.com/pgvector/pgvector-java

On Thu, Aug 31, 2023 at 12:01 PM Mark Payne  wrote:

Hey Uwe,

The DBCPConnectionPool returns a java.sql.Connection. From that you’d create a 
Statement. So I’m a little confused when you say that you’ve got it working in 
Pure JDBC but not with NiFi, as the class returned IS pure JDBC. Perhaps you 
can share a code snippet of what you’re doing in the “Pure JDBC” route that is 
working versus what you’re doing in the NiFi processor that’s not working?

Thanks
-Mark



On Aug 31, 2023, at 10:58 AM,u...@moosheimer.com  wrote:

Hi,

I am currently writing a processor to write OpenAI embeddings to Postgres.
I am using DBCPConnectionPool for this.
I use Maven to integrate PGVector (https://github.com/pgvector/pgvector).

With pure JDBC this works fine. With the database classes from NiFi I get the 
error:
*Cannot infer the SQL type to use for an instance of com.pgvector.PGvector. Use 
setObject() with an explicit Types value to specify the type to use.*

I use -> setObject(5, new PGvector(embeddingArray)).
embeddingArray is defined as: float[] embeddingArray

Of course I know why I get the error from NiFi and not from the JDBC driver, 
but unfortunately this knowledge does not help me.

Can anyone tell me what SQLType I need to specify for this?
I have searched the internet and the NiFi sources on GitHub for several hours 
now and have found nothing.

One option would be to use native JDBC and ignore the ConnectionPool. But that 
would be a very bad style in my opinion.
Perhaps there is a better solution?

Any help, especially from Matt B., is appreciated as I'm at a loss.
Thanks guys.

PGVector and Database Driver

2023-08-31 Thread u...@moosheimer.com

Hi,

I am currently writing a processor to write OpenAI embeddings to Postgres.
I am using DBCPConnectionPool for this.
I use Maven to integrate PGVector (https://github.com/pgvector/pgvector).

With pure JDBC this works fine. With the database classes from NiFi I 
get the error:
*Cannot infer the SQL type to use for an instance of 
com.pgvector.PGvector. Use setObject() with an explicit Types value to 
specify the type to use.*


I use -> setObject(5, new PGvector(embeddingArray)).
embeddingArray is defined as: float[] embeddingArray

Of course I know why I get the error from NiFi and not from the JDBC 
driver, but unfortunately this knowledge does not help me.


Can anyone tell me what SQLType I need to specify for this?
I have searched the internet and the NiFi sources on GitHub for several 
hours now and have found nothing.


One option would be to use native JDBC and ignore the ConnectionPool. 
But that would be a very bad style in my opinion.

Perhaps there is a better solution?

Any help, especially from Matt B., is appreciated as I'm at a loss.
Thanks guys.


Re: OpenTelemetry Integration

2023-05-23 Thread u...@moosheimer.com
Hello Brian,

Jaeger would be a good choice because it is very common (almost the standard 
with OpenTelemetry).
Have you looked at OpenLineage (https://openlineage.io/)? Possibly interesting?!

Thanks
Uwe

> Am 23.05.2023 um 04:57 schrieb Brian Putt :
> 
> Hello Joe / All,
> 
> Jaeger or Grafana (w/ tempo) offer comparable tools to visualize the trace
> data. I believe additional tools will be needed to get the most out of the
> trace data. We've been experimenting with a number of open source products
> to see what works best for the amount of trace data that NiFi emits. So
> far, Grafana Tempo, Victoria Metrics, and Clickhouse seem to offer a good
> set of features to cover searching / viewing the traces along with
> summarizing certain flowfile attributes. As long as the trace data is in
> OTEL's format, the collector offers flexibility in exporting the data to a
> number of services with ease.
> 
> I would expect a PR to OTEL's java auto instrumentation project over the
> next few months that adds NiFi to its list of instrumentations. If the NiFi
> committers would like a demo / tech exchange to go over the current state
> of the tracing agent, we'd be happy to accommodate. As it stands, the agent
> utilizes flowfile attributes to pass along the tracestate so trace
> propagation can occur across NiFi to NiFi boundaries.
> 
> Thanks,
> 
> Brian
> 
>> On Wed, May 17, 2023 at 1:05 PM Joe Witt  wrote:
>> 
>> Brian Putt, All
>> 
>> Are you aware of any good tools/services that can ingest the traces and
>> provide an interesting view/story/reporting on it?
>> 
>> I could see us emitting otel events instead of our current provenance
>> mechanism and using that both internally to do what we already do but also
>> have a clear/spec friendly way of exporting it to others.
>> 
>> Thanks
>> 
>> On Sat, Jul 30, 2022 at 7:43 AM u...@moosheimer.com 
>> wrote:
>> 
>>> Hello Brian, Bryan, Greg, NiFi devs,
>>> 
>>> Integrating OpenTelemetry is a very good idea, especially since the major
>>> cloud providers also rely on it. This could also be interesting for
>>> Stateless NiFi.
>>> 
>>> I have a suggestion that I would like to put up for discussion.
>>> 
>>> Would it be useful to make a list of what extensions or new development
>>> would be helpful for a complete integration of OpenTelemetry?
>>> 
>>> I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these
>>> can do at most MQTT version 3.1.1, but since version 5 the User Properties
>>> exist, which are similar to the HTTP header fields.
>>> Thus one could implement OpenTelemetry in the MQTT processors similarly
>> as
>>> in HTTP.
>>> 
>>> With such a list we could get an overview of the "necessary" adjustments and
>>> solicit support.
>>> 
>>> If what I write is nonsense, then I may not have understood something and
>>> I take it all back :)
>>> 
>>> Mit freundlichen Grüßen / best regards
>>> Kay-Uwe Moosheimer
>>> 
>>>> Am 29.07.2022 um 05:09 schrieb Brian Putt :
>>>> 
>>>> Hello Bryan / Greg / NiFi devs,
>>>> 
>>>> Distributed tracing (DT) is similar to provenance in that it shows the
>>> path
>>>> a particular flowfile travels, but its core selling point is that it
>>>> supports tracing across multiple systems/services regardless of what's
>>>> receiving the data. Provenance is a fantastic feature and there are
>>>> instances where one might want to draw that bigger picture of
>> identifying
>>>> bottlenecks as data flows from one system to another and that system
>>>> may/may not be using NiFi.
>>>> 
>>>> DT utilizes three ids: traceId, parentId, and spanId. While a tree can
>> be
>>>> built using two ids, the third id (traceId) helps bring all of the
>>> relevant
>>>> information out of a datastore more easily.
>>>> DT is focused more on performance and identifying bottlenecks in one or
>>>> more systems. Imagine if NiFi were receiving data from various sources
>>>> (i.e. HTTP, Kafka, SQS) and NiFi egressed to other sources (HTTP,
>> Kafka,
>>>> NiFi).
>>>> DT provides a spec that we'd be able to follow and correlate the data
>> as
>>> it
>>>> traverses from system to system. Each system that participates in the
>> DT
>>>> ecosystem would simply emit information (a trace is made up of one or
>>

Re: OpenTelemetry Integration

2022-07-30 Thread u...@moosheimer.com
Hello Brian, Bryan, Greg, NiFi devs,

Integrating OpenTelemetry is a very good idea, especially since the major cloud 
providers also rely on it. This could also be interesting for Stateless NiFi.

I have a suggestion that I would like to put up for discussion.

Would it be useful to make a list of what extensions or new development would 
be helpful for a complete integration of OpenTelemetry?

I'm thinking of ConsumeMQTT and PublishMQTT, for example. Currently these can 
do at most MQTT version 3.1.1, but since version 5 the User Properties exist, which 
are similar to the HTTP header fields.
Thus one could implement OpenTelemetry in the MQTT processors similarly as in 
HTTP.

With such a list we could get an overview of the "necessary" adjustments and 
solicit support.

If what I write is nonsense, then I may not have understood something and I 
take it all back :)

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 29.07.2022 um 05:09 schrieb Brian Putt :
> 
> Hello Bryan / Greg / NiFi devs,
> 
> Distributed tracing (DT) is similar to provenance in that it shows the path
> a particular flowfile travels, but its core selling point is that it
> supports tracing across multiple systems/services regardless of what's
> receiving the data. Provenance is a fantastic feature and there are
> instances where one might want to draw that bigger picture of identifying
> bottlenecks as data flows from one system to another and that system
> may/may not be using NiFi.
> 
> DT utilizes three ids: traceId, parentId, and spanId. While a tree can be
> built using two ids, the third id (traceId) helps bring all of the relevant
> information out of a datastore more easily.
> DT is focused more on performance and identifying bottlenecks in one or
> more systems. Imagine if NiFi were receiving data from various sources
> (i.e. HTTP, Kafka, SQS) and NiFi egressed to other sources (HTTP, Kafka,
> NiFi).
> DT provides a spec that we'd be able to follow and correlate the data as it
> traverses from system to system. Each system that participates in the DT
> ecosystem would simply emit information (a trace is made up of one or more
> spans) and there'd be a collection system which would aggregate all of
> these spans and would draw a bigger picture of the path that data went
> through and could help identify key bottlenecks.
> 
> OpenTelemetry (OTEL) provides clients (across many languages, including
> java) where developers can instrument their library's APIs and participate
> in a DT ecosystem as it adheres to the tracing spec. Egressing trace data
> is possible without using OTEL, but then we may find ourselves having to
> recreate the wheel, but could be optimized for NiFi.
> 
> Creating a reporting task could certainly be a path, mainly have a few
> concerns with that:
> 
> 1. If provenance is disabled, will provenance events still be emitted and
> be collected by a new reporting task?
> 2. There'll be an impact on performance, how much is unknown. OTEL is
> gaining traction across industry and there are ways to mitigate
> performance, mainly sampling and the fact that *tracing is best effort*.
> Spans would be emitted from NiFi via UDP to a collector on the same network
> 3. Would there be any issues with appending a flowfile attribute that is
> carried throughout the flow where it maintains the traceId, parentSpanId,
> and trace flags? See below for more details
> 
> There's a W3C spec (Trace context) which includes a formatted string that
> would be propagated to services (HTTP, Kafka, etc...). So if NiFi were to
> put information onto kafka, any consumers of that data would be able to
> continue the trace and help draw the bigger picture.
> 
> W3C Spec: https://www.w3.org/TR/trace-context/#traceparent-header
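For reference, the traceparent header defined in that spec is four dash-separated, lower-case hex fields. A minimal sketch follows; the example ids are the ones used in the W3C document itself, and carrying the value as a flowfile attribute named "traceparent" is only an assumption here, not an established NiFi convention.

public class TraceparentSketch {
    public static void main(String[] args) {
        String version = "00";
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736"; // 16 bytes, hex
        String parentId = "00f067aa0ba902b7";                // 8 bytes, hex (the parent span id)
        String traceFlags = "01";                            // sampled
        String traceparent = String.join("-", version, traceId, parentId, traceFlags);
        // -> 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
        // e.g. propagated as an HTTP/Kafka header or written to a flowfile attribute
        System.out.println(traceparent);
    }
}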
> 
> For #2, since DT is focused on performance, sampling can help alleviate
> chatter over the wire and ideally, 0.01% would draw the same picture as 1%
> or 10%+. This is certainly different from provenance as DT is focused on
> performance over quality of the data and should not be thought of as
> auditing.
> https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampler
> 
>> On Thu, Jul 28, 2022 at 5:01 PM Bryan Bende  wrote:
>> 
>> Hi Greg,
>> 
>> I don't really know anything about OpenTelemetry, but from the
>> perspective of integrating something into the framework, some things
>> to consider...
>> 
>> Is there some way to piggy-back on provenance and use a ReportingTask
>> to process provenance events and report something to OpenTelemetry?
>> 
>> If something new does need to be added, it should probably be an
>> extension point where there is an interface in the framework-api and
>> different implementations can be plugged in.
>> Ideally the framework itself wouldn't have any knowledge of
>> OpenTelemetry specifically, it would only be reporting some
>> information, which could then be used in some way by the OpenTelemetry
>> implementation.
>> 
>> 

Re: PutDatabaseRecord 1.13.2

2021-03-26 Thread u...@moosheimer.com
Hi Tony,

This looks like it may be the behavior described in NIFI-8320.
Matt has already said he is looking at the problem I described. I'm curious to 
see what he finds.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 26.03.2021 um 14:46 schrieb Anton Kovaľ :
> 
> Hi Uwe,
> I was digging at something similar in PutDatabaseRecord 1.13.2 and I found 
> out that the problem is the order of the fields in the JSON. I wrote a test 
> (TestPutDatabaseRecord) that shows it:
> 
> @Test
>void testInsertJsonChangeOrderFields() throws InitializationException, 
> ProcessException, SQLException, IOException {
>recreateTable(createPersons)
> 
>final jsonReader = new JsonTreeReader()
>runner.addControllerService("jsonParser", jsonReader)
>runner.enableControllerService(jsonReader)
> 
>final input = '''
>[
>  { 
>"id": 1,
>"code": 101,
>"name": "rec1"
>  },
>  { 
>"id": 2,
>"name": "rec2",
>"code": 102
>  }
>]
>'''
> 
>runner.setProperty(PutDatabaseRecord.RECORD_READER_FACTORY, 
> 'jsonParser')
>runner.setProperty(PutDatabaseRecord.STATEMENT_TYPE, 
> PutDatabaseRecord.INSERT_TYPE)
>runner.setProperty(PutDatabaseRecord.TABLE_NAME, 'PERSONS')
> 
>runner.enqueue(input)
>runner.run()
> 
>runner.assertTransferCount(PutDatabaseRecord.REL_SUCCESS, 1)
>final Connection conn = dbcp.getConnection()
>final Statement stmt = conn.createStatement()
>final ResultSet rs = stmt.executeQuery('SELECT * FROM PERSONS')
>assertTrue(rs.next())
>assertEquals(1, rs.getInt(1))
>assertEquals('rec1', rs.getString(2))
>assertEquals(101, rs.getInt(3))
>assertTrue(rs.next())
>assertEquals(2, rs.getInt(1))
>assertEquals('rec2', rs.getString(2))
>assertEquals(102, rs.getInt(3))
> 
>stmt.close()
>conn.close()
>}
> 
> Routing to failure.: java.lang.NumberFormatException: For input string: "rec1"
> 
> Tony Koval
> 
>> On 2021/03/25 17:59:07, "u...@moosheimer.com"  wrote: 
>> Matt,
>> 
>> Thank you for taking care of this problem!
>> 
>> I have added the "active=true" in the JSON.
>> Unfortunately exactly the same error.
>> 
>> I can't add the primary key (unless I read the sequence myself which I 
>> want to avoid).
>> The attribute "ts" must be set only by the database, because it must be 
>> the exact time.
>> 
>> This unfortunately leaves me with no other option. Or have I overlooked 
>> something?
>> 
>> Regards,
>> Uwe
>> 
>>> Am 25.03.21 um 17:07 schrieb Matt Burgess:
>>> Uwe,
>>> 
>>> I think it's related to [1], which tries to use the field's datatype
>>> rather than the column's datatype if they're compatible but not
>>> identical. I'm guessing if your record doesn't have the field, we're
>>> mishandling the datatype and associated INSERT statement. I will
>>> reproduce, write up the Jira, and fix it shortly. A workaround should
>>> be to fill the records with all fields using the DB's default values.
>>> 
>>> Regards,
>>> Matt
>>> 
>>> [1] https://issues.apache.org/jira/browse/NIFI-8223
>>> 
>>> On Thu, Mar 25, 2021 at 11:20 AM u...@moosheimer.com  
>>> wrote:
>>>> Hi Dev-Team,
>>>> 
>>>> We still have a problem with PutDatabaseRecord on NiFi 1.13.2, which
>>>> used to run on NiFi 1.12.1.
>>>> 
>>>> We have a JSON that has both more attributes than exist in the table and
>>>> missing attributes that are defined in the table with default value.
>>>> The table has a bool value that is set "default true" and is not passed
>>>> in the JSON.
>>>> 
>>>> Processor settings are:
>>>> Record Reader - JsonTreeReader
>>>> Database Type - PostgreSQL (tried also with Generic)
>>>> Statement Type - INSERT
>>>> Translate Field Names - false
>>>> Quote Column Identifier - false
>>>> Quote Table Identifier - false
>>>> Unmatched Field Behavior - Ignore Unmatched Fields
>>>> Unmatched Column Behavior - Ignore Unmatched Columns
>>>> 
>>>

Re: PutDatabaseRecord 1.13.2

2021-03-25 Thread u...@moosheimer.com

Pierre,

The name of the field is "reference" and the data type is VARCHAR(128).
The name of the "bool" is "active".

Here is the table (it is intentional that "date_created" and 
"date_changed" are not of type "timestamp"):


CREATE TABLE text.metadata
(
    id bigint NOT NULL DEFAULT 
nextval('blockchain.metadata_id_seq'::regclass),

    vid bigint NOT NULL,
    reference character varying(128) NOT NULL,
    reference_description character varying(512) NOT NULL,
    reference_name character varying(100) NOT NULL,
    reference_hash character varying(256) NOT NULL,
    active boolean NOT NULL DEFAULT true,
    ts timestamp with time zone NOT NULL DEFAULT now(),
    process character varying(64) NOT NULL,
    type character varying(64) NOT NULL,
    reference_size bigint NOT NULL,
    uplink character(36) NOT NULL,
    downlink character(36) NOT NULL,
    reference_created character varying(32) NOT NULL,
    csid_created character varying(256) ,
    csid_changed character varying(256),
    date_created character varying(32),
    date_changed character varying(32),
    CONSTRAINT metadata_pkey PRIMARY KEY (id)
    USING INDEX TABLESPACE ts_metadata_data,
    CONSTRAINT metadata_vid_fkey FOREIGN KEY (vid)
    REFERENCES metadata.all (id) MATCH SIMPLE
    ON UPDATE NO ACTION
    ON DELETE CASCADE
    NOT VALID
)

Here is the JSON data:

{
    "event":"add",
    "vid":33,
    "uid":"d3595259",
    "uuid":"2f10f0ad-fd63-476b-90b8-7d059c5a7128",
    "process":"nifi add",
    "date_changed":"2021-03-25T14:33:39.591Z",
    "csid_changed":"bMO33ZX9smu8Fvmgk6FRiVaZhmQ",
    "uplink":"79e9ab3b-e302-4b9e-a120-b743e14a9cfe",
    "reference":"hKaT0ytUPfgwNcjhDDtKRin42743t",
    "date_created":"2021-03-25T14:33:39.591Z",
    "csid_created":"bMO33ZX9smu8Fvmgk6FRiVaZhmQ",
    "reference_description":"QYusH+cD8u0458vDG0m3qmi",
    "reference_name":"WVp9qWLhAsd16L39d3tEr2M",
    "reference_size":15,
"reference_hash":"36a12c1cd47016daa7b2786893678aa52a003656ae8aed484eaf5373ef1ce496a835449cc91ef2c3dd96ef14b9ed6c075e27b74b4dbd651fe8890acbaa3d6b4e", 


    "reference_created":"2021-03-25T14:32:50.344Z",
    "type":"data",
    "downlink":"354fed7e-1456-4248-a27f-93db18e68cd8"
}

Am 25.03.21 um 17:05 schrieb Pierre Villard:

I guess we could have a better error message but to me it sounds like there
is a field in your incoming JSON that contains
"hKaT0ytUPfgwNcjhDDtKRin42743t" and the name of this field is matching a
column in the Postgres table that is expecting a boolean. That's why the
casting does not work.

Can you share a sample of the JSON data you're pushing into postgres and
the create table statement for the destination table?

Pierre

Le jeu. 25 mars 2021 à 16:20, u...@moosheimer.com  a
écrit :


Hi Dev-Team,

We still have a problem with PutDatabaseRecord on NiFi 1.13.2, which
used to run on NiFi 1.12.1.

We have a JSON that has both more attributes than exist in the table and
missing attributes that are defined in the table with default value.
The table has a bool value that is set "default true" and is not passed
in the JSON.

Processor settings are:
Record Reader - JsonTreeReader
Database Type - PostgreSQL (tried also with Generic)
Statement Type - INSERT
Translate Field Names - false
Quote Column Identifier - false
Quote Table Identifier - false
Unmatched Field Behavior - Ignore Unmatched Fields
Unmatched Column Behavior - Ignore Unmatched Columns

As error message we get:

2021-03-25 15:00:24,509 ERROR [Timer-Driven Process Thread-7]
o.a.n.p.standard.PutDatabaseRecord
PutDatabaseRecord[id=a1ef9918-0177-1000--ba128239] Failed to put
Records to database for
StandardFlowFileRecord[uuid=47bacb24-718b-42e0-97a9-588ab628a4af,claim=StandardContentClaim

[resourceClaim=StandardResourceClaim[id=1616614332963-2,
container=default, section=2], offset=97327973,
length=877],offset=0,name=3699054725979545,size=877]. Routing to
failure.: org.postgresql.util.PSQLException: Cannot cast to boolean:
"hKaT0ytUPfgwNcjhDDtKRin42743t"
org.postgresql.util.PSQLException: Cannot cast to boolean:
"hKaT0ytUPfgwNcjhDDtKRin42743t"
  at

org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
  at
org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
  at
org.postgresql.jdbc.BooleanTypeUtil.castToBoolean(BooleanTypeUtil.java:43)
  at

org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:655)
  at

org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:

Re: PutDatabaseRecord 1.13.2

2021-03-25 Thread u...@moosheimer.com

Matt,

Thank you for taking care of this problem!

I have added the "active=true" in the JSON.
Unfortunately exactly the same error.

I can't add the primary key (unless I read the sequence myself which I 
want to avoid).
The attribute "ts" must be set only by the database, because it must be 
the exact time.


This unfortunately leaves me with no other option. Or have I overlooked 
something?


Regards,
Uwe

Am 25.03.21 um 17:07 schrieb Matt Burgess:

Uwe,

I think it's related to [1], which tries to use the field's datatype
rather than the column's datatype if they're compatible but not
identical. I'm guessing if your record doesn't have the field, we're
mishandling the datatype and associated INSERT statement. I will
reproduce, write up the Jira, and fix it shortly. A workaround should
be to fill the records with all fields using the DB's default values.

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-8223

On Thu, Mar 25, 2021 at 11:20 AM u...@moosheimer.com  
wrote:

Hi Dev-Team,

We still have a problem with PutDatabaseRecord on NiFi 1.13.2, which
used to run on NiFi 1.12.1.

We have a JSON that has both more attributes than exist in the table and
missing attributes that are defined in the table with default value.
The table has a bool value that is set "default true" and is not passed
in the JSON.

Processor settings are:
Record Reader - JsonTreeReader
Database Type - PostgreSQL (tried also with Generic)
Statement Type - INSERT
Translate Field Names - false
Quote Column Identifier - false
Quote Table Identifier - false
Unmatched Field Behavior - Ignore Unmatched Fields
Unmatched Column Behavior - Ignore Unmatched Columns

As error message we get:

2021-03-25 15:00:24,509 ERROR [Timer-Driven Process Thread-7]
o.a.n.p.standard.PutDatabaseRecord
PutDatabaseRecord[id=a1ef9918-0177-1000--ba128239] Failed to put
Records to database for
StandardFlowFileRecord[uuid=47bacb24-718b-42e0-97a9-588ab628a4af,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1616614332963-2,
container=default, section=2], offset=97327973,
length=877],offset=0,name=3699054725979545,size=877]. Routing to
failure.: org.postgresql.util.PSQLException: Cannot cast to boolean:
"hKaT0ytUPfgwNcjhDDtKRin42743t"
org.postgresql.util.PSQLException: Cannot cast to boolean:
"hKaT0ytUPfgwNcjhDDtKRin42743t"
  at
org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
  at
org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
  at
org.postgresql.jdbc.BooleanTypeUtil.castToBoolean(BooleanTypeUtil.java:43)
  at
org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:655)
  at
org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:935)
  at
org.apache.commons.dbcp2.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:529)
  at
org.apache.commons.dbcp2.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:529)
  at jdk.internal.reflect.GeneratedMethodAccessor687.invoke(Unknown
Source)
  at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:254)
  at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.access$100(StandardControllerServiceInvocationHandler.java:38)
  at
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler$ProxiedReturnObjectInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:240)
  at com.sun.proxy.$Proxy357.setObject(Unknown Source)
  at
org.apache.nifi.processors.standard.PutDatabaseRecord.executeDML(PutDatabaseRecord.java:736)
  at
org.apache.nifi.processors.standard.PutDatabaseRecord.putToDatabase(PutDatabaseRecord.java:841)
  at
org.apache.nifi.processors.standard.PutDatabaseRecord.onTrigger(PutDatabaseRecord.java:487)
  at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
  at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
  at
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
  at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
  at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
  at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
  at
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
  at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
  at
java.base/java.util.concurrent.ThreadP

PutDatabaseRecord 1.13.2

2021-03-25 Thread u...@moosheimer.com

Hi Dev-Team,

We still have a problem with PutDatabaseRecord on NiFi 1.13.2, which 
used to run on NiFi 1.12.1.


We have a JSON that has both more attributes than exist in the table and 
missing attributes that are defined in the table with default value.
The table has a bool value that is set "default true" and is not passed 
in the JSON.


Processor settings are:
Record Reader - JsonTreeReader
Database Type - PostgreSQL (tried also with Generic)
Statement Type - INSERT
Translate Field Names - false
Quote Column Identifier - false
Quote Table Identifier - false
Unmatched Field Behavior - Ignore Unmatched Fields
Unmatched Column Behavior - Ignore Unmatched Columns

As error message we get:

2021-03-25 15:00:24,509 ERROR [Timer-Driven Process Thread-7] 
o.a.n.p.standard.PutDatabaseRecord 
PutDatabaseRecord[id=a1ef9918-0177-1000--ba128239] Failed to put 
Records to database for 
StandardFlowFileRecord[uuid=47bacb24-718b-42e0-97a9-588ab628a4af,claim=StandardContentClaim 
[resourceClaim=StandardResourceClaim[id=1616614332963-2, 
container=default, section=2], offset=97327973, 
length=877],offset=0,name=3699054725979545,size=877]. Routing to 
failure.: org.postgresql.util.PSQLException: Cannot cast to boolean: 
"hKaT0ytUPfgwNcjhDDtKRin42743t"
org.postgresql.util.PSQLException: Cannot cast to boolean: 
"hKaT0ytUPfgwNcjhDDtKRin42743t"
    at 
org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
    at 
org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
    at 
org.postgresql.jdbc.BooleanTypeUtil.castToBoolean(BooleanTypeUtil.java:43)
    at 
org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:655)
    at 
org.postgresql.jdbc.PgPreparedStatement.setObject(PgPreparedStatement.java:935)
    at 
org.apache.commons.dbcp2.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:529)
    at 
org.apache.commons.dbcp2.DelegatingPreparedStatement.setObject(DelegatingPreparedStatement.java:529)
    at jdk.internal.reflect.GeneratedMethodAccessor687.invoke(Unknown 
Source)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:254)
    at 
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.access$100(StandardControllerServiceInvocationHandler.java:38)
    at 
org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler$ProxiedReturnObjectInvocationHandler.invoke(StandardControllerServiceInvocationHandler.java:240)

    at com.sun.proxy.$Proxy357.setObject(Unknown Source)
    at 
org.apache.nifi.processors.standard.PutDatabaseRecord.executeDML(PutDatabaseRecord.java:736)
    at 
org.apache.nifi.processors.standard.PutDatabaseRecord.putToDatabase(PutDatabaseRecord.java:841)
    at 
org.apache.nifi.processors.standard.PutDatabaseRecord.onTrigger(PutDatabaseRecord.java:487)
    at 
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at 
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
    at 
org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
    at 
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)

    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at 
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)

    at java.base/java.lang.Thread.run(Thread.java:834)


Our test environment is:
NiFi 1.13.2 Cluster with 3 nodes
Postgres 13.2
openjdk version "11.0.10"
Ubuntu 20.04.1 LTS

Is this a known issue, or just our own misfortune driving us to despair?




Re: java api for changing parameter context

2021-01-27 Thread u...@moosheimer.com

That helps...

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

Am 27.01.21 um 18:31 schrieb Bryan Bende:

You can open Chrome Dev Tools while using the UI and perform whichever
operations you are interested in on a parameter context and you can
see the requests made to the REST API.

If you are interested in REST API docs, there is a link on the left
side of the main docs page:

https://nifi.apache.org/docs.html

Parameter contexts are under Controller.
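To make that concrete, here is a rough sketch of setting a single parameter over the REST API with plain Java. The update is asynchronous (the POST creates an update request that is then polled and deleted), and the endpoint path and payload shape below are only my reading of the REST API documentation linked above, so verify them there; the ids, revision version, parameter name/value and base URL are placeholders, and authentication is omitted entirely.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetParameterSketch {

    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8080/nifi-api";      // placeholder, unsecured NiFi assumed
        String contextId = "REPLACE-WITH-CONTEXT-ID";

        // a ParameterContextEntity with the current revision version and the parameter to change
        String body = "{"
                + "\"revision\": {\"version\": 3},"
                + "\"id\": \"" + contextId + "\","
                + "\"component\": {"
                +   "\"id\": \"" + contextId + "\","
                +   "\"parameters\": [{\"parameter\": {"
                +     "\"name\": \"keystore.password\","
                +     "\"value\": \"new-secret\","
                +     "\"sensitive\": true}}]"
                + "}}";

        HttpRequest post = HttpRequest.newBuilder()
                .uri(URI.create(base + "/parameter-contexts/" + contextId + "/update-requests"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(post, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());

        // then poll GET /parameter-contexts/{id}/update-requests/{requestId} until it is
        // complete and DELETE it, as described in the REST API documentation
    }
}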

On Wed, Jan 27, 2021 at 12:08 PM u...@moosheimer.com  
wrote:

Couldn't find any information about changing parameters via REST API.
Do you have any example?

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer


Am 27.01.2021 um 15:17 schrieb Russell Bateman :

Wait! Can't this be done using the ReST APIs?


On 1/27/21 3:24 AM, u...@moosheimer.com wrote:
Hello NiFi-Core-Team,

Are you planning to create a high-level Java API for setting (and
clearing) individual parameters in the parameter context, so we can use
this API in processor development?

Example:
setParameter(string contextName, string parameterName, string
parameterValue, boolean sensitive);
deleteParameter(string contextName, string parameterName);

Some of our customers have systems with weekly changing parameter values
and/or access passphrase.
Apart from these nothing changes in the system and the changes can be
automated with self written processor.

Best regards,
Kay-Uwe Moosheimer





Re: java api for changing parameter context

2021-01-27 Thread u...@moosheimer.com
Couldn't find any information about changing parameters via REST API.
Do you have any example?

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 27.01.2021 um 15:17 schrieb Russell Bateman :
> 
> Wait! Can't this be done using the ReST APIs?
> 
>> On 1/27/21 3:24 AM, u...@moosheimer.com wrote:
>> Hello NiFi-Core-Team,
>> 
>> Are you planning to create a high-level Java API for setting (and
>> clearing) individual parameters in the parameter context, so we can use
>> this API in processor development?
>> 
>> Example:
>> setParameter(string contextName, string parameterName, string
>> parameterValue, boolean sensitive);
>> deleteParameter(string contextName, string parameterName);
>> 
>> Some of our customers have systems with weekly changing parameter values
>> and/or access passphrase.
>> Apart from these nothing changes in the system and the changes can be
>> automated with self written processor.
>> 
>> Best regards,
>> Kay-Uwe Moosheimer
>> 
>> 
>> 
> 



java api for changing parameter context

2021-01-27 Thread u...@moosheimer.com
Hello NiFi-Core-Team,

Are you planning to create a high-level Java API for setting (and
clearing) individual parameters in the parameter context, so we can use
this API in processor development?

Example:
setParameter(string contextName, string parameterName, string
parameterValue, boolean sensitive);
deleteParameter(string contextName, string parameterName);

Some of our customers have systems with weekly changing parameter values
and/or access passphrase.
Apart from these nothing changes in the system and the changes can be
automated with self written processor.

Best regards,
Kay-Uwe Moosheimer




Re: Parameter Contexts Expression Language scopes

2020-10-19 Thread u...@moosheimer.com
If parameters make it possible to use environment variables, then the system 
can be configured in one place. You would not have to split configuration between 
expression language and parameters.

It is also possible that an environment variable should be replaced by a 
parameter or vice versa.
It makes sense to configure this in a central place.

Therefore I think this is a good suggestion.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 19.10.2020 um 23:18 schrieb Bryan Bende :
> 
> I think there is some confusion based on terminology...
> 
> A given property has an expression language scope defined, which can
> be "flow file attributes" or "variable registry only". This was
> created mostly for documentation purposes so that users could look at
> a property in the docs and see "expression language supported: true"
> and know which values they could reference. What it really comes down
> to in the code is the difference between
> "someProperty.evaluateAttributeExpressions()" and
> "someProperty.evaluateAttributeExpressions(flowFile)"... meaning can I
> reference a value from a flow file or not.
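In processor code, the difference Bryan describes looks roughly like the sketch below; the class and helper names are illustrative, only the evaluateAttributeExpressions() calls are the actual API.

import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.ProcessContext;

class ExpressionScopeSketch {

    // "variable registry only" scope: system properties, env vars, variable registry
    static String withoutFlowFile(ProcessContext context, PropertyDescriptor property) {
        return context.getProperty(property).evaluateAttributeExpressions().getValue();
    }

    // "flow file attributes" scope: can additionally reference attributes of the flowfile
    static String withFlowFile(ProcessContext context, PropertyDescriptor property, FlowFile flowFile) {
        return context.getProperty(property).evaluateAttributeExpressions(flowFile).getValue();
    }
}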
> 
> In the case of "variable registry only", the value can come from any
> of the following...
> a) system properties
> b) expression language
> c) process-group variable registry
> d) file-based variable registry
> 
> When people have stated that "variables are being deprecated in favor
> of parameters", they are referring to the last two items (c & d).
> 
> The reason being that parameters solved several short-comings of those
> two options...
> 
> - the ability to store sensitive values encrypted
> - the ability to reference them from any property using a new syntax
> #{...}, no longer dependent on the component developer saying
> "expressionLanguageSupported(true)" on the descriptor
> - the ability to create policies for which users/groups could
> reference a set of parameters
> - the hierarchical ambiguity of what "${foo}" actually resolves to
> 
> If you just want to use environment variables, why use parameter
> contexts? Expression language has always offered access to env vars
> and still will going forward.
> 
> The one  argument I can see is that not all properties support
> expression language, so using parameters gives you a way around that.
> 
> 
> 
> 
> 
>> On Mon, Oct 19, 2020 at 3:58 PM Chris Sampson
>>  wrote:
>> 
>> Being based primarily in Docker containers and having experience with both
>> Kubernetes (where secrets such as KEYSTORE_PASSWORD can be injected as
>> environment variables or files) and Docker Swarm (which only handles
>> secrets as environment variables), I'd have definitely been wanting this
>> before moving from Variables to Parameters if I was still in Swarm (or
>> Docker Compose/straight up Docker).
>> 
>> It's certainly possible to script creating/updating Parameters via the
>> Toolkit/NiPyAPI, but in Docker Swarm that's not so easy (whereas it's
>> possible as a Job in Kubernetes, for example). So environment variables
>> could save the day in that instance.
>> 
>> I guess one likely problem (but no different to how I guess the Variable
>> Registry uses env vars) would be how NiFi will handle changes to the env
>> vars - does it:
>> 
>>   - ignore them until instance restart, which could lead to maintainer
>>   confusion (I've changed KEYSTORE_PASSWORD in the env but things are still
>>   failing in NiFi)
>>   - alert the maintainer to the fact that the env var has changed and a
>>   Parameter needs updating, with the new value being used after all
>>   associated processors/controllers have been restarted
>>   - automatically attempt to update the parameters by restarting all
>>   associated processors/controllers, which I'd guess would be a bit dangerous
>>   for interrupting in-flow data, etc.
>> 
>> 
>> ---
>> *Chris Sampson*
>> IT Consultant
>> chris.samp...@naimuri.com
>> <https://www.naimuri.com/>
>> 
>> 
>>> On Mon, 19 Oct 2020 at 19:35, Bryan Bende  wrote:
>>> 
>>> Access to environment variables directly from expression language is
>>> not being removed.
>>> 
>>> The discussion is about whether a parameter value should be able to
>>> use expression language to reference an environment variable.
>>> 
>>> For example, processor property has #{keystore.password} -> parameter
>>> "keystore.password" has value "${KEYSTORE_PASSWORD}" which then gets
>>> the password from an environment variable.
>>

Re: Parameter Contexts Expression Language scopes

2020-10-19 Thread u...@moosheimer.com
Then I must have misunderstood it.
Thanks Bryan for clarification.

However, Chad's idea makes sense to me.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 19.10.2020 um 20:35 schrieb Bryan Bende :
> 
> Access to environment variables directly from expression language is
> not being removed.
> 
> The discussion is about whether a parameter value should be able to
> use expression language to reference an environment variable.
> 
> For example, processor property has #{keystore.password} -> parameter
> "keystore.password" has value "${KEYSTORE_PASSWORD}" which then gets
> the password from an environment variable.
> 
> 
> 
>> On Mon, Oct 19, 2020 at 2:14 PM u...@moosheimer.com  
>> wrote:
>> 
>> Chad,
>> 
>> So far I thought that only the NiFi variables are deprecated and access to 
>> environment variables will still be possible.
>> 
>> If this is not the case, then I agree with you. It should definitely be 
>> possible to access environment variables. Otherwise I can't imagine how to 
>> refer to the hostname or the current JAVA path or ... or ... or on each 
>> node?!
>> 
>> Mit freundlichen Grüßen / best regards
>> Kay-Uwe Moosheimer
>> 
>>>> Am 19.10.2020 um 20:00 schrieb Chad Zobrisky :
>>> 
>>> Andy,
>>> 
>>> Thanks for the response!
>>> 
>>> When I was thinking through this the deprecation of variables was
>>> definitely on my mind but the fact that it already had direct access to
>>> environment variables was the simplest path. I think it does make more
>>> sense to add access to environment variables to the parameter context, or
>>> allowing a specific scope just for environment variables in the
>>> expression language.
>>> 
>>> I think giving access to environment variables actually allows more
>>> portability between environments, eg dev, test, prod. Defining those once
>>> and allowing for nifi to pull them in makes sense to me and I think is
>>> common in container environments.
>>> 
>>> Looking forward to discussing more and better approaches.
>>> Chad
>>> 
>>>> On Mon, Oct 19, 2020 at 1:46 PM Andy LoPresto  wrote:
>>>> 
>>>> Hi Chad,
>>>> 
>>>> Parameters were introduced as a way to deprecate (NiFi) variables
>>>> entirely. I’m not sure that introducing a dependency between the two is a
>>>> positive step forward. I think there is a separate conversation to be had
>>>> about allowing parameters access to environment variables, but I think this
>>>> could introduce problems as parameters are designed for flexibility and
>>>> portability, and moving from a system where a parameter was actually a
>>>> pass-through to an environment variable would cause unexpected problems on
>>>> the destination system.
>>>> 
>>>> I think the pros and cons of this need to be clearly enumerated and
>>>> discussed here. Thanks for bringing this up.
>>>> 
>>>> 
>>>> Andy LoPresto
>>>> alopre...@apache.org
>>>> alopresto.apa...@gmail.com
>>>> He/Him
>>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>> 
>>>>>> On Oct 19, 2020, at 9:43 AM, Chad Zobrisky  wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I was configuring an SSL Context Controller Service today and had the
>>>>> keystores and passwords passed into the container via environment
>>>>> variables. I thought it would be nice to be able to reference these from
>>>>> the parameter context. Maybe either giving Parameter Context values the
>>>>> VARIABLE_REGISTRY scope in the Expression Language, or a new scope for
>>>>> references external to nifi?
>>>>> 
>>>>> I think for refreshing the Parameter Context on those external changes,
>>>> it
>>>>> would require an edit/re-apply just as it does now, and would have to
>>>> make
>>>>> sure it is well documented.
>>>>> 
>>>>> I'd be interested in creating a PR for this if the idea makes sense and
>>>> is
>>>>> acceptable.
>>>>> 
>>>>> Thanks,
>>>>> Chad
>>>> 
>>>> 
>> 



Re: Parameter Contexts Expression Language scopes

2020-10-19 Thread u...@moosheimer.com
Chad,

So far I thought that only the NiFi variables are deprecated and access to 
environment variables will still be possible.

If this is not the case, then I agree with you. It should definitely be 
possible to access environment variables. Otherwise I can't imagine how to 
refer to the hostname or the current JAVA path or ... or ... or on each node?!

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 19.10.2020 um 20:00 schrieb Chad Zobrisky :
> 
> Andy,
> 
> Thanks for the response!
> 
> When I was thinking through this the deprecation of variables was
> definitely on my mind but the fact that it already had direct access to
> environment variables was the simplest path. I think it does make more
> sense to add access to environment variables to the parameter context, or
> allowing a specific scope just for environment variables in the
> expression language.
> 
> I think giving access to environment variables actually allows more
> portability between environments, eg dev, test, prod. Defining those once
> and allowing for nifi to pull them in makes sense to me and I think is
> common in container environments.
> 
> Looking forward to discussing more and better approaches.
> Chad
> 
>> On Mon, Oct 19, 2020 at 1:46 PM Andy LoPresto  wrote:
>> 
>> Hi Chad,
>> 
>> Parameters were introduced as a way to deprecate (NiFi) variables
>> entirely. I’m not sure that introducing a dependency between the two is a
>> positive step forward. I think there is a separate conversation to be had
>> about allowing parameters access to environment variables, but I think this
>> could introduce problems as parameters are designed for flexibility and
>> portability, and moving from a system where a parameter was actually a
>> pass-through to an environment variable would cause unexpected problems on
>> the destination system.
>> 
>> I think the pros and cons of this need to be clearly enumerated and
>> discussed here. Thanks for bringing this up.
>> 
>> 
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> He/Him
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>> 
 On Oct 19, 2020, at 9:43 AM, Chad Zobrisky  wrote:
>>> 
>>> Hello,
>>> 
>>> I was configuring an SSL Context Controller Service today and had the
>>> keystores and passwords passed into the container via environment
>>> variables. I thought it would be nice to be able to reference these from
>>> the parameter context. Maybe either giving Parameter Context values the
>>> VARIABLE_REGISTRY scope in the Expression Language, or a new scope for
>>> references external to nifi?
>>> 
>>> I think for refreshing the Parameter Context on those external changes,
>> it
>>> would require an edit/re-apply just as it does now, and would have to
>> make
>>> sure it is well documented.
>>> 
>>> I'd be interested in creating a PR for this if the idea makes sense and
>> is
>>> acceptable.
>>> 
>>> Thanks,
>>> Chad
>> 
>> 



Re: Regarding Concurrent Tasks & Maximum Timer Driven Thread Count

2020-09-28 Thread u...@moosheimer.com
Mark has made a really good video for those who don't have deep knowledge of 
threading, system load and concurrency, and for everyone who wonders what "Run 
Duration" means in the settings.
Highly recommended!

Thank you Mark.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 28.09.2020 um 15:33 schrieb Bryan Bende :
> 
> Hello,
> 
> Mark Payne recently published a video which relates to this topic and
> should answer your question:
> 
> https://www.youtube.com/watch?v=pZq0EbfDBy4
> 
> Thanks,
> 
> Bryan
> 
>> On Fri, Sep 25, 2020 at 2:57 PM Midhun Mohan  wrote:
>> 
>> Hi all,
>> 
>> Lets say, If I want to configure my Nifi processor to have 100/ 200/ 500
>> concurrent tasks, what should my
>> Maximum Timer Driven Thread Count be?
>> 
>> Also Should I need to take care of anything related to instance like, its
>> ram or disk space,(if it is not deployed in a cluster to load balance), to
>> make sure it is not being destroyed by my whole new setup.
>> 
>> 
>> Or else
>> 
>> If I can get a rule saying what all are depending on the having a larger
>> count of concurrent tasks, then it would be helpful
>> 
>> --
>> 
>> 
>> Regards,
>> Midhun Mohan



Re: [DISCUSS] rename master branch, look through code for other related issues

2020-06-18 Thread u...@moosheimer.com
Language is always changing and the meaning of words is changing,
sometimes positively and sometimes negatively.
I think that now is the time for change again, and we should discuss the use
of certain phrases and their meanings.

Of course we should change "Master Branch" to "Main Branch".
But I also think that we shouldn't just make quick changes because it's
opportune and hastily change a few words.

An example: we could change Master/Slave to Leader/Follower. This may be
a perfect choice for most people in the world.
But the German translation of "Leader" is "Führer", and that is precisely
the word we in Germany do not want to use for it.

What I mean is that every country and every group (e.g. religions etc.)
has its own history, and certain words or phrases are simply not a perfect
choice there.
We should try to take the ethically correct path worldwide.

This concerns adapting current words and phrases with everyone in mind:
in English, Indian, Chinese, German etc., but also for indigenous
peoples, different religions etc.
And cultural differences should also be taken into account.

What I would wish for:
Apache.org should set up an "Ethics Board": a group of people of
different genders, colors and religions, from different countries
and cultures all over our world.
This Ethics Board should find words and phrases that are good and
discriminate against no one, for all the areas that stand out today as
offensive.

And it would be nice if not only computer scientists participated, but
also ethicists, philosophers, engineers, various religious people,
chemists, biologists, physicists, sociologists, etc.

And this Ethics Board should set binding targets for all projects.

Am 18.06.2020 um 09:36 schrieb Pierre Villard:
>> In my perspective this should be an issue for the entire community. Being
>> able to identify an issue that directly affects another person but not
>> one’s self is the definition of privilege. If I can look at how the use of
>> these words in someone’s daily life or career impacts them negatively,
> when
>> the change would not harm me at all, I see that as a failure on my part. I
>> understand the desire to hear from the silent majority, but active
>> participation and discussion on the mailing list is the exact measure
>> described by the Apache process for participation in the community. Those
>> who speak here are the ones who will have a voice.
> I could not agree more with the above.
>
> Le jeu. 18 juin 2020 à 04:29, Tony Kurc  a écrit :
>
>> I suppose I was a bit remiss in not unwinding and/or summarizing some of
>> what was in that yetus thread to prime the discussion, but a some of what
>> Andy is mentioning is expanded on a bit in this ietf document [1], which is
>> linked in one of the articles.
>>
>> 1. https://tools.ietf.org/id/draft-knodel-terminology-00.html
>>
>>
>> On Wed, Jun 17, 2020, 10:02 PM Andy LoPresto  wrote:
>>
>>> Hi Edward, thanks for sharing your thoughts. I’ll reply inline.
>>>
 - Some of the terms proposed are not industry standard and may
>>> potentially
 cause significant issue for non-english speakers.
>>> I actually believe making these changes will _improve_ the clarity for
>>> non-english speakers. “Whitelist” and “blacklist” confer no inherent
>> reason
>>> to mean allow and deny other than connotative biases. “Allow” and “deny”
>>> explicitly indicate the verb that is happening. Another example is branch
>>> naming. “Masters” don’t have “branches”. “Trunks” do. These terms make
>>> _more_ sense for a non-English speaker than the current terms.
>>>
 - For each change that is made can we guarantee that we will not lose
 clarity of meaning, and then have revert the change down the line if
>> the
 change causes a drop in usage.
>>> I don’t expect the community will opt to change the new terms back to
>> ones
>>> with negative connotations in the future. If there is discussion about
>> it,
>>> this thread will provide good historical context for why the decision was
>>> made to change it, just as the mailing list discussions do for other code
>>> changes.
>>>
 - Of what percentage of people is this truly an issue for and what
 percentage isn't. Any change that has the potential to cause a major
>>> split
 in the community, there must be as close as possible to a majority, and
>>> not
 just from those that are vocal and active on the mailing lists.
 Disscustions on other groups are turning toxic, and in some cases are
 potentially leading to the collapse of these projects where these
>> changes
 are being implemented with what appears to be without the agreement of
>> a
 signifficant chunk of the community.

>>> In my perspective this should be an issue for the entire community. Being
>>> able to identify an issue that directly affects another person but not
>>> one’s self is the definition of privilege. If I can look at how the use
>> of
>>> these words in someone’s daily life or career impacts them negatively,
>> when
>>> the 

Re: [DISCUSS] Advanced search capabilities

2020-02-21 Thread u...@moosheimer.com
Martin,

correct me if I'm wrong, but Neo4J is not open source. Wouldn't it generally be 
better to use JanusGraph for this?
Especially since many major companies use JanusGraph: IBM Cloud, Netflix, 
Uber...

A complete and fully functional lineage is already available with Apache Atlas. 
In my opinion it would be more complicated to build this in NiFi than to 
integrate Apache Atlas, especially since bridging to Atlas is already standard 
in NiFi.

Or did I misunderstand something?

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 21.02.2020 um 19:42 schrieb Martin Ebert :
> 
> We still think about building a graph based search (Neo4j) in top of NiFi.
> Would be also fantastic to have it within NiFi.
> 
> There are plenty of examples
> https://blog.grandstack.io/using-neo4js-full-text-search-with-graphql-e3fa484de2ea
> From the idea it could go in this direction - of course much more
> rudimentary. Then one would have the possibility to have only the results
> displayed as text or to find out exploratory connections (graph layout).
> The built-in data lineage function of NiFi would also benefit from the
> power of Neo4j.
> 
> Simon Bence  schrieb am Fr., 21. Feb. 2020, 19:00:
> 
>> Dear Community,
>> 
>> In my project, I do use relatively high number of processors and process
>> groups. The current search function on the NiFi UI has no  capabilitites to
>> narrow the results based on the group, which would make the results more
>> relevant, so I would like to propose a possible solution. Please if you
>> have any comment on this, do not hesitate to share it.
>> 
>> The general approach would be to keep the current text box and extend the
>> server side capabilities to process search query in the similar manner for
>> example the Google search behaves.This extensions I would call "filters".
>> For now I am interested in the ones I will mention below, but I think, it
>> is only a matter of small work for further extend the solution with further
>> ones.
>> 
>> In order to distinguish the filters from the rest of the search query, I
>> propose to put them at the beginning of the query and use the
>> [a-zA-Z0-9\.]{1..n}\:[a-zA-Z0-9\.]{1..n} format. For example a filter might
>> look the following: lorem:ipsum
>> 
>> Adding this, the search query should look like the following:
>> 
>> filter1:value filter2:value rest of the query
>> 
>> As for processing the filters, I suggest the following behaviour:
>> 
>> - Without filters the current behaviour should be kept
>> - Everything after the filters should be handled as the search term
>> - After the first "non filter word", anything should be considered as part
>> of the search term (meaning: to keep the text parsing simple, I would not
>> go in the direction to support filters at the end of the query, etc.)
>> - The ordering of the filters should have no effect on the result
>> - Filter duplications should be eliminated
>> - In case a filter appears multiple times in the query, the first occasion
>> will be used
>> - Unknown filters should be ignored
>> - Only adding filters will not end up with result, at least one character
>> must appear as search term
>> 
>> Suggested filters:
>> 
>> scope
>> Narrows the search based on the user's currently active process group. The
>> allowed values are: "all" and "here". All produces the current behaviour,
>> thus no filtering happens, but "here" should use the current process group
>> as "root" of the search, ignoring everything else (including parent group).
>> Note: This needs a minimal frontend change, because as I did see, currently
>> the current group is not sent with the search query.
>> 
>> group
>> Narrows the search for a given processing group, if it exists. The
>> behaviour is recursive, thus the result will include the contained groups
>> as well. If it is a non-existing group, the result list should be empty.
>> 
>> properties
>> Controls if properties values are included or not. If not provided, the
>> property values will be included. This is because in a lot of cases there
>> is a huge number of results come from property names.
>> 
>> - Valid values for inclusion: yes, true, include, 1
>> - Valid values for exclusion: no, none, false, exclude, 0
>> 
>> It is possible that the range of possible values should be limited (and not
>> being ambiguous), but I see a merit of "permissiveness" here as it is
>> simpler to remember.
>> 
>> Also some example:
>> 
>> 1.
>> scope:here properties:exclude lorem ipsum
>> This should search only in the current group (and it's children), excluding
>> properties and return with components containing the "lorem ipsum"
>> expression.
>> 
>> 2.
>> group:myGroup someQuery
>> This should result the finding of components with someQuery expression, but
>> only within the myGroup group, even if it is not the active one.
>> 
>> 3.
>> scope:all properties:include lorem
>> This should behave the same as "lorem" without filters.
>> 
>> Thanks for reading, I am interested to 
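As an editorial illustration of the proposed grammar (not part of the thread, and the real implementation would live in NiFi's Java code), here is a minimal Python sketch of splitting the filters off the search term, following the rules above:

import re

FILTER_PATTERN = re.compile(r"^[A-Za-z0-9.]+:[A-Za-z0-9.]+$")
KNOWN_FILTERS = {"scope", "group", "properties"}

def parse_query(query):
    # Filters may only appear at the beginning of the query; everything from
    # the first non-filter word onwards is treated as the search term.
    filters = {}
    words = query.strip().split()
    i = 0
    while i < len(words) and FILTER_PATTERN.match(words[i]):
        name, _, value = words[i].partition(":")
        # unknown filters are ignored; the first occurrence of a filter wins
        if name in KNOWN_FILTERS and name not in filters:
            filters[name] = value
        i += 1
    return filters, " ".join(words[i:])

# Example 1 from the proposal:
# parse_query("scope:here properties:exclude lorem ipsum")
# -> ({'scope': 'here', 'properties': 'exclude'}, 'lorem ipsum')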

Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
Sorry :-)

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 30.01.2020 um 23:08 schrieb Joe Witt :
> 
> Our data provenance is.  Just not our repository :)
> 
>> On Thu, Jan 30, 2020 at 5:00 PM u...@moosheimer.com 
>> wrote:
>> 
>> Lars
>> 
>> You're absolutely right about what you say.
>> If the data in the NiFi repositories is only stored temporarily for a
>> few hours, then documentation is quite sufficient.
>> 
>> The original question was how to delete data from the data lineage.
I assumed the NiFi repository would be used as a full Data Lineage System.
>> If NiFi is your central application, then you could avoid having to
>> install Atlas as well. And with Atlas, you would have to install Ranger,
>> Cassandra or even Hadoop and HBase.
>> 
>> Joe has already made it clear to me here that Data Provenance/Data
>> Lineage of NiFi is not designed for this yet.
>> Maybe in the future...
>> 
>> Best
>> Uwe
>> 
>>> Am 30.01.2020 um 22:08 schrieb Lars Winderling:
>>> Dear Uwe and fellow devs,
>>> 
>>> sorry if I completely miss the point here, but I'll try. Also working
>> with NiFi under GDPR-regulations in online ad business. From my point it
>> would be sufficient to ensure that no new data will get stored, if a user
>> requests deletion, and delete all personal data from all respective
>> systems. The NiFi repos will expire their data, which can be argued to
>> equal a delayed deletion. Remember that GDPR is quite strict, but if you
>> have a proper case for this kind of process e.g. due to technical
>> limitations, it needs to be documented, and then it will likely be ok. We
>> do it similarly, and our legal counsel approved this. My response, however,
>> is not legally binding. The regulation says something like you should take
>> appropriate measures. If such a tool like NiFi just doesn't let you delete
>> temporarily stored data instantly, this may seem acceptable.
>>> 
>>> Best,
>>> Lars
>>> 
>>> Am 30. Januar 2020 21:36:31 MEZ schrieb Mike Thomsen <
>> mikerthom...@gmail.com>:
>>>> I suppose the elephant in the room here is what sort of personal data
>>>> is
>>>> being stored in your provenance records? Can't you just refactor your
>>>> flows
>>>> to ensure that the provenance data doesn't meaningful contain anything
>>>> traceable to a person?
>>>> 
>>>> On Thu, Jan 30, 2020 at 12:41 PM u...@moosheimer.com
>>>> 
>>>> wrote:
>>>> 
>>>>> Emanuel
>>>>> 
>>>>> That was not meant disrespectfully by me. And if that's how you felt,
>>>>> then I apologize.
>>>>> 
>>>>>> In what sense does NiFi relates to GDPR compliance ?
>>>>> All person-related data that flows, is read, sent or stored etc.  in
>>>> a
>>>>> company is GDPR relevant.
>>>>> 
>>>>>> - in terms of data FF contents - they too transient (gone in 12hours
>>>> /
>>>>> default).
>>>>> It makes no difference how long the data is stored. And it makes no
>>>>> difference if data is stored on disk or just in memory.
>>>>> 
>>>>> The data can potentially be read, processed by others or sent to
>>>> other
>>>>> systems and so on. Or the data can be used during this time to
>>>> establish
>>>>> relationships to other data (pseudo anonymized data etc.).
>>>>> 
>>>>>> I guess discussion is on the fact FF attributes are kept on the
>>>> data
>>>>>   provenance repo ? (gone in 24h / default)
>>>>> I'm afraid not. It's generally a matter of NiFi storing data - as
>>>>> already mentioned, it doesn't make any difference whether it's on the
>>>>> hard disk or just in memory.
>>>>> 
>>>>>> I wonder where the culprit here ?
>>>>> There's no culprit here. It's generally a problem with GDPR when
>>>>> processing person-related data.
>>>>> It's a problem of person-related data.
>>>>> It is a problem of person-related data, which would fill a book, what
>>>> is
>>>>> person-related, because machine data can also be person-related, for
>>>>> example if I can relate a person directly to the machine and
>>>> place/time.
>>>>> This would allow me to track a pers

Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
Lars

You're absolutely right about what you say.
If the data in the NiFi repositories is only stored temporarily for a
few hours, then documentation is quite sufficient.

The original question was how to delete data from the data lineage.
I assumed the NiFi repository would be used as a full Data Lineage System.
If NiFi is your central application, then you could avoid having to
install Atlas as well. And with Atlas, you would have to install Ranger,
Cassandra or even Hadoop and HBase.

Joe has already made it clear to me here that Data Provenance/Data
Lineage of NiFi is not designed for this yet.
Maybe in the future...

Best
Uwe

Am 30.01.2020 um 22:08 schrieb Lars Winderling:
> Dear Uwe and fellow devs,
>
> sorry if I completely miss the point here, but I'll try. Also working with 
> NiFi under GDPR-regulations in online ad business. From my point it would be 
> sufficient to ensure that no new data will get stored, if a user requests 
> deletion, and delete all personal data from all respective systems. The NiFi 
> repos will expire their data, which can be argued to equal a delayed 
> deletion. Remember that GDPR is quite strict, but if you have a proper case 
> for this kind of process e.g. due to technical limitations, it needs to be 
> documented, and then it will likely be ok. We do it similarly, and our legal 
> counsel approved this. My response, however, is not legally binding. The 
> regulation says something like you should take appropriate measures. If such 
> a tool like NiFi just doesn't let you delete temporarily stored data 
> instantly, this may seem acceptable.
>
> Best,
> Lars
>
> Am 30. Januar 2020 21:36:31 MEZ schrieb Mike Thomsen :
>> I suppose the elephant in the room here is what sort of personal data
>> is
>> being stored in your provenance records? Can't you just refactor your
>> flows
>> to ensure that the provenance data doesn't meaningful contain anything
>> traceable to a person?
>>
>> On Thu, Jan 30, 2020 at 12:41 PM u...@moosheimer.com
>> 
>> wrote:
>>
>>> Emanuel
>>>
>>> That was not meant disrespectfully by me. And if that's how you felt,
>>> then I apologize.
>>>
>>>> In what sense does NiFi relates to GDPR compliance ?
>>> All person-related data that flows, is read, sent or stored etc.  in
>> a
>>> company is GDPR relevant.
>>>
>>>> - in terms of data FF contents - they too transient (gone in 12hours
>> /
>>> default).
>>> It makes no difference how long the data is stored. And it makes no
>>> difference if data is stored on disk or just in memory.
>>>
>>> The data can potentially be read, processed by others or sent to
>> other
>>> systems and so on. Or the data can be used during this time to
>> establish
>>> relationships to other data (pseudo anonymized data etc.).
>>>
>>>> I guess discussion is on the fact FF attributes are kept on the
>> data
>>>provenance repo ? (gone in 24h / default)
>>> I'm afraid not. It's generally a matter of NiFi storing data - as
>>> already mentioned, it doesn't make any difference whether it's on the
>>> hard disk or just in memory.
>>>
>>>> I wonder where the culprit here ?
>>> There's no culprit here. It's generally a problem with GDPR when
>>> processing person-related data.
>>> It's a problem of person-related data.
>>> It is a problem of person-related data, which would fill a book, what
>> is
>>> person-related, because machine data can also be person-related, for
>>> example if I can relate a person directly to the machine and
>> place/time.
>>> This would allow me to track a person/employee and this is not
>> allowed
>>> (unless a law allows me to do so).
>>>
>>> All this goes much further and would be far too much to mention now.
>>> In principle, we have a GDPR issue and must act in accordance with
>> the law.
>>> We do not agree with all the regulation either. But all regulations I
>>> know so far have at least one justification. Even if we as enterprise
>>> architects, developers, administrators etc. have our problems with
>> them.
>>> Regards
>>> Uwe
>>>
>>> Am 30.01.2020 um 17:51 schrieb Emanuel Oliveira:
>>>> But enlight me please :) isnt GDPR just about cleaning from
>> persistent
>>>> storage ?
>>>> In what sense does NiFi relates to GDPR compliance ?
>>>>
>>>>- in terms of data FF contents - they too transient (gone in
>> 12hours /
>>>>default).
>>&

Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
You see this very much from a technical perspective.
Purely technical IoT data does not have this problem.

But the question is what is purely technical?

If you process IoT data from vehicles, then at first glance this is purely
technical data.
But if there is a VIN (Vehicle Identification Number) in the data, then it is
person-related data.
This is already legally defined.

Without the VIN, however, the data makes no sense. And even if you
pseudonymize the VIN (because you must have a key-value table somewhere or
generate a hash), you can always assign the data to a vehicle and thus
to its owner.
You would have to anonymize the data completely. But that would make
the VIN, and thus also your IoT data, completely worthless.
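As an editorial illustration of why pseudonymization is not anonymization (the VINs below are made up), a short Python sketch: whoever holds a list of candidate VINs can hash them all and invert the mapping.

import hashlib

def pseudonymize(vin):
    # Deterministic hash used as a "pseudonym" for the VIN.
    return hashlib.sha256(vin.encode("utf-8")).hexdigest()

# Hypothetical candidate VINs (e.g. from a vehicle master table).
candidate_vins = ["WVWZZZ1JZXW000001", "WDB1234561A000002"]
lookup = {pseudonymize(v): v for v in candidate_vins}

record = {"vin_hash": pseudonymize("WVWZZZ1JZXW000001"), "speed_kmh": 87}
print(lookup[record["vin_hash"]])  # recovers the original VIN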

There are many books and articles that describe how to find, from
pseudonymized data, the person to whom the data can be assigned.
And GDPR explicitly says that such data is then person-related data.

I could continue this list forever. A farmer has IoT sensors on his
tractor and in the field. This data goes to a provider who then
evaluates how to fertilize and plant.
All, all, all person-related data.

This is not as easy as you might think, because there is hardly any data
that cannot be related to a person.

And in general it also depends on your business. If you explicitly use
personal data in your use case, you cannot simply filter it out.
Unless you stop your business.

Am 30.01.2020 um 21:36 schrieb Mike Thomsen:
> I suppose the elephant in the room here is what sort of personal data is
> being stored in your provenance records? Can't you just refactor your flows
> to ensure that the provenance data doesn't meaningful contain anything
> traceable to a person?
>
> On Thu, Jan 30, 2020 at 12:41 PM u...@moosheimer.com 
> wrote:
>
>> Emanuel
>>
>> That was not meant disrespectfully by me. And if that's how you felt,
>> then I apologize.
>>
>>> In what sense does NiFi relates to GDPR compliance ?
>> All person-related data that flows, is read, sent or stored etc.  in a
>> company is GDPR relevant.
>>
>>> - in terms of data FF contents - they too transient (gone in 12hours /
>> default).
>> It makes no difference how long the data is stored. And it makes no
>> difference if data is stored on disk or just in memory.
>>
>> The data can potentially be read, processed by others or sent to other
>> systems and so on. Or the data can be used during this time to establish
>> relationships to other data (pseudo anonymized data etc.).
>>
>>> I guess discussion is on the fact FF attributes are kept on the data
>>provenance repo ? (gone in 24h / default)
>> I'm afraid not. It's generally a matter of NiFi storing data - as
>> already mentioned, it doesn't make any difference whether it's on the
>> hard disk or just in memory.
>>
>>> I wonder where the culprit here ?
>> There's no culprit here. It's generally a problem with GDPR when
>> processing person-related data.
>> It's a problem of person-related data.
>> It is a problem of person-related data, which would fill a book, what is
>> person-related, because machine data can also be person-related, for
>> example if I can relate a person directly to the machine and place/time.
>> This would allow me to track a person/employee and this is not allowed
>> (unless a law allows me to do so).
>>
>> All this goes much further and would be far too much to mention now.
>> In principle, we have a GDPR issue and must act in accordance with the law.
>>
>> We do not agree with all the regulation either. But all regulations I
>> know so far have at least one justification. Even if we as enterprise
>> architects, developers, administrators etc. have our problems with them.
>>
>> Regards
>> Uwe
>>
>> Am 30.01.2020 um 17:51 schrieb Emanuel Oliveira:
>>> But enlight me please :) isnt GDPR just about cleaning from persistent
>>> storage ?
>>> In what sense does NiFi relates to GDPR compliance ?
>>>
>>>- in terms of data FF contents - they too transient (gone in 12hours /
>>>default).
>>>- I guess discussion is on the fact FF attributes are kept on the data
>>>provenance repo ? (gone in 24h / default)
>>>
>>> I wonder wheres the culprit here ? Is it in the situation hwere one wants
>>> to keep a long trace of data provenance like 6 months, but because
>>> attributes are stored on provenance events, then they must be deleted ?
>>> I guess it can only be a problem of deleting attributes from provenance
>>> repo and no FF contents right as they gone fast enough ?
>>>
>>> Best Regards,
>>> *Emanuel Oliveira*
&

Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
g
>> term
>>> but rather the records are there long enough to support flow management
>> use
>>> cases but are always being exported to a long term store such as Atlas or
>>> even just stored in HDFS or other locations for additional use.  One
>>> day...a sweet graph database...
>>>
>>> Thanks
>>> Joe
>>>
>>> On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Some recap on NiFi concepts:
>>>>
>>>>- Content Repository stores FF contents.
>>>>- Data Provenance events -used to check lineage of history of FFs-
>>> only
>>>>stores pointers to FFs (not contents).
>>>>- so one can have data deleted and still access lineage/data
>>> provenance
>>>>history.
>>>>
>>>> Heres a lof of in-depth on the subject, but above 3 points are the
>>>> summary of all:
>>>> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>>>>
>>>>
>>>> *DATA - persistent data only exists in 2 scenarios:*
>>>>
>>>>- while your flow file running.
>>>>- archived on content repository for 12h (to allow access contents
>>> when
>>>>using inspect data provenance/lineage).
>>>>
>>>>
>> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>>>>
>>>> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>>>>
>>>>- contains only provenance attributes and FF uuid etcbut NO
>> CONTENTS,
>>>>available for 24h unless increasing/changed on config files.
>>>>-
>>>>
>>>>
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>
>>>>
>>>> So as you see both context by default expire daily. fast enough that
>> dont
>>>> think GDPR is any problem or any action needed.
>>>> Now one can always boosts retention of just data provenance events for
>>>> months, 1 year or whatever suits. But data is long gone anyway.
>>>>
>>>> Best Regards,
>>>> *Emanuel Oliveira*
>>>>
>>>>
>>>>
>>>> On Thu, Jan 30, 2020 at 2:26 PM u...@moosheimer.com >>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>>> GDPR doesnt need milisecond realtime deletion right ?)
>>>>> right.
>>>>>
>>>>>> since inbound FFs have
>>>>>>normally hundreds, thousands of records that will need to split,
>>>>> aggregate,
>>>>>>in complex flow file, implementing a clean
>>>>> It depends on your application. Not everyone uses NiFi for IoT and
>>>>> therefore a single record may be included.
>>>>>
>>>>>> In my opinion your answer to business/management gate keepers is
>> that
>>>>> data
>>>>>> will be stored on data provenance for 24h (default) which can be
>>>>>> configured, and that
>>>>> This is not necessarily the point of the Data Lineage, that the
>>>>> information is deleted after 24 hours (or whatever is configured).
>>>>> If Data Lineage is needed (revision, legal requirements etc.), then
>>>>> deleting the data after a defined time is not an option.
>>>>>
>>>>> This is the reason why Atlas supports it.
>>>>>
>>>>> Best Regards,
>>>>> Uwe
>>>>>
>>>>> Am 30.01.2020 um 15:06 schrieb Emanuel Oliveira:
>>>>>> Hi, dont think makes sense an api for atomic records:
>>>>>>
>>>>>>1. one configure retention od data provenance (default 24h is
>>> "good
>>>>>>enough" GDPR doesnt need milisecond realtime deletion right ?)
>>>>>>
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>>>2. even if there would be one api to delete FF's with an
>>> attribute =
>>>>>>, that would normally be useless as well, since inbound
>>> FFs
>>>>> have
>>>>>>normally hundreds, thousands of records that will need to split,
>>>>> aggregate,
>>>>>>in complex flow file, implementing a clean up an nano atomic
>> level
>>>>> would be
>>>>>>to hard and extra effort not needed, since your target single
>>> record
>>>>> would
>>>>>>surely be part of multiple FF UUIDs, some only holding your
>>> record,
>>>>> but mot
>>>>>>surefly will have 100s, 100s of other records including your
>>> record
>>>>>>somewhere on the middle.
>>>>>>
>>>>>>
>>>>>> In my opinion your answer to business/management gate keepers is
>> that
>>>>> data
>>>>>> will be stored on data provenance for 24h (default) which can be
>>>>>> configured, and that
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> *Emanuel Oliveira*
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 30, 2020 at 1:54 PM u...@moosheimer.com <
>>> u...@moosheimer.com
>>>>>> wrote:
>>>>>>
>>>>>>> Dear NiFi developer team,
>>>>>>>
>>>>>>> NiFi's Data Provenance and Data Lineage is perfectly adequate in
>> the
>>>>>>> environment of NiFi, so there is often no need to use Atlas.
>>>>>>>
>>>>>>> When using NiFi with customer data a problem arises.
>>>>>>> The problem is the GDPR requirement that a user has the right to
>> be
>>>>>>> forgotten. Unfortunately, I can't find any API call or information
>>> on
>>>>>>> how to delete individual user data from the NiFi Provenance
>>> Repository
>>>>>>> based on a user-defined attribute and its defined characteristics.
>>>>>>>
>>>>>>> A delete request like "delete all data and dependencies where the
>>>>>>> attribute XYZ has the value 123" is currently not possible to my
>>>>> knowledge.
>>>>>>> My questions are:
>>>>>>> Is this actually possible and how? And if not, is it planned?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Uwe
>>>>>>>
>>>>>



Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
Joe,

thank you for the detailed and final clarification.
With your statement I know how to argue with my clients.

I would like to share one last idea.

NiFi is being used more and more in Europe, and SMBs/SMEs are starting
to adopt it.
In contrast to the US, the share of SMBs in Europe is extremely high and
therefore a huge market.
And in Europe it is not uncommon to speak of an SMB even when the company
has 1000 employees.

For SMBs, NiFi is a good (and often the best) entry into the world of
large data volumes such as IoT or process chain events, and often that is
enough for them.
They attach Postgres or maybe JanusGraph and whatever else they need,
and that covers their business case.

It would be interesting to expand the NiFi Data Provenance/Data Lineage
for this customer group.
Not everyone starts with HDF or wants to have a Hadoop installation
right away. But they all have the GDPR problem.
And they don't necessarily want to run Atlas, Ranger and
HBase/Cassandra in addition, because they don't have the personnel or
expertise for it; it's not their main business.

Even if the flood of data in the data lineage becomes extremely high
over time, it would be interesting to expand the possibilities in NiFi
here as well.
Maybe it would be a good idea to extend the ProvenanceReporter to store
to S3 (internally in a Ceph SAN or externally in an S3 cloud). Then the
hard disk limits would no longer be an issue. The events could be routed
via e.g. Kafka or NiFi S2S or whatever to avoid latency problems.

Maybe my idea is unrealistic, but I think it can't hurt to discuss it.

Thanks
Uwe

Am 30.01.2020 um 16:32 schrieb Joe Witt:
> Mike,
>
> It was created on this side of the Atlantic because when people do care
> about such things - they REALLY care.
>
> I anticipate more and more people will care and I hope that day comes
> soon.  I'm proud of NiFi's ability to be a leader here because if your flow
> management solution between sensors and processing and storage systems
> tells you where things came from and went to it is a heck of a good start.
>
> What exists in our provenance data is information about the data but this
> can be 'any attribute' put on a flow file throughout its life in the flow.
> We simply cannot guarantee this wont be 'content'.  The notion of what is
> metadata vs content gets blurry fast.
>
> Uwe,
>
> The data provenance capabilities within NiFi do no support the ability to
> 'delete records' based on specified parameters.  The only mechanism is
> space or time based age off.  For now, whatever the obligation is to
> respond to a right to be forgotten request should be what the provenance
> within NiFi is configured to hold.  If for instance you have 24 hours then
> provenance in NiFi should hold no more than 24 hours.
>
> I doubt this is something we'll be able to spend time on sooner but I agree
> the idea of being able to purge out records is a good one based on more
> precise parameters.
>
> The intent is not that the built-in nifi provenance store is for long term
> but rather the records are there long enough to support flow management use
> cases but are always being exported to a long term store such as Atlas or
> even just stored in HDFS or other locations for additional use.  One
> day...a sweet graph database...
>
> Thanks
> Joe
>
> On Thu, Jan 30, 2020 at 10:29 AM Emanuel Oliveira 
> wrote:
>
>> Hi,
>>
>> Some recap on NiFi concepts:
>>
>>- Content Repository stores FF contents.
>>- Data Provenance events -used to check lineage of history of FFs- only
>>stores pointers to FFs (not contents).
>>- so one can have data deleted and still access lineage/data provenance
>>history.
>>
>> Heres a lof of in-depth on the subject, but above 3 points are the
>> summary of all:
>> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
>>
>>
>> *DATA - persistent data only exists in 2 scenarios:*
>>
>>- while your flow file running.
>>- archived on content repository for 12h (to allow access contents when
>>using inspect data provenance/lineage).
>>
>> https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
>>
>>
>> *PROVENANCE EVENTS (LINEAGE) OF DATA:*
>>
>>- contains only provenance attributes and FF uuid etcbut NO CONTENTS,
>>available for 24h unless increasing/changed on config files.
>>-
>>
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>
>>
>>
>> So as you see both context by default expire daily. fast enough that dont
>> think GDPR is any problem or any ac

Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
@Mike
However, what we have to take into account here is also partly very 
frustrating. But also pretty fascinating.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 30.01.2020 um 16:23 schrieb Mike Thomsen :
> 
> That's actually a pretty fascinating use case. Our experience on this side
> of the Atlantic is that few people really care about lineage.
> 
>> On Thu, Jan 30, 2020 at 9:48 AM u...@moosheimer.com 
>> wrote:
>> 
>> I think you have the wrong picture.
>> 
>> Data lineage systems like Atlas and similar are pushed because GDPR
>> prescribes it!
>> Data Lineage is by no means a pure "internal diagnostic" but has a legal
>> background.
>> 
>> Thus GDPR defines a recording requirement.
>> It states, among other things, that
>> - a description of the categories of personal data,
>> - a description of the categories of recipients of personal data,
>>   including recipients in third countries or international organisations,
>> - transfers of personal data to a third country or an international
>>   organisation
>> must be recorded in an audit-proof manner.
>> 
>> And if you do all this correctly, then you have to make sure that the
>> data is erasable again (right to be forgotten).
>> 
>> By the way, this does not only apply to special Data Lineage systems but
>> also to all log files, backups etc. At least as long as no other legal
>> regulation prohibits this.
>> Data Lineage is therefore not a nice feature for internal diagnostics
>> but a must.
>> 
>> So far, too few companies have thought of this. But more and more are
>> recognizing the necessity.
>> This is also the reason why formerly Hortonworks and now Cloudera work
>> hard on Atlas.
>> 
>>> Am 30.01.2020 um 15:25 schrieb Mike Thomsen:
>>> IANAL, but I would be surprised if NiFi provenance data even legally
>> falls
>>> under the Right to Be Forgotten because it's internal diagnostic data
>> that
>>> is highly ephemeral.
>>> 
>>> On Thu, Jan 30, 2020 at 9:07 AM Emanuel Oliveira 
>> wrote:
>>> 
>>>> Hi, dont think makes sense an api for atomic records:
>>>> 
>>>>   1. one configure retention od data provenance (default 24h is "good
>>>>   enough" GDPR doesnt need milisecond realtime deletion right ?)
>>>> 
>>>> 
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>>>   2. even if there would be one api to delete FF's with an attribute =
>>>>   , that would normally be useless as well, since inbound FFs
>>>> have
>>>>   normally hundreds, thousands of records that will need to split,
>>>> aggregate,
>>>>   in complex flow file, implementing a clean up an nano atomic level
>>>> would be
>>>>   to hard and extra effort not needed, since your target single record
>>>> would
>>>>   surely be part of multiple FF UUIDs, some only holding your record,
>> but
>>>> mot
>>>>   surefly will have 100s, 100s of other records including your record
>>>>   somewhere on the middle.
>>>> 
>>>> 
>>>> In my opinion your answer to business/management gate keepers is that
>> data
>>>> will be stored on data provenance for 24h (default) which can be
>>>> configured, and that
>>>> 
>>>> 
>>>> Best Regards,
>>>> *Emanuel Oliveira*
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jan 30, 2020 at 1:54 PM u...@moosheimer.com 
>>>> wrote:
>>>> 
>>>>> Dear NiFi developer team,
>>>>> 
>>>>> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
>>>>> environment of NiFi, so there is often no need to use Atlas.
>>>>> 
>>>>> When using NiFi with customer data a problem arises.
>>>>> The problem is the GDPR requirement that a user has the right to be
>>>>> forgotten. Unfortunately, I can't find any API call or information on
>>>>> how to delete individual user data from the NiFi Provenance Repository
>>>>> based on a user-defined attribute and its defined characteristics.
>>>>> 
>>>>> A delete request like "delete all data and dependencies where the
>>>>> attribute XYZ has the value 123" is currently not possible to my
>>>> knowledge.
>>>>> My questions are:
>>>>> Is this actually possible and how? And if not, is it planned?
>>>>> 
>>>>> Thanks
>>>>> Uwe
>>>>> 
>> 
>> 



Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
I think you have the wrong picture.

Data lineage systems like Atlas and similar are pushed because GDPR
prescribes it!
Data Lineage is by no means a pure "internal diagnostic" but has a legal
background.

Thus GDPR defines a recording requirement.
It states, among other things, that
- a description of the categories of personal data,
- a description of the categories of recipients of personal data,
  including recipients in third countries or international organisations,
- transfers of personal data to a third country or an international
  organisation
must be recorded in an audit-proof manner.

And if you do all this correctly, then you have to make sure that the
data is erasable again (right to be forgotten).

By the way, this does not only apply to special Data Lineage systems but
also to all log files, backups etc. At least as long as no other legal
regulation prohibits this.
Data Lineage is therefore not a nice feature for internal diagnostics
but a must.

So far, too few companies have thought of this. But more and more are
recognizing the necessity.
This is also the reason why formerly Hortonworks and now Cloudera work
hard on Atlas.

Am 30.01.2020 um 15:25 schrieb Mike Thomsen:
> IANAL, but I would be surprised if NiFi provenance data even legally falls
> under the Right to Be Forgotten because it's internal diagnostic data that
> is highly ephemeral.
>
> On Thu, Jan 30, 2020 at 9:07 AM Emanuel Oliveira  wrote:
>
>> Hi, dont think makes sense an api for atomic records:
>>
>>1. one configure retention od data provenance (default 24h is "good
>>enough" GDPR doesnt need milisecond realtime deletion right ?)
>>
>> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>>2. even if there would be one api to delete FF's with an attribute =
>>, that would normally be useless as well, since inbound FFs
>> have
>>normally hundreds, thousands of records that will need to split,
>> aggregate,
>>in complex flow file, implementing a clean up an nano atomic level
>> would be
>>to hard and extra effort not needed, since your target single record
>> would
>>surely be part of multiple FF UUIDs, some only holding your record, but
>> mot
>>surefly will have 100s, 100s of other records including your record
>>somewhere on the middle.
>>
>>
>> In my opinion your answer to business/management gate keepers is that data
>> will be stored on data provenance for 24h (default) which can be
>> configured, and that
>>
>>
>> Best Regards,
>> *Emanuel Oliveira*
>>
>>
>>
>> On Thu, Jan 30, 2020 at 1:54 PM u...@moosheimer.com 
>> wrote:
>>
>>> Dear NiFi developer team,
>>>
>>> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
>>> environment of NiFi, so there is often no need to use Atlas.
>>>
>>> When using NiFi with customer data a problem arises.
>>> The problem is the GDPR requirement that a user has the right to be
>>> forgotten. Unfortunately, I can't find any API call or information on
>>> how to delete individual user data from the NiFi Provenance Repository
>>> based on a user-defined attribute and its defined characteristics.
>>>
>>> A delete request like "delete all data and dependencies where the
>>> attribute XYZ has the value 123" is currently not possible to my
>> knowledge.
>>> My questions are:
>>> Is this actually possible and how? And if not, is it planned?
>>>
>>> Thanks
>>> Uwe
>>>



Re: Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
Hi,

> GDPR doesnt need milisecond realtime deletion right ?)
right.

> since inbound FFs have
>normally hundreds, thousands of records that will need to split, aggregate,
>in complex flow file, implementing a clean
It depends on your application. Not everyone uses NiFi for IoT, and a
flow file may therefore contain only a single record.

> In my opinion your answer to business/management gate keepers is that data
> will be stored on data provenance for 24h (default) which can be
> configured, and that

Deleting the information after 24 hours (or whatever is configured) is not
necessarily what data lineage is about.
If data lineage is needed (auditing, legal requirements etc.), then
deleting the data after a defined time is not an option.

This is the reason why Atlas supports it.

Best Regards,
Uwe

Am 30.01.2020 um 15:06 schrieb Emanuel Oliveira:
> Hi, dont think makes sense an api for atomic records:
>
>1. one configure retention od data provenance (default 24h is "good
>enough" GDPR doesnt need milisecond realtime deletion right ?)
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-provenance-repository-properties
>2. even if there would be one api to delete FF's with an attribute =
>, that would normally be useless as well, since inbound FFs have
>normally hundreds, thousands of records that will need to split, aggregate,
>in complex flow file, implementing a clean up an nano atomic level would be
>to hard and extra effort not needed, since your target single record would
>surely be part of multiple FF UUIDs, some only holding your record, but mot
>surefly will have 100s, 100s of other records including your record
>somewhere on the middle.
>
>
> In my opinion your answer to business/management gate keepers is that data
> will be stored on data provenance for 24h (default) which can be
> configured, and that
>
>
> Best Regards,
> *Emanuel Oliveira*
>
>
>
> On Thu, Jan 30, 2020 at 1:54 PM u...@moosheimer.com 
> wrote:
>
>> Dear NiFi developer team,
>>
>> NiFi's Data Provenance and Data Lineage is perfectly adequate in the
>> environment of NiFi, so there is often no need to use Atlas.
>>
>> When using NiFi with customer data a problem arises.
>> The problem is the GDPR requirement that a user has the right to be
>> forgotten. Unfortunately, I can't find any API call or information on
>> how to delete individual user data from the NiFi Provenance Repository
>> based on a user-defined attribute and its defined characteristics.
>>
>> A delete request like "delete all data and dependencies where the
>> attribute XYZ has the value 123" is currently not possible to my knowledge.
>>
>> My questions are:
>> Is this actually possible and how? And if not, is it planned?
>>
>> Thanks
>> Uwe
>>



Provenance Repository and GDPR

2020-01-30 Thread u...@moosheimer.com
Dear NiFi developer team,

NiFi's Data Provenance and Data Lineage are perfectly adequate within the
NiFi environment, so there is often no need to use Atlas.

When using NiFi with customer data a problem arises.
The problem is the GDPR requirement that a user has the right to be
forgotten. Unfortunately, I can't find any API call or information on
how to delete individual user data from the NiFi Provenance Repository
based on a user-defined attribute and its defined characteristics.

A delete request like "delete all data and dependencies where the
attribute XYZ has the value 123" is currently not possible to my knowledge.

My questions are:
Is this actually possible and how? And if not, is it planned?

Thanks
Uwe


Re: [System Requirement] Please support to me about system requirement

2019-01-05 Thread u...@moosheimer.com
Hi Tuan,

Have a look at this page, maybe it will help you. 
https://community.hortonworks.com/articles/135337/nifi-sizing-guide-deployment-best-practices.html

Best Regards,
Uwe

> Am 05.01.2019 um 06:38 schrieb Dao Cong Tuan :
> 
> Dear Apache NiFi support team,
>  
> Currently, We development report system and using Open ETL (Apache NiFi) to 
> migrate text file to SQL Server.
>  
> 
>  
> On official web page, System requirements information is incomplete.
> How to estimate RAM. CPU, Diskspace….?
>  
> Please support to me.
> 
>  
>  
>  
> Best Regards,
> -- === --
> 
> Dao Cong Tuan 
> IT Solution Dept - IT Div - ADM Unit.
> 
>Honda Vietnam Co., Ltd.
>Head Office & Factory
>Address: Phuc Yen town, Vinh Phuc prov, Vietnam
>Tel: (+84) 211-3868-888  ext: 6270
>Mobile: (+84)982930348 (Group call - 130661)
>Email: sys_dc_t...@honda.com.vn 
>Website: http://www.honda.com.vn
> 
> Statement of Confidentiality: 
> The contents of this e-mail message and any attachments are confidential and 
> are intended solely for addressee. The information may also be legally 
> privileged. This transmission is sent in trust, for the sole purpose of 
> delivery to the intended recipient. If you have received this transmission in 
> error, any use, reproduction or dissemination of this transmission is 
> strictly prohibited. If you are not the intended recipient, please 
> immediately notify the sender by reply e-mail or phone and delete this 
> message and its attachments, if any.
> -- === --
>  


Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-10-27 Thread u...@moosheimer.com
Do you really want to mix provenance and data lineage with logging/error 
information?

Writing exception/logging information into an attribute is not a bad idea in 
my opinion.
If a user wants to use it for routing, why not ... or whatever else the user 
wants to do.

I could imagine this being switched on and off by a property in the 
configuration, e.g. on in development and off in production.

Regards,
Uwe
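As a rough editorial sketch of the pattern (not a change to any processor): an ExecuteScript processor with Jython can already approximate it today by catching the error and writing it into an attribute before routing to failure. The attribute names and the do_something_risky() step are assumptions; session, REL_SUCCESS and REL_FAILURE are bindings that ExecuteScript provides.

# Jython script body for ExecuteScript (minimal sketch).
def do_something_risky(ff):
    # Placeholder for the actual work; raise to simulate a failure.
    raise ValueError("something went wrong")

flowFile = session.get()
if flowFile is not None:
    try:
        do_something_risky(flowFile)
        session.transfer(flowFile, REL_SUCCESS)
    except Exception as e:
        # Surface the error to the flow instead of only logging it.
        flowFile = session.putAttribute(flowFile, "failure.reason", str(e))
        flowFile = session.putAttribute(flowFile, "failure.processor",
                                        "ExecuteScript")
        session.transfer(flowFile, REL_FAILURE)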

> Am 26.10.2018 um 09:26 schrieb Pierre Villard :
> 
> Adding another option to the list.
> 
> Peter - if I understand correctly and based on my own experience, the idea
> is not to have an 'exception' attribute to perform custom routing after the
> failure relationship but rather have a more user friendly way to see what
> happened without going through all the logs for a given flow file.
> 
> If that's correct, then could we add this information somehow to the
> provenance event generated by the processor? Ideally adding a new field to
> a provenance event or using the existing 'details' field?
> 
> Pierre
> 
> 
> Le ven. 26 oct. 2018 à 08:40, Koji Kawamura  a
> écrit :
> 
>> Hi all,
>> 
>> I'd like to add another option to Matt's list of solutions:
>> 
>> 4) Add a processor property, 'Enable detailed error handling'
>> (defaults to false), then toggle available list of relationships. This
>> way, existing flows such as Peter's don't have to change, while he can
>> opt-in new relationships. RouteOnAttribute can be a reference
>> implementation.
>> 
>> I like the idea of thinking relationships as potential exceptions. It
>> can be better if relationships have hierarchy.
>> Some users need more granular relationships while others don't.
>> For NiFi 2.0 or later, supporting relationship hierarchy at framework
>> can mitigate having additional property at each processor.
>> 
>> Thanks,
>> Koji
>> On Fri, Oct 26, 2018 at 11:49 AM Matt Burgess 
>> wrote:
>>> 
>>> Peter,
>>> 
>>> Totally agree, RDBMS/JDBC is in a weird class as always, there is a
>>> teaspoon of exception types for an ocean of causes. For NiFi 1.x, it
>>> seems like we need to pick from a set of less-than-ideal solutions:
>>> 
>>> 1) Add new relationships, but then your (possibly hundreds of)
>>> processors are invalid
>>> 2) Add new auto-terminated relationships, but then your
>>> previously-handled errors are "lost"
>>> 3) Add an attribute, but then each NiFi instance/release/flow is
>>> responsible for parsing the error and handling it as desired.
>>> 
>>> We could mitigate 1-2 with a tool that updates your flow/template by
>>> sending all new failure relationships to the same target as the
>>> existing one, but then the tool itself suffers from maintainability
>>> issues (as does option #3). If we could recognize that the new
>>> relationships are self-terminated and then send the errors out to the
>>> original failure relationship, that could be quite confusing to the
>>> user, especially as time goes on (how to suppress the "new" errors,
>>> e.g.).
>>> 
>>> IMHO I think we're between a rock and a hard place here, I guess with
>>> great entropy comes great responsibility :P
>>> 
>>> P.S. For your use case, is the workaround to just keep retrying? Or
>>> are there other constraints at play?
>>> 
>>> Regards,
>>> Matt
>>> 
>>> On Thu, Oct 25, 2018 at 10:27 PM Peter Wicks (pwicks) 
>> wrote:
 
 Matt,
 
 If I were to split an existing failure relationship into several
>> relationships, I do not think I would want to auto-terminate in most cases.
>> Specifically, I'm interested in a failure relationship for a database
>> disconnect during SQL execution (database was online when the connection
>> was verified in the DBCP pool, but went down during execution). If I were
>> to find a way to separate this into its own relationship, I do not think
>> most users would appreciate it being a condition silently not handled by
>> the normal failure path.
 
 Thanks,
  Peter
 
 -Original Message-
 From: Matt Burgess [mailto:mattyb...@apache.org]
 Sent: Friday, October 26, 2018 10:18 AM
 To: dev@nifi.apache.org
 Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that
>> caused failure in an attribute
 
 NiFi (as of the last couple releases I think) has the ability to set
>> auto-terminating relationships; this IMO is one of those use cases (for
>> NiFi 1.x). If new relationships are added, they could default to
>> auto-terminate; then the existing processors should remain valid.
 However we might want an "omnibus Jira" to capture those relationships
>> we'd like to remove the auto-termination from in NiFi 2.0.
 
 Regards,
 Matt
 On Thu, Oct 25, 2018 at 10:12 PM Peter Wicks (pwicks) <
>> pwi...@micron.com> wrote:
> 
> Mark,
> 
> I agree with you that this is the best option in general terms.
>> After thinking about it some more I think the biggest use case is for
>> troubleshooting. If a file routes to failure, you need to be 

Re: Graph database support w/ NiFi

2018-10-27 Thread u...@moosheimer.com
imes seems simple, but only for well-aligned 
> and well-prepared data. Take provenance for example, the lineage is based on 
> time (if you sort the nodes) rather than an explicit relationship.  But that 
> can be for another discussion :)
>
> Regards,
> Matt
>
>> On Oct 15, 2018, at 4:38 PM, Mike Thomsen  wrote:
>>
>> Uwe,
>>
>> I had a chance to get into JanusGraph w/ Gremlin Server today. Any thoughts
>> on how you would integrate that? I have some inchoate thoughts about how to
>> build some sort of Avro-based reader setup so you can do strongly typed
>> associations sorta like this:
>>
>> {
>>   "from": {
>>     "type": "PersonRecord",
>>     "value": { }
>>   },
>>   "to": {
>>     "type": "PersonRecord",
>>     "value": { }
>>   },
>>   "direction": "out",
>>   "edgeLabel": "emailed"
>> }
>>
>> We could mix that with the schema registry APIs to generate Gremlin syntax
>> to send to the Gremlin server.
>>
>> First time I've done this, so please (Matt too) let me know what you think.
>>
>> Thanks,
>>
>> Mike
>>
>>> On Sun, Oct 14, 2018 at 6:07 AM Mike Thomsen  wrote:
>>>
>>> We have a Neo4J processor in a PR, but it is very much tied to Neo4J and
>>> Cypher. I was raising the issue that we might want to take that PR and
>>> extend it into an "ExecuteCypherQuery" processor with controller services
>>> that use either Cypher for Gremlin or the Neo4J driver.
>>>
>>> On Sun, Oct 14, 2018 at 6:03 AM u...@moosheimer.com 
>>> wrote:
>>>
>>>> Mike,
>>>>
>>>> Cypher for Gremlin is a good idea. We can start with it and then later
>>>> allow an alternative so that users can use either Cypher or Gremlin
>>>> directly.
>>>>
>>>> Focusing on Neo4J or JanusGraph or any other single product is, in my
>>>> opinion, not the way to go.
>>>> We should have a NiFi graph processor that supports TinkerPop. Via the
>>>> Gremlin server we can support all TinkerPop-capable graph databases
>>>> (
>>>> https://github.com/apache/tinkerpop/blob/master/gremlin-server/conf/gremlin-server-neo4j.yaml
>>>> ).
>>>>
>>>> Via a controller service we can then connect either Neo4J or JanusGraph
>>>> or any other graph DB.
>>>> Otherwise we would have to build a processor for each graph DB.
>>>> We don't do that in NiFi for RDBMS either. There we have ExecuteSQL
>>>> or PutSQL, and the controller service determines what we connect to.
>>>>
>>>> What do you think, Mike?
>>>>
>>>> Best Regards,
>>>> Uwe
>>>>
>>>>> Am 06.10.2018 um 00:15 schrieb Mike Thomsen:
>>>>> Uwe and Matt,
>>>>>
>>>>> Now that we're dipping our toes into Neo4J and Cypher, any thoughts on
>>>> this?
>>>>> https://github.com/opencypher/cypher-for-gremlin
>>>>>
>>>>> I'm wondering if we shouldn't work with mans2singh to take the Neo4J
>>>> work
>>>>> and push it further into having a client API that can let us inject a
>>>>> service that uses that or one that uses Neo4J's drivers.
>>>>>
>>>>> Mike
>>>>>
>>>>> On Mon, May 14, 2018 at 7:13 AM Otto Fowler 
>>>> wrote:
>>>>>> The wiki discussion should list these and other points of concern and
>>>>>> should document the extent to which
>>>>>> they are to be addressed.
>>>>>>
>>>>>>
>>>>>> On May 12, 2018 at 12:37:59, u...@moosheimer.com (u...@moosheimer.com)
>>>>>> wrote:
>>>>>>
>>>>>> Matt,
>>>>>>
>>>>>> You have some interesting ideas that I really like.
>>>>>> GraphReaders and GraphWriters would be interesting. When I started
>>>>>> writing a graph processor with my idea, the concept was not yet
>>>>>> implemented in NiFi.
>>>>>> I don't find GraphML and GraphSON that appealing because they contain e.g.
>>>>>> the vertex/edge IDs and, to my knowledge, serve as import and export
>>>>>> formats (correct me if I'm wrong).
>>>>>>
>>>>>> A ConvertRecordToGraph processor is

Re: Graph database support w/ NiFi

2018-10-14 Thread u...@moosheimer.com
Mike,

Cypher for Gremlin is a good idea. We can start with it and then later
allow an alternative so that users can use either Cypher or Gremlin
directly.

Focusing on Neo4J or JanusGraph or any other single product is, in my opinion,
not the way to go.
We should have a NiFi graph processor that supports TinkerPop. Via the
Gremlin server we can support all TinkerPop-capable graph databases
(https://github.com/apache/tinkerpop/blob/master/gremlin-server/conf/gremlin-server-neo4j.yaml).

Via a controller service we can then connect either Neo4J or JanusGraph
or any other graph DB.
Otherwise we would have to build a processor for each graph DB.
We don't do that in NiFi for RDBMS either. There we have ExecuteSQL
or PutSQL, and the controller service determines what we connect to.
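To make the controller-service idea concrete, here is a minimal sketch of what such a service would wrap underneath, using the plain TinkerPop Java driver against a Gremlin Server (host, port and query are placeholders; the graph behind the server could be Neo4J, JanusGraph or any other TinkerPop-enabled store):

    import org.apache.tinkerpop.gremlin.driver.Client;
    import org.apache.tinkerpop.gremlin.driver.Cluster;
    import org.apache.tinkerpop.gremlin.driver.Result;

    public class GremlinServerSketch {
        public static void main(String[] args) {
            // Connect to the Gremlin Server; the backing graph is whatever the server is configured with
            final Cluster cluster = Cluster.build("localhost").port(8182).create();
            final Client client = cluster.connect();
            try {
                // Submit a plain Gremlin script and print the results
                for (final Result r : client.submit("g.V().limit(5).valueMap(true)")) {
                    System.out.println(r.getObject());
                }
            } finally {
                client.close();
                cluster.close();
            }
        }
    }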

What do you think, Mike?

Best Regards,
Uwe

Am 06.10.2018 um 00:15 schrieb Mike Thomsen:
> Uwe and Matt,
>
> Now that we're dipping our toes into Neo4J and Cypher, any thoughts on this?
>
> https://github.com/opencypher/cypher-for-gremlin
>
> I'm wondering if we shouldn't work with mans2singh to take the Neo4J work
> and push it further into having a client API that can let us inject a
> service that uses that or one that uses Neo4J's drivers.
>
> Mike
>
> On Mon, May 14, 2018 at 7:13 AM Otto Fowler  wrote:
>
>> The wiki discussion should list these and other points of concern and
>> should document the extent to which
>> they are to be addressed.
>>
>>
>> On May 12, 2018 at 12:37:59, u...@moosheimer.com (u...@moosheimer.com)
>> wrote:
>>
>> Matt,
>>
>> You have some interesting ideas that I really like.
>> GraphReaders and GraphWriters would be interesting. When I started
>> writing a graph processor with my idea, the concept was not yet
>> implemented in NiFi.
>> I don't find GraphML and GraphSON that appealing because they contain e.g.
>> the vertex/edge IDs and, to my knowledge, serve as import and export
>> formats (correct me if I'm wrong).
>>
>> A ConvertRecordToGraph processor is a good approach; the only question
>> is which formats we can convert from.
>> 
>> I also think that to make a graph processor somewhat generic we would have to
>> provide a query as input which locates the correct vertex from which
>> the graph should be extended.
>> Maybe like your suggestion with a Gremlin query or a small Gremlin script.
>>
>> If a vertex is found a new edge and a new vertex are added.
>> The question is how we transmit the individual attributes to the edge and
>> vertex, as well as the labels of the edge and vertex. Possibly with NiFi
>> attributes?
>>
>> The complexity gives me some headaches.
>> A small example:
>> Imagine we have a set from a CSV file.
>> The columns are Set ID, Token1, Token2, Token3...
>> ID, Token1,Token2,Token3,Token4,Token5
>> 123, Mary, had, a, little, lamb
>>
>> I want to create a vertex with ID 123 (if not exists). Then I want to
>> check for each token if a vertex exists in the graph database (search
>> for vertex with label "Token" and attribute "name"="Mary"). If the
>> vertex does not exist, the vertex has to be created.
>> Since I want to save e.g. Wikipedia to my graph I want to avoid the
>> supernode problem for the token vertices. I create a few distribution
>> vertices for each vertex that belongs to a token. If there is a vertex
>> for Token1(Mary) then I don't want to make the edge from this vertex to
>> my vertex with the ID 123, but from one of the distribution vertices.
>> If the vertex for the token does not exist, the distribution vertices
>> have also to be created ... and so on...
>>
>> Even with this very simple example it seems to become difficult with a
>> universal processor.
>>
>> In any case I think the idea to implement a graph processor in NiFi is a
>> good one.
>> The more we work on it, the more good ideas we get, and maybe I just can't
>> see the forest for the trees.
>>
>> One question about Titan. To my knowledge, Titan has been dead for a
>> year and a half and JanusGraph is the successor?
>> Titan has unofficially become DataStax Enterprise Graph?!
>> Supporting Titan could become difficult because, to my knowledge, Titan
>> does not support TinkerPop 3 and is no longer maintained.
>> 
>> I like your idea of a wiki page for collecting ideas. Otherwise one gets
>> lost in all the mails.
>>
>> Regards,
>> Kay-Uwe
>>
>> Am 12.05.2018 um 16:52 schrieb Matt Burgess:
>>> All,
>>>
>>> As Joe implied, I'm very happy that we are discussing graph tech in
>>> relation to NiFi! N

Re: Graph database support w/ NiFi

2018-05-12 Thread u...@moosheimer.com
 concept, we could have controller services / writers that are
> system-specific (see aspect #4).
>
> 3) Arbitrary data -> Graph: Converting non-graph data into a graph
> almost always takes domain knowledge, which NiFi itself won't have and
> will thus have to be provided by the user. We'd need to make it as
> simple as possible but also as powerful and flexible as possible in
> order to get the most value. We can investigate how each of the
> systems in aspect #2 approaches this, and perhaps come up with a good
> user experience around it.
>
> 4) Organization and implementation:  I think we should make sure to
> keep the capabilities very loosely coupled in terms of which
> modules/NARs/JARs provide which capabilities, to allow for maximum
> flexibility and ease of future development.  I would prefer an
> API/libraries module akin to nifi-hadoop-libraries-nar, which would
> only include Apache Tinkerpop and any dependencies needed to do "pure"
> graph stuff, so probably no TP adapters except tinkergraph (and/or its
> faster fork from ShiftLeft [2]). The reason I say that is so NiFi
> components (and even the framework!) could use graphs in a lightweight
> manner, without lots of heavy and possibly unnecessary dependencies.
> Imagine being able to query your own flows using Gremlin or Cypher!  I
> also envision an API much like the Record API in NiFi but for graphs,
> so we'd have GraphReaders and GraphWriters perhaps, they could convert
> from GraphML to GraphSON or Kryo for example, or in conjunction with a
> ConvertRecordToGraph processor, could be used to support the
> capability in aspect #3 above.  I'd also be looking at bringing in
> Gremlin to the scripting processors, or having a Gremlin based
> scripting bundle as NiFi's graph capabilities mature.
>
> You might be able to tell I'm excited about this discussion ;)  Should
> we get a Wiki page going for ideas, and/or keep it going here, or
> something else?  I'm all ears for thoughts, questions, and ideas
> (especially the ones that might seem crazy!)
>
> Regards,
> Matt
>
> [1] http://tinkerpop.apache.org/providers.html
> [2] https://github.com/ShiftLeftSecurity/tinkergraph-gremlin
>
> On Sat, May 12, 2018 at 8:02 AM, u...@moosheimer.com <u...@moosheimer.com> 
> wrote:
>> Hi Mike,
>>
>> graph database support is not quite as easy as it seems.
>> Unlike relational databases, graphs not only have labeled vertices and edges; 
>> edges can be directed or undirected, and both vertices and edges may have 
>> attributes, too.
>>
>> This makes it a bit confusing for a general interface.
>>
>> In general, a graph database should always be accessed via TinkerPop 3 (or 
>> higher), since every professional graph database supports TinkerPop.
>> TinkerPop is for graph databases what JDBC is for relational databases.
>>
>> I tried to create a general NiFi processor for graph databases myself and 
>> then gave up.
>> Unlike relational databases, graph databases usually have many dependencies.
>>
>> You do not simply create a data set; you search for a particular vertex 
>> (which may also need to have certain edges) and create further edges and 
>> vertices attached to it.
>> And the search for the correct node is usually context-dependent.
>>
>> This makes it difficult to do something general for all requirements.
>>
>> In any case I am looking forward to your concept and how you want to solve 
>> it.
>> It's definitely a good idea but hard to solve.
>>
>> Btw.: You forgot the most important graph database - Janusgraph.
>>
>> Mit freundlichen Grüßen / best regards
>> Kay-Uwe Moosheimer
>>
>>> Am 12.05.2018 um 13:01 schrieb Mike Thomsen <mikerthom...@gmail.com>:
>>>
>>> I was wondering if anyone on the dev list had given much thought to graph
>>> database support in NiFi. There are a lot of graph databases out there, and
>>> many of them seem to be half-baked or barely supported. Narrowing it down,
>>> it looks like the best candidates for a no fuss, decent sized graph that we
>>> could build up with NiFi processors would be OrientDB, Neo4J and ArangoDB.
>>> The first two are particularly attractive because they offer JDBC drivers
>>> which opens the potential to making them even part of the standard
>>> JDBC-based processors.
>>>
>>> Anyone have any opinions or insights on this issue? I might have to do
>>> OrientDB anyway, but if someone has a good feel for the market and can make
>>> recommendations that would be appreciated.
>>>
>>> Thanks,
>>>
>>> Mike




Re: Graph database support w/ NiFi

2018-05-12 Thread u...@moosheimer.com
joe

Wouldn't it be good to integrate Apache Atlas more closely with NiFi?

What I mean is: use something that already exists before building something new.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 12.05.2018 um 13:07 schrieb Joe Witt :
> 
> mike
> 
> Do you mean support to send data to a graphdb?
> 
> A really awesome case would be sending provenance data to one and building
> queries, etc... around it!
> 
> I know mattyb would be all over that.
> 
> Thanks
> 
>> On Sat, May 12, 2018, 7:02 AM Mike Thomsen  wrote:
>> 
>> I was wondering if anyone on the dev list had given much thought to graph
>> database support in NiFi. There are a lot of graph databases out there, and
>> many of them seem to be half-baked or barely supported. Narrowing it down,
>> it looks like the best candidates for a no fuss, decent sized graph that we
>> could build up with NiFi processors would be OrientDB, Neo4J and ArangoDB.
>> The first two are particularly attractive because they offer JDBC drivers
>> which opens the potential to making them even part of the standard
>> JDBC-based processors.
>> 
>> Anyone have any opinions or insights on this issue? I might have to do
>> OrientDB anyway, but if someone has a good feel for the market and can make
>> recommendations that would be appreciated.
>> 
>> Thanks,
>> 
>> Mike
>> 



Re: Graph database support w/ NiFi

2018-05-12 Thread u...@moosheimer.com
Hi Mike,

graph database support is not quite as easy as it seems.
Unlike relational databases, graphs not only have labeled vertices and edges; 
edges can be directed or undirected, and both vertices and edges may have 
attributes, too.

This makes it a bit confusing for a general interface. 

In general, a graph database should always be accessed via TinkerPop 3 (or 
higher), since every professional graph database supports TinkerPop.
TinkerPop is for graph databases what JDBC is for relational databases.

I tried to create a general NiFi processor for graph databases myself and then 
gave up.
Unlike relational databases, graph databases usually have many dependencies.

You do not simply create a data set; you search for a particular vertex (which 
may also need to have certain edges) and create further edges and vertices 
attached to it.
And the search for the correct node is usually context-dependent.
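As a small illustration of that "search first, then extend" pattern, here is a minimal get-or-create traversal in the TinkerPop Java API; the "Token" label and "name" property are placeholders, and g is assumed to be a GraphTraversalSource obtained from an embedded or remote graph:

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
    import org.apache.tinkerpop.gremlin.structure.Vertex;

    // Look up the vertex by label and property; create it only if it does not exist yet
    final Vertex token = (Vertex) g.V().has("Token", "name", "Mary")
            .fold()
            .coalesce(__.unfold(), __.addV("Token").property("name", "Mary"))
            .next();
    // further edges and vertices can now be attached to 'token'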

This makes it difficult to do something general for all requirements.

In any case I am looking forward to your concept and how you want to solve it.
It's definitely a good idea but hard to solve.

Btw.: You forgot the most important graph database - Janusgraph.

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 12.05.2018 um 13:01 schrieb Mike Thomsen :
> 
> I was wondering if anyone on the dev list had given much thought to graph
> database support in NiFi. There are a lot of graph databases out there, and
> many of them seem to be half-baked or barely supported. Narrowing it down,
> it looks like the best candidates for a no fuss, decent sized graph that we
> could build up with NiFi processors would be OrientDB, Neo4J and ArangoDB.
> The first two are particularly attractive because they offer JDBC drivers
> which opens the potential to making them even part of the standard
> JDBC-based processors.
> 
> Anyone have any opinions or insights on this issue? I might have to do
> OrientDB anyway, but if someone has a good feel for the market and can make
> recommendations that would be appreciated.
> 
> Thanks,
> 
> Mike



Re: [DISCUSS] Google Summer of Code 2018

2018-02-27 Thread u...@moosheimer.com
Hi Matt,

not sure it matches GSoC.
I am thinking about process mining: take the provenance/data lineage
information from the NiFi repository or from Apache Atlas (and maybe
some additional information from the processors), analyze whether the
processes are optimal, and display the results graphically.

See https://en.wikipedia.org/wiki/Process_mining or
https://coda.fluxicon.com/book/intro.html

Best Regards,
Uwe

Am 28.02.2018 um 06:12 schrieb Joe Witt:
> Matt
>
> Did you have some ideas/features/enhancements in mind you think would
> be good to propose?
>
> Thanks
> Joe
>
> On Tue, Feb 27, 2018 at 6:56 PM, Matt Burgess  wrote:
>> If you haven't heard yet, the Apache Software Foundation was selected
>> as an organization for this year's Google Summer of Code [1]. I've
>> seen activity on other Apache projects' mailing lists requesting ideas
>> for issues, features, components, etc. that could be good
>> proposals/ideas for GSoC, and I'd like to also make that request of
>> this community.
>>
>> As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
>> guarantee we would get someone to work on it, but it could be a good
>> push to move some isolated bits of functionality forward that may not
>> get much attention otherwise."
>>
>> Thoughts?
>>
>> Thanks in advance,
>> Matt
>>
>> [1] https://summerofcode.withgoogle.com/organizations/5718432427802624/




Re: couple questions

2017-06-23 Thread u...@moosheimer.com
Hi Michael,

you can use Apache Atlas as a provenance sink.
There is a bridge for Atlas mentioned on Hortonworks and also a current task 
(https://issues.apache.org/jira/browse/NIFI-3709).

Best regards
Uwe

> Am 23.06.2017 um 18:58 schrieb Knapp, Michael :
> 
> Hi,
> 
> My team is starting to do more and more with NiFi, and I had several 
> questions for you.
> 
> First, we are thinking of having multiple separate NiFi flows but we want a 
> single source for data provenance.  In the source code I only see these 
> implementations: PersistentProvenanceRepository, 
> VolatileProvenanceRepository, and MockProvenanceRepository.  I was hoping to 
> find a web service that I could run separately from NiFi, and have all my 
> NiFi clusters publish events to that.  Is there any public implementation 
> like that?
> 
> Also, we are thinking seriously about using repositories that are not backed 
> by the local file system.  I am helping an intern write an implementation of 
> ContentRepository that is backed by S3, he has already had some success with 
> this (we started by copying a lot from the VolatileContentRepository).  I’m 
> also interested in implementations backed by Kafka and Pachyderm.  If that 
> works, we will probably also need the other repositories to follow, 
> specifically the FlowFileRepository.  Unfortunately, I cannot find a lot of 
> documentation on how to write these repositories, I have just been figuring 
> things out by reviewing the source code and unit tests, but it is still very 
> confusing to me.  So I was wondering:
> 
> 1.   Has anybody been working on alternative ContentRepository 
> implementations?  Specifically with S3, pachyderm, kafka, or some 
> databases/datastores?
> 
> 2.   Is there any thorough documentation regarding the contracts that 
> these implementations must adhere to? (besides source code and unit tests)
> 
> I’m mainly interested in alternative repositories so I can make NiFi truly 
> fault tolerant (one node dies, and the others immediately take over its 
> work).  Also it would greatly simplify a lot of infrastructure/configuration 
> management for us, could help us save some money, and might help us with 
> compliance issues.  On the down side, it might hurt the file throughput.
> 
> Please let me know,
> 
> Michael Knapp
> 
> 
> 
> The information contained in this e-mail is confidential and/or proprietary 
> to Capital One and/or its affiliates and may only be used solely in 
> performance of work or services for Capital One. The information transmitted 
> herewith is intended only for use by the individual or entity to which it is 
> addressed. If the reader of this message is not the intended recipient, you 
> are hereby notified that any review, retransmission, dissemination, 
> distribution, copying or other use of, or taking of any action in reliance 
> upon this information is strictly prohibited. If you have received this 
> communication in error, please contact the sender and delete the material 
> from your computer.


Re: Closing in on a NiFi 1.2.0 release?

2017-04-11 Thread u...@moosheimer.com
Twitter processor stays. Great news!
Thanks to Joey!

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer

> Am 11.04.2017 um 21:20 schrieb Joe Witt :
> 
> Team,
> 
> Couple of good news updates on the release front is we're in the teens
> on number of tickets AND Joey Frazee figured out a way to clean up the
> twitter/json.org Cat-X dependency issue so our twitter processor
> stays!
> 
> Will keep working the march down to 0 tickets.  A lot of good stuff in
> this release so this should be a fun one!
> 
> Thanks
> Joe
> 
>> On Tue, Apr 4, 2017 at 7:37 PM, Tony Kurc  wrote:
>> Joe et. al,
>> I think this one is close too (mainly dotting i's and crossing t's on
>> license and notice)
>> 
>> https://issues.apache.org/jira/browse/NIFI-3586
>> 
>>> On Tue, Apr 4, 2017 at 2:23 PM, Joe Witt  wrote:
>>> 
>>> Team,
>>> 
>>> Another update on efforts to close-in on the NiFi 1.2.0 release.
>>> We're below 20 JIRAs now and there has been good momentum.  A couple
>>> items still need work but look really important and then there is
>>> review traction/feedback cycles.  Will just keep monitoring it and
>>> actively defending to close the loop on 1.2.0 until we're there.
>>> 
>>> Thanks
>>> Joe
>>> 
 On Tue, Mar 28, 2017 at 9:45 AM, Joe Witt  wrote:
 Team,
 
 Status of JIRA cleanup toward an Apache NiFi 1.2.0 release candidate
 which Mr Bende has so wonderfully volunteered to RM:
 
 There are 20 open JIRAs as of now.
 
 12 of 20 have PRs that appear ready/close to ready.
 
 One pattern I noticed quite a bit on the 1.2.0 release is heavy usage
 of 'squatter JIRAs', whereby someone creates a JIRA and, with or without
 any review traction and for non-blocking issues, sets the fix version.
 This practice should be avoided.  The fix version should be reserved
 for once there is a blocker item or there is something with a patch
 contributed and review progress closing in on a merge.
 
 One of them means we need to punt the Twitter processor most likely.
 Don't believe there were new releases to resolve that licensing issue
 by the third party dependency.  I'll take that on.
  https://issues.apache.org/jira/browse/NIFI-3089
 
 Two of them are build failure issues which means windows and linux
 builds break (highly repeatable):
  https://issues.apache.org/jira/browse/NIFI-3441
  https://issues.apache.org/jira/browse/NIFI-3440
 
 A couple need to either be moved out or addressed for implementation
 or review but it isn't clear to me their status:
  https://issues.apache.org/jira/browse/NIFI-3155
  https://issues.apache.org/jira/browse/NIFI-1280
  https://issues.apache.org/jira/browse/NIFI-2656
  https://issues.apache.org/jira/browse/NIFI-2886
 
 Some are really important and being worked still:
  https://issues.apache.org/jira/browse/NIFI-3520
 
 Thanks
 Joe
 
> On Wed, Mar 22, 2017 at 9:12 PM, Joe Witt  wrote:
> Sweet!  I'll take that deal all day.  Thanks Bryan!
> 
>> On Wed, Mar 22, 2017 at 8:26 PM, Bryan Bende  wrote:
>> Joe,
>> 
>> As of today I believe the PR for NIFI-3380 (component versioning)
>>> should
>> address all of the code review feedback and is in a good place.
>> 
>> Would like to run through a few more tests tomorrow, and baring any
>> additional feedback from reviewers, we could possibly merge that
>>> tomorrow.
>> That PR will also bump master to use the newly released NAR plugin.
>> 
>> Since I got a warm-up with NAR plugin, I don't mind taking on release
>> manager duties for 1.2, although I would still like help on the JIRA
>> whipping. I imagine there's still a bit of work to narrow down the
>> remaining tickets.
>> 
>> -Bryan
>> 
>>> On Wed, Mar 22, 2017 at 7:35 PM Joe Witt  wrote:
>>> 
>>> Bryan
>>> 
>>> How are things looking for what you updated on?  The nar plugin of
>>> course is out.
>>> 
>>> We got another question on the user list for 1.2 so I just want to
>>> make sure we're closing in.  I'll start doing the JIRA whipping.
>>> 
>>> Thanks
>>> JOe
>>> 
>>> On Mon, Mar 13, 2017 at 3:22 PM, Bryan Bende 
>>> wrote:
 Just a quick update on this discussion...
 
 On Friday we were able to post an initial PR for the component
 versioning work [1].
 
 I believe we are ready to move forward with a release of the NAR
>>> Maven
 plugin, there are three tickets to be included in the release [2].
 
 If there are no objections, I can take on the release manager duties
 for the NAR plugin, and can begin to kick off the process tomorrow.
 
 -Bryan
 
 [1] 

Re: Central processor repository

2017-02-25 Thread u...@moosheimer.com
Pretty good idea!
Would appreciate a place where we can all upload sources (together with some 
information) and everybody can use or modify them.

Best Regards,
Uwe

> Am 25.02.2017 um 13:10 schrieb Uwe Geercken :
> 
> Hello,
> 
> I remember a while ago that there was a short discussion if it would be good 
> to have a central place somewhere where people could upload the processors 
> they created and others could download them from this central point. It would 
> make life easier compared to surfing the web to find them and I believe would 
> also add to the popularity of Nifi.
> 
> I don't know if this has been discussed more deeply. Are there any plans for 
> such a central repository?
> 
> Rgds,
> 
> Uwe



Re: Nifi processor documentation not showing up in Help

2017-02-24 Thread u...@moosheimer.com
Hi Uwe,

ok, now I understand.
I hadn't noticed that yet.

You are right. There is no help text for any custom-written processor.
I also found generated directories for all of our processors with a valid
index.html file, but the content isn't shown on the help page.

@devs
Is it normal behaviour that there is no help information for custom-written
processors in the NiFi Help (menu -> right side drop down -> Help)?

Best Regards,
Uwe

Am 24.02.2017 um 12:35 schrieb Uwe Geercken:
> Uwe,
>
> when you open the Help in NiFi, all the help texts for the processors are 
> shown. NiFi generates them at startup in the folder work/docs/components. 
> HTML pages are also generated for my processors, but I cannot find the 
> help texts anywhere.
>
> I must be making a mistake somewhere.
>
> Regards,
>
> Uwe
>
>> Sent: Friday, 24 February 2017 at 12:29
>> From: "u...@moosheimer.com" <u...@moosheimer.com>
>> To: dev@nifi.apache.org
>> Subject: Re: Nifi processor documentation not showing up in Help
>>
>> Hi Uwe,
>>
>> It's not really clear to me what exactly your problem is, but maybe I can
>> help you?!
>> Send me a direct mail so we can talk in German ;-)
>>
>> Best Regards,
>> Uwe
>>
>> Am 24.02.2017 um 11:19 schrieb Uwe Geercken:
>>> Hello,
>>>  
>>> I am developing some processors for Nifi. They do work in 1.1 but the 
>>> documentation does not show up in the Help.
>>>  
>>> So I set the loglevel to DEBUG and this is what came out:
>>>  
>>> 2017-02-24 11:00:18,538 DEBUG [main] org.apache.nifi.nar.NarUnpacker 
>>> Expanding NAR file: 
>>> /opt/nifi-1.1.1/./lib/com.datamelt.nifi.processors-1.0.0.nar
>>> 2017-02-24 11:00:56,328 DEBUG [main] org.apache.nifi.nar.NarClassLoaders 
>>> Loading NAR file: 
>>> /opt/nifi-1.1.1/./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked
>>> 2017-02-24 11:00:56,330 INFO [main] org.apache.nifi.nar.NarClassLoaders 
>>> Loaded NAR file: 
>>> /opt/nifi-1.1.1/./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked
>>>  as class loader 
>>> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
>>> com.datamelt.nifi.processors.MergeTemplate || 
>>> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
>>> com.datamelt.nifi.processors.GenerateData || 
>>> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
>>> com.datamelt.nifi.processors.SplitToAttribute || 
>>> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
>>> com.datamelt.nifi.processors.RuleEngine || 
>>> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
>>> 2017-02-24 11:00:59,992 DEBUG [main] 
>>> o.apache.nifi.documentation.DocGenerator Documenting: class 
>>> com.datamelt.nifi.processors.MergeTemplate
>>> 2017-02-24 11:01:00,226 DEBUG [main] 
>>> o.apache.nifi.documentation.DocGenerator Documenting: class 
>>> com.datamelt.nifi.processors.GenerateData
>>> 2017-02-24 11:01:00,251 DEBUG [main] 
>>> o.apache.nifi.documentation.DocGenerator Documenting: class 
>>> com.datamelt.nifi.processors.SplitToAttribute
>>> 2017-02-24 11:01:00,367 DEBUG [main] 
>>> o.apache.nifi.documentation.DocGenerator Documenting: class 
>>> com.datamelt.nifi.processors.RuleEngine
>>>  
>>> I see that the appropriate html files are generated in the folder 
>>> work/docs/components. Here is the folder list and each of the folders 
>>> contains an index.html file:
>>>  
>>> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
>>> com.datamelt.nifi.processors.GenerateData
>>> drwxr-xr-x 2 root root 4096 24. Feb 11:00 
>>> com.datamelt.nifi.processors.MergeTemplate
>>> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
>>> com.datamelt.nifi.processors.RuleEngine
>>> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
>>> com.datamelt.nifi.processors.SplitToAttribute
>>>  
>>> So what am I doing wrong? Can somebody help?
>>>  
>>> Rgds,
>>>  
>>> Uwe
>>
>>



Re: Nifi processor documentation not showing up in Help

2017-02-24 Thread u...@moosheimer.com
Hi Uwe,

It's not really clear to me what exactly your problem is, but maybe I can
help you?!
Send me a direct mail so we can talk in German ;-)

Best Regards,
Uwe

Am 24.02.2017 um 11:19 schrieb Uwe Geercken:
> Hello,
>  
> I am developing some processors for Nifi. They do work in 1.1 but the 
> documentation does not show up in the Help.
>  
> So I set the loglevel to DEBUG and this is what came out:
>  
> 2017-02-24 11:00:18,538 DEBUG [main] org.apache.nifi.nar.NarUnpacker 
> Expanding NAR file: 
> /opt/nifi-1.1.1/./lib/com.datamelt.nifi.processors-1.0.0.nar
> 2017-02-24 11:00:56,328 DEBUG [main] org.apache.nifi.nar.NarClassLoaders 
> Loading NAR file: 
> /opt/nifi-1.1.1/./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked
> 2017-02-24 11:00:56,330 INFO [main] org.apache.nifi.nar.NarClassLoaders 
> Loaded NAR file: 
> /opt/nifi-1.1.1/./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked
>  as class loader 
> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
> com.datamelt.nifi.processors.MergeTemplate || 
> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
> com.datamelt.nifi.processors.GenerateData || 
> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
> com.datamelt.nifi.processors.SplitToAttribute || 
> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
> com.datamelt.nifi.processors.RuleEngine || 
> org.apache.nifi.nar.NarClassLoader[./work/nar/extensions/com.datamelt.nifi.processors-1.0.0.nar-unpacked]
> 2017-02-24 11:00:59,992 DEBUG [main] o.apache.nifi.documentation.DocGenerator 
> Documenting: class com.datamelt.nifi.processors.MergeTemplate
> 2017-02-24 11:01:00,226 DEBUG [main] o.apache.nifi.documentation.DocGenerator 
> Documenting: class com.datamelt.nifi.processors.GenerateData
> 2017-02-24 11:01:00,251 DEBUG [main] o.apache.nifi.documentation.DocGenerator 
> Documenting: class com.datamelt.nifi.processors.SplitToAttribute
> 2017-02-24 11:01:00,367 DEBUG [main] o.apache.nifi.documentation.DocGenerator 
> Documenting: class com.datamelt.nifi.processors.RuleEngine
>  
> I see that the appropriate html files are generated in the folder 
> work/docs/components. Here is the folder list and each of the folders 
> contains an index.html file:
>  
> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
> com.datamelt.nifi.processors.GenerateData
> drwxr-xr-x 2 root root 4096 24. Feb 11:00 
> com.datamelt.nifi.processors.MergeTemplate
> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
> com.datamelt.nifi.processors.RuleEngine
> drwxr-xr-x 2 root root 4096 24. Feb 11:01 
> com.datamelt.nifi.processors.SplitToAttribute
>  
> So what am I doing wrong? Can somebody help?
>  
> Rgds,
>  
> Uwe




Re: [VOTE] Establish Registry, a sub-project of Apache NiFi

2017-02-10 Thread u...@moosheimer.com
+1 (non-binding)

Uwe

> Am 10.02.2017 um 22:18 schrieb Koji Kawamura :
> 
> +1 (non-binding)
> 
> On Feb 11, 2017 5:37 AM, "Jennifer Barnabee" 
> wrote:
> 
> +1 binding
> 
> Sent from my iPhone
> 
>> On Feb 10, 2017, at 2:55 PM, Joe Skora  wrote:
>> 
>> +1 binding
>> 
>> On Fri, Feb 10, 2017 at 2:09 PM, Peter Wicks (pwicks) 
>> wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> -Original Message-
>>> From: Bryan Bende [mailto:bbe...@gmail.com]
>>> Sent: Friday, February 10, 2017 9:41 AM
>>> To: dev@nifi.apache.org
>>> Subject: [VOTE] Establish Registry, a sub-project of Apache NiFi
>>> 
>>> All,
>>> 
>>> Following a solid discussion for the past few days [1] regarding the
>>> establishment of Registry as a sub-project of Apache NiFi, I'd like to
>>> call a formal vote to record this important community decision and
>>> establish consensus.
>>> 
>>> The scope of this project is to define APIs for interacting with
>>> resources that one or more NiFi instances may be interested in, such
>>> as a flow registry for versioned flows, an extension registry for
>>> extensions, and possibly other configuration resources in the future.
>>> In addition, this project will provide reference implementations of
>>> these registries, with the goal of allowing the community to build a
>>> diverse set of implementations, such as a Git provider for versioned
>>> flows, or a bintray provider for an extension registry.
>>> 
>>> I am a +1 and looking forward to the future work in this area.
>>> 
>>> The vote will be open for 72 hours and be a majority rule vote.
>>> 
>>> [ ] +1 Establish Registry, a subproject of Apache NiFi
>>> [ ]   0 Do not care
>>> [ ]  -1 Do not establish Registry, a subproject of Apache NiFi
>>> 
>>> Thanks,
>>> 
>>> Bryan
>>> 
>>> [1] http://mail-archives.apache.org/mod_mbox/nifi-dev/201702.
> mbox/%3CCALo_
>>> M19euo2LLy0PVWmE70FzeLhQRcCtX6TC%3DqoiBVfn4zFQMA%40mail.gmail.com%3E
>>> 


Re: Questions on Topology and custom-processor deployment on NiFi and MiNiFi

2017-01-31 Thread u...@moosheimer.com
Hi Pushkar,

you can automatically update the MiNiFi configs using one of three methods:

  * FileChangeIngestor
  * RestChangeIngestor
  * PullHttpChangeIngestor

You have to configure the MiNiFi bootstrap.conf.
Here you can read how to do this:
https://nifi.apache.org/minifi/system-admin-guide.html
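For example, a pull-based setup in bootstrap.conf could look roughly like the following (property names as I remember them from the System Admin Guide linked above, so please verify them there; host, port and path are placeholders):

    nifi.minifi.notifier.ingestors=org.apache.nifi.minifi.bootstrap.configuration.ingestors.PullHttpChangeIngestor
    nifi.minifi.notifier.ingestors.pull.http.hostname=config-server.example.com
    nifi.minifi.notifier.ingestors.pull.http.port=8080
    nifi.minifi.notifier.ingestors.pull.http.path=/minifi/config.yml
    nifi.minifi.notifier.ingestors.pull.http.period.ms=300000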

To update the MiNiFi config you have to create a Process Group (in the NiFi
UI) where you define the MiNiFi workflow.
Save the Process Group as a template.

Export the template and convert it to a MiNiFi config.yml by using the
MiNiFi Toolkit (https://nifi.apache.org/minifi/download.html).

How to transform the template to config.yml?
Change to MINIFI_TOOLKIT_HOME and run: ./bin/config.sh transform
[PATH_TO_TEMPLATE_FILE/TEMPLATE_FILE_NAME]
[PATH_TO_CONFIG_FILE/CONFIG_FILE_NAME]
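A concrete invocation could then look like this (the paths are just placeholders):

    cd $MINIFI_TOOLKIT_HOME
    ./bin/config.sh transform /tmp/minifi-flow.template.xml /tmp/config.yml
    # then copy or serve the generated config.yml so the MiNiFi agents can pick it up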

You don't need to stop anything. If MiNiFi gets a new config then it
checks validity and restarts itself.

Best Regards,
Uwe Moosheimer

Am 31.01.2017 um 19:00 schrieb Pushkara R:
> Hi,
>
> I am a part of a Research Lab in the Indian Institute of Science and we are
> currently evaluating MiNiFi as a candidate for our IoT deployments. There are
> two things I wanted to get clarified as I could not find relevant
> documentation.
>
> 1. It is mentioned in multiple places that using NiFi we can update the
> processor Topology of multiple instances of MiNiFi that are connected to
> it. However, I am not able to figure out how. I assume it would involve
> communication with the MiNiFi bootstrap and passing it the relevant yml
> file, but I don't know how to do it. Could someone please elaborate?
>
> 2. On NiFi, the processor topology can be updated in the UI without
> stopping the entire topology. Only the concerned processors need to be
> stopped. Though MiNiFi does not have a UI, does every minor
> edit to the topology need a restart? Is there a way to avoid the
> need to restart MiNiFi entirely?
>
> Regards
> Pushkar
>



Re: A few clarifications on MiNiFi

2017-01-26 Thread u...@moosheimer.com
Hi Aakash,

think of MiNiFi as NiFi without a UI.

The advantages are (as you can see at https://nifi.apache.org/minifi/):

  * small and lightweight footprint
  * central management of agents
  * generation of data provenance, and
  * integration with NiFi for follow-on dataflow management and full
chain of custody of information

So what is the benefit of MiNiFi?

MiNiFi is designed to run on sensors/clients.
You can define workflows that should run on the client side, like log
collection/aggregation (ETL), metrics etc., and send the results to NiFi, MQTT,
Kafka or whatever.
These tasks should run on the client side and not on the server
(load sharing)!

You can configure your sensors/clients centrally and you are
able to distribute new configs automatically (push to the clients, or all
clients pull the new config periodically).
The benefit is that you can design a workflow in NiFi and push it to
thousands of sensors/clients running MiNiFi in a very easy way.
Think of thousands or millions of sensors: how would you update
them and change their behaviour?

How do you get the collected data from your sensors?
That's what MiNiFi is for.

> but what are the specific differences between NiFi and MiNiFi
NiFi is the part where you define and run workflows within your company. NiFi
is also the part where you define the workflows that should run on MiNiFi.
MiNiFi is a small NiFi (as described above) that runs on sensors/clients
for things like logfile processing or sending sensor information and so
on to NiFi or other systems.

We use MiNiFi to collect metrics and logfile information and send them
to NiFi via MQTT (failsafe).
It's pretty straightforward and works like a charm.
Also, you generate data provenance information where the data originates (on the
client side), and if you use Apache Atlas with NiFi you have one place to
store provenance data for all enterprise data ;-)

> Also, the documentation is pretty lean at this point. I hope that changes 
> pretty quickly.
You are welcome to help make the documentation better!

Regards,
Uwe Moosheimer

Am 26.01.2017 um 18:16 schrieb Aakash Khochare:
> Greetings,
>
> I am a part of a Research Lab in Indian Institute of Science. We are in 
> process of evaluating various frameworks for our IoT deployments. So, 
> MiNiFi's webpage states "Perspectives of the role of MiNiFi should be from 
> the perspective of the agent acting immediately at, or directly adjacent to, 
> source sensors, systems, or servers. " However, it targets data ingress, and from 
> the overview that I had, all the processors are targeted towards 
> ingress/egress/transformation of data. Can anyone specifically point out 
> features that facilitate the "agent acting immediately" aspect? Of course I 
> can make custom processors, but what are the specific differences between 
> NiFi and MiNiFi?
>
>
> Also, the documentation is pretty lean at this point. I hope that changes 
> pretty quickly.
>
>
> Regards,
>
> Aakash Khochare
>
> MTech (Research)
>
> Indian Institute of Science
>



Re: Questions on logging and metrics

2016-04-23 Thread u...@moosheimer.com
Joe,

thanks for the positive response.
Please let me know if I can help somehow.  

Thanks
Uwe

Mit freundlichen Grüßen / best regards
Kay-Uwe Moosheimer
> Am 23.04.2016 um 16:44 schrieb Joe Witt <joe.w...@gmail.com>:
> 
> Uwe
> 
> This is a perfectly fine mailing list - thanks for joining the
> discussion and you bring up some interesting points.  We could have a
> logging appender and relationship with that processor that allow this
> to work cleanly.  Need to think on that more.
> 
> Thanks
> Joe
> 
>> On Sat, Apr 23, 2016 at 9:37 AM, u...@moosheimer.com <u...@moosheimer.com> 
>> wrote:
>> Hi,
>> 
>> I don't know if this is the right mailing list to ask so please correct me 
>> if I'm wrong.
>> 
>> During development of my first couple of processors I wondered why there is 
>> no central log processor.
>> 
>> I'm thinking about a "central log processor" which gets all logging events 
>> from class ProcessorLog.
>> If any source uses one of the log methods (getLogger.info(), 
>> getLogger.error() ...) the message is transferred to the log processor if 
>> available - otherwise to the standard log.
>> 
>> Of course the log processor should only be placed once in the canvas.
>> 
>> The advantage of the log processor would be an easy-to-use way to handle all 
>> log messages in one place. The "central log processor" can be used like any 
>> other processor. By providing a success relationship you are able to send the log 
>> messages to Kafka, Cassandra, Elasticsearch, InfluxDB etc. or do anything else.
>> 
>> Every processor in use could have a standard property for logging with 
>> the options
>> - global setting
>> - Trace
>> - Debug
>> - Info
>> - Warn
>> - Error
>> The default would be "global setting" which means that the log level is 
>> defined by the central log processor.
>> If the central log processor is set to error, only getLogger.error() messages 
>> are processed.
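A minimal sketch of such a standard property, using the existing PropertyDescriptor builder (the name, description and values are just illustrative):

    public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
            .name("Log Level")
            .description("Log level for this processor; 'global setting' defers to the central log processor.")
            .allowableValues("global setting", "Trace", "Debug", "Info", "Warn", "Error")
            .defaultValue("global setting")
            .required(true)
            .build();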
>> 
>> It would be possible to change the setting to any other log level, either 
>> globally or only for selected processors. So you could have a global 
>> setting of "error" and some individual settings to "info".
>> 
>> Another central processor could be a "central metrics processor" which would 
>> be used for sending metrics in the same way. There could be a Metrics class 
>> which would be used via a call to Metrics.send(k, v) to send metrics to a 
>> central processor.
>> The Metrics class could automatically add more information like template 
>> name, processor name, task, timestamp etc. Again, a success relationship could 
>> be used to send metrics to Elasticsearch, Graphite, InfluxDB etc. or do 
>> anything else.
>> 
>> So my questions are
>> - am I wrong with my ideas?
>> - are there any similar plans?
>> - are there existing solutions I don't know about?
>> 
>> As mentioned above I don't know if this is the right mailing list to post my 
>> ideas and I apologize if I'm wrong and wasting your time.
>> 
>> Best Regards,
>> Uwe