Re: NIFI Usage for Data Transformation

2018-10-31 Thread Mark Rachelski
Ameer,

Allow me to offer an opinion on this as a user, not as one of the awesome
folks in this group who have built this very cool tool. They are likely to
be very enthusiastic.

In my experience, NiFi is not going to easily replace a custom tool that
performs a number of complex transforms. For example, we still use
PentahoDI for some of our more elaborate ETL flows. NiFi does have some basic
capabilities for text manipulation, including:

   - Jolt Transformer - focused on JSON transforms (see the sketch after
   this list)
   - Regex processing - Straight-up text search/replace
   - Some level of splitting and joining files
   - I am sure someone will point out that you can embed your own custom
   transformer by writing a NiFi processor (native or scripted) or by
   invoking your own scripts from a NiFi processor. All of this is
   possible, although it does require relaxing the security of the NiFi
   server itself a bit so that the development team building these scripts
   can deploy them to the server running NiFi (especially if the scripts are
   external to the flow).
   - Furthermore, NiFi can be used to trigger other tools that do the heavy
   transforms, such as Spark jobs or other Hadoop-based transforms, if you
   have to manipulate large data sets.
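
To make the Jolt bullet above concrete, here is a minimal sketch of a shift
spec you could paste into a JoltTransformJSON processor (the field names are
invented purely for illustration):

   [
     {
       "operation": "shift",
       "spec": {
         "id": "customer.id",
         "first": "customer.name.first",
         "last": "customer.name.last"
       }
     }
   ]

Fed a flat record like {"id": "123", "first": "Jane", "last": "Doe"}, this
nests everything under a single customer object. Anything fancier than
renaming and nesting takes some trial and error, so iterate on a sample
document before wiring the spec into your flow.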

What you will likely find is that if your transforms require data joins
inside the ETL pipeline (something your custom scripts may well be doing
today), NiFi will not be able to help with that without driving the work
through a data engine of some type (be it a SQL engine, Hadoop,
etc...)
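
To make that concrete (a purely illustrative sketch; the table and column
names are invented), you would hand the join to something like an ExecuteSQL
processor and let the database do the work:

   SELECT o.order_id, o.amount, c.customer_name
   FROM orders o
   JOIN customers c ON c.customer_id = o.customer_id

NiFi then just schedules the query and moves the result set along, rather
than trying to hold both sides of the join inside the flow.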

But, and here I am admittedly guessing, I suspect the thing your Enterprise
groups are really after is the very natural support for non-functional
requirements that NiFi (or any framework worth anything) provides. That
is, things like:

   - Monitoring
   - Consistent reporting
   - Provenance of data transforms and traceability
   - Packing a large number of flows onto a few machines

With this, they can possibly host NiFi as a service and your teams simply
contribute flows.

At my company, we do not build our data ingest world exclusively around
NiFi, but we do use it pretty widely to get disparate data sets into our
data platform, and it helps a lot. It means that developers can focus on
writing good ingest flows instead of continually ensuring that every custom
script meets the non-functional requirements of your own data
platform. But admittedly, any good framework will also help you address
those requirements with little repetition.

We also still keep some custom ETL jobs that are just not worth the effort
of porting to NiFi. We don't see those jobs as being technical debt. They
work and are not causing us any issues. But they had to be independently
developed to meet those non-functional requirements.

The best thing I can suggest is that you try it and see. As a long-time
imperative programmer, I did find it a bit difficult to get my head wrapped
around the dynamic nature of building flows. But once you invest the effort
to learn, it becomes a pretty cool tool in your toolbox. Eventually,
writing flows becomes faster than writing imperative scripts, and the
testing cycle is significantly shorter.

Regards,
Mark.

On Thu, Nov 1, 2018 at 4:04 AM Ameer Mawia  wrote:

>
> We have a use case where we take data from a source (text data in CSV
> format), do transformation and manipulation of the textual records, and output
> the data in another (CSV) format. This is being done by a Java-based custom
> framework, written specifically for this *transformation* piece.
>
> Recently, as Apache NiFi is being adopted at the enterprise level by the
> organisation, we have been asked to try *Apache NiFi* and see if we can use
> it as a replacement for this custom tool.
>
> *My question is*:
>
>- How much leverage does *Apache NiFi* provide over flowfile *content*
>manipulation?
>
> I understand *NiFi* is good for creating data flow pipelines, but is it
> good for *extensive TEXT transformation* as well? So far I have not
> found an obvious way to achieve that.
>
> Appreciate the feedback.
>
> Thanks,
>
> --
> http://ca.linkedin.com/in/ameermawia
> Toronto, ON
>
>


Fwd: NIFI Usage for Data Transformation

2018-10-31 Thread Ameer Mawia
We have a use case where we take data from a source (text data in CSV
format), do transformation and manipulation of the textual records, and output
the data in another (CSV) format. This is being done by a Java-based custom
framework, written specifically for this *transformation* piece.

Recently, as Apache NiFi is being adopted at the enterprise level by the
organisation, we have been asked to try *Apache NiFi* and see if we can use
it as a replacement for this custom tool.

*My question is*:

   - How much leverage does *Apache NiFi* provide over flowfile *content*
   manipulation?

I understand *NiFi* is good for creating data flow pipelines, but is it good
for *extensive TEXT transformation* as well? So far I have not found an
obvious way to achieve that.

Appreciate the feedback.

Thanks,

-- 
http://ca.linkedin.com/in/ameermawia
Toronto, ON


Re: Expression Language

2018-10-31 Thread Bryan Bende
You haven't said which processor/service you are using, but you may want to
check the docs for that component to see whether it supports expression
language from flow file attributes; some properties only support the
variable registry.

Most of the Elasticsearch processors look like the host property only
supports the variable registry [1].

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-elasticsearch-nar/1.8.0/org.apache.nifi.processors.elasticsearch.PutElasticsearch/index.html
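
If that turns out to be the case here, one option (a sketch, assuming the
file-based variable registry) is to put the values in a properties file that
nifi.properties points at via nifi.variable.registry.properties, then
reference them exactly as before:

   # conf/custom.properties (hypothetical file name)
   ElasticIP=10.0.0.0
   ElasticPort=9200

   # the processor property value stays the same:
   http://${ElasticIP}:${ElasticPort}

Note that the file-based registry is read at startup, so changes to those
files require a restart.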


On Wed, Oct 31, 2018 at 4:44 PM Boris Tyukin  wrote:

> Looks right to me... Did you check the flowfile attributes in the provenance events
> to make sure your attributes are populated? Also check the exact spelling and
> casing. If it still does not work, show us some screenshots of your flow and
> properties.
> Boris
>
> On Wed, Oct 31, 2018 at 4:04 PM Jones, Patrick L.  wrote:
>
>> Howdy,
>>
>>
>>
>>  I’m having trouble with the expression language. I have 2
>> attributes, ElasticIP and ElasticPort. I’m trying to use them in an Elastic
>> URL. If I hard-code the values in, it works fine. If I use the attributes,
>> I get a null pointer. I have tried various variations on:
>>
>> http://${ElasticIP}:${ElasticPort}
>>
>>
>>
>> where ElasticIP = 10.0.0.0 and ElasticPort is 9200
>>
>>
>>
>> Any thoughts on what the expression should be?
>>
>>
>>
>> Thank you
>>
>


Re: Expression Language

2018-10-31 Thread Boris Tyukin
Looks right to me... Did you check the flowfile attributes in the provenance events
to make sure your attributes are populated? Also check the exact spelling and
casing. If it still does not work, show us some screenshots of your flow and
properties.
Boris

On Wed, Oct 31, 2018 at 4:04 PM Jones, Patrick L.  wrote:

> Howdy,
>
>
>
>  I’m having trouble with the expression language. I have 2
> attributes, ElasticIP and ElasticPort. I’m trying to use them in an Elastic
> URL. If I hard-code the values in, it works fine. If I use the attributes,
> I get a null pointer. I have tried various variations on:
>
> http://${ElasticIP}:${ElasticPort}
>
>
>
> where ElasticIP = 10.0.0.0 and ElasticPort is 9200
>
>
>
> Any thoughts on what the expression should be?
>
>
>
> Thank you
>


Expression Language

2018-10-31 Thread Jones, Patrick L.

Howdy,

 I'm having trouble with the expression language. I have 2 attributes,
ElasticIP and ElasticPort. I'm trying to use them in an Elastic URL. If I
hard-code the values in, it works fine. If I use the attributes, I get a null
pointer. I have tried various variations on:
http://${ElasticIP}:${ElasticPort}

where ElasticIP = 10.0.0.0 and ElasticPort is 9200

Any thoughts on what the expression should be?

Thank you


Re: GenerateTableFetch Segment Identifier

2018-10-31 Thread Matt Burgess
I’m not at my computer at the moment, but I wrote 2 Jiras and put up 2 PRs to
add these features; hopefully they will make the next release. I can get the
links later today.

> On Oct 31, 2018, at 10:27 AM, Shawn Weeks  wrote:
> 
> Currently GenerateTableFetch doesn't set any attributes to identify segments 
> and unique executions the way UnpackContent does. I was wondering if anyone else 
> thought that might be useful or had another way to figure out this 
> information. I need to be able to track that all segments for a specific 
> call to GenerateTableFetch have completed successfully.
> 
> Thanks
> Shawn Weeks


Re: PutHiveStreaming TimelineClientImpl Exception

2018-10-31 Thread Shawn Weeks
You have to either create a hive-site.xml just for NiFi without the hook, or 
your yarn-site.xml needs to be on the class path. Another parameter you might 
have to set to make Hive streaming less chatty is 
hcatalog.hive.client.cache.disabled=true; it was recommended by our vendor to get 
rid of some other error messages.
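
As a rough sketch of the NiFi-only hive-site.xml (the exact contents depend
on your distribution, so diff against the cluster's copy), the key pieces are
clearing the ATS hook and adding the cache setting:

   <property>
     <name>hive.exec.post.hooks</name>
     <!-- empty: ATSHook removed so NiFi never talks to the timeline server -->
     <value></value>
   </property>
   <property>
     <name>hcatalog.hive.client.cache.disabled</name>
     <value>true</value>
   </property>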


Thanks

Shawn Weeks


From: Noe Detore 
Sent: Wednesday, October 31, 2018 7:16:15 AM
To: users@nifi.apache.org
Subject: PutHiveStreaming TimelineClientImpl Exception

Hello,

Using the NiFi 1.5 PutHiveStreaming processor, I am seeing a lot of logs like:

INFO [ATS Logger 0] o.a.h.y.c.api.impl.TimelineClientImpl Exception caught by 
TimelineClientConnectionRetry, will try 1 more time(s).
Message: java.net.ConnectException: Connection refused
2018-10-31 07:44:51,612 WARN [ATS Logger 0] 
org.apache.hadoop.hive.ql.hooks.ATSHook Failed to create ATS domain 
hive_6407e1d8-2d67-44af-bd0a-04288d6c587b
java.lang.RuntimeException: Failed to connect to timeline server. Connection 
retries limit exceeded. The posted timeline event may be missing...

Data is getting into Hive, but this log is chatty. Any suggestions on how to 
satisfy or remove this ATS requirement?

Thank you




GenerateTableFetch Segment Identifier

2018-10-31 Thread Shawn Weeks
Currently GenerateTableFetch doesn't set any attributes to identify segments 
and unique executions the way UnpackContent does. I was wondering if anyone else 
thought that might be useful or had another way to figure out this information. 
I need to be able to track that all segments for a specific call to 
GenerateTableFetch have completed successfully.
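
For reference, the convention I have in mind is the one UnpackContent
documents (paraphrased from memory, so double-check the processor docs):

   fragment.identifier - shared by all segments from one execution
   fragment.index      - this segment's position in the set
   fragment.count      - total number of segments produced

With those three attributes, a downstream MergeContent in Defragment mode
could tell me when every segment from a given GenerateTableFetch call has
arrived.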


Thanks

Shawn Weeks


PutHiveStreaming TimelineClientImpl Exception

2018-10-31 Thread Noe Detore
Hello,

Using the NiFi 1.5 PutHiveStreaming processor, I am seeing a lot of logs like:

INFO [ATS Logger 0] o.a.h.y.c.api.impl.TimelineClientImpl Exception caught
by TimelineClientConnectionRetry, will try 1 more time(s).
Message: java.net.ConnectException: Connection refused
2018-10-31 07:44:51,612 WARN [ATS Logger 0]
org.apache.hadoop.hive.ql.hooks.ATSHook Failed to create ATS domain
hive_6407e1d8-2d67-44af-bd0a-04288d6c587b
java.lang.RuntimeException: Failed to connect to timeline server.
Connection retries limit exceeded. The posted timeline event may be
missing...

Data is getting into Hive, but this log is chatty. Any suggestions on how
to satisfy or remove this ATS requirement?

Thank you