Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
I made small progress but am seeing a different exception; not sure why I am
seeing a nil value.

error:
22:58:35 EDT
ERROR
6f15a6f2-7744-404c-9961-f545d3f29042

ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to
process session due to
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException:
org.apache.nifi.processor.exception.FlowFileHandlingException:
org.apache.nifi.processor.exception.FlowFileHandlingException: null is
not known in this session (StandardProcessSession[id=262867803]) in
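
A FlowFileHandlingException of "null is not known in this session" generally means a null FlowFile reference was handed back to the session. A minimal guard for the ExecuteScript body, assuming the same session/REL_SUCCESS bindings and PyStreamCallback class shown later in this thread:

flowFile = session.get()
if flowFile is not None:
    # only write/transfer when the session actually returned a FlowFile;
    # transferring None is what raises "null is not known in this session"
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)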

Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
Hi Matt,

My Python/Jython skills are poor. Can you provide me with an example, please?

-Madhu

On Wed, Mar 30, 2016 at 5:53 PM, Matt Burgess  wrote:

> Mahdu,
>
> Since you won't be able to return your dictionary, another approach would
> be to create the dictionary from the main script and pass it into the
> callback constructor. Then process() can update it, and you can use the
> populated dictionary after process() returns to set attributes and such.
>
> Regards,
> Matt
>
>
> On Mar 30, 2016, at 5:00 PM, Madhukar Thota 
> wrote:
>
> Matt,
>
> I tired the following code but i am getting the following error. Can you
> help me where i am doing wrong?
>
> Error:
>  16:56:10 EDT
> ERROR
> 6f15a6f2-7744-404c-9961-f545d3f29042
>
> ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to process 
> session due to org.apache.nifi.processor.exception.ProcessException: 
> javax.script.ScriptException: TypeError: None required for void return in 
> 
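
A minimal Jython sketch of the approach Matt describes: create the dictionary in the main script, pass it into the callback's constructor, let process() populate it (returning None), then use it afterwards to set attributes. The attribute names and callback class here are illustrative, not taken from the original flow:

from org.apache.nifi.processor.io import StreamCallback
import json

class FillDictCallback(StreamCallback):
    def __init__(self, d):
        # keep a reference to the dictionary created by the main script
        self.d = d

    def process(self, inputStream, outputStream):
        # populate the shared dictionary and write the flowfile content
        self.d['browser'] = flowFile.getAttribute('useragent')
        outputStream.write(bytearray(json.dumps(self.d).encode('utf-8')))
        # no return statement: process() maps to a void Java method

flowFile = session.get()
if flowFile is not None:
    results = {}  # created here, filled in by the callback
    flowFile = session.write(flowFile, FillDictCallback(results))
    # after write() returns, results is populated and can drive attributes
    for key, value in results.items():
        flowFile = session.putAttribute(flowFile, key, str(value))
    session.transfer(flowFile, REL_SUCCESS)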

Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Oh gosh, you're right... lol... too much OS mixing on my fingertips.

Let me try building the LZ4 jar again.

Thanks Matt,

Thad
+ThadGuidry 


Re: How to add python modules ?

2016-03-30 Thread Matt Burgess
You are returning self.d from process(), which is a void method. It needs to
return None.
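
A minimal illustration of that fix (the self.d field is assumed from the earlier discussion):

from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self, d):
        self.d = d

    def process(self, inputStream, outputStream):
        self.d['populated'] = True
        # return self.d   # <-- raises "TypeError: None required for void return"
        return None       # process() is a void Java method, so return None (or nothing)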

Sent from my iPhone

> On Mar 30, 2016, at 5:00 PM, Madhukar Thota  wrote:
> 
> Matt,
> 
> I tired the following code but i am getting the following error. Can you help 
> me where i am doing wrong?
> 
> Error:
>  16:56:10 EDT
> ERROR6f15a6f2-7744-404c-9961-f545d3f29042
> ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to process 
> session due to org.apache.nifi.processor.exception.ProcessException: 
> javax.script.ScriptException: TypeError: None required for void return in 
> 

Re: How to add python modules ?

2016-03-30 Thread Madhukar Thota
Matt,

I tried the following code but I am getting the error below. Can you
help me see where I am going wrong?

Error:
 16:56:10 EDT
ERROR
6f15a6f2-7744-404c-9961-f545d3f29042

ExecuteScript[id=6f15a6f2-7744-404c-9961-f545d3f29042] Failed to
process session due to
org.apache.nifi.processor.exception.ProcessException:
javax.script.ScriptException: TypeError: None required for void return
in 

Re: String conversion to Int, float double

2016-03-30 Thread Madhukar Thota
Thanks Joe. I updated my config and I am not seeing the issue anymore.

On Wed, Mar 30, 2016 at 1:18 PM, Joe Witt  wrote:

> Hello
>
> From your screenshot it shows you have both success and failure
> looping back to Kafka.  Do not loop success and you should be fine.
>
> Thanks
> Joe
>
> On Wed, Mar 30, 2016 at 11:16 AM, Madhukar Thota
>  wrote:
> > I was able to construct the Json with right data type output from
> > ExecuteScript and sending to Kafka directly. The problem i am seeing is
> if i
> > send one record to kafka, kafka processor is writing the message again
> and
> > again and not ending the loop. How can i send exactly once message? Any
> > help.
> >
> > Here is what i am doing in my script:
> >
> > import simplejson as json
> > from org.apache.nifi.processor.io import StreamCallback
> > from user_agents import parse
> >
> >
> > def num(s):
> > try:
> > return int(s)
> > except ValueError:
> > try:
> > return float(s)
> > except ValueError:
> > raise ValueError('argument is not a string of number')
> >
> >
> > class PyStreamCallback(StreamCallback):
> > def __init__(self):
> > pass
> >
> > def process(self, inputStream, outputStream):
> > obj = {'browser':
> > str(parse(flowFile.getAttribute('useragent')).browser.family),
> >'browser_version':
> > str(parse(flowFile.getAttribute('useragent')).browser.version_string),
> >'os':
> > str(parse(flowFile.getAttribute('useragent')).os.family),
> >'os_version':
> > str(parse(flowFile.getAttribute('useragent')).os.version_string),
> >'client_ip': flowFile.getAttribute('clientip')}
> >   if flowFile.getAttribute('http.param.t_resp') and
> > flowFile.getAttribute('http.param.t_page') and
> > flowFile.getAttribute('http.param.t_done'):
> > obj['rt_firstbyte'] =
> > num(flowFile.getAttribute('http.param.t_resp'))
> > obj['rt_lastbyte'] =
> > num(flowFile.getAttribute('http.param.t_page'))
> > obj['rt_loadtime'] =
> > num(flowFile.getAttribute('http.param.t_done'))
> >  outputStream.write(bytearray(json.dumps(obj,
> > indent=4).encode('utf-8')))
> >
> >
> > flowFile = session.get()
> > if (flowFile != None):
> > flowFile = session.write(flowFile, PyStreamCallback())
> > session.transfer(flowFile, REL_SUCCESS)
> >
> >
> > Thanks
> >
> > On Tue, Mar 29, 2016 at 2:30 AM, Conrad Crampton
> >  wrote:
> >>
> >> Hi,
> >> Depending on the final destination of the data (json) you could use the
> >> JsonToAvro -> ConvertAvroSchema -> AvroToJson, with the
> ConvertAvroSchema
> >> doing the type conversion. I had to do this as I came across this
> behaviour
> >> previously. I use the Avro directly (after the conversion) as that was
> my
> >> final data format requirement, but I don’t see any reason if you want
> Json
> >> back that this wouldn’t work. I haven’t tried this by the way, but the
> type
> >> conversion certainly works for the final attributes in the Avro
> documents.
> >> Conrad
> >>
> >> From: Madhukar Thota 
> >> Reply-To: "users@nifi.apache.org" 
> >> Date: Friday, 25 March 2016 at 14:01
> >> To: "users@nifi.apache.org" 
> >> Subject: Re: String conversion to Int, float double
> >>
> >> Any Other ways to achieve this?
> >>
> >> On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende  wrote:
> >>>
> >>> I think the problem is that all attributes are actually Strings
> >>> internally, even after calling toNumber() that is only temporary while
> the
> >>> expression language is executing.
> >>>
> >>> So by the time it gets to AttributesToJson it doesn't have any
> >>> information about the type of each attribute and they all end up as
> Strings.
> >>> I think we would have to come up with a way to pass some type
> information
> >>> along to AttributesToJson in order to get something other than Strings.
> >>>
> >>> -Bryan
> >>>
> >>>
> >>> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota
> >>>  wrote:
> 
>  Hi i am trying to convert string value to integer in UpdateAtrributes
>  using toNumber like this
> 
> 
>  ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"
> 
>  but when the fileattribute pushed to Attributetojson processor , i am
>  stilling seeing it as string. Am i am doing something wrong? and also
> how
>  can i convert string to float?
> 
> 
> 
> 
> >>>
> >>
> >>
> >>

Re: String conversion to Int, float double

2016-03-30 Thread Joe Witt
Hello

From your screenshot it shows you have both success and failure
looping back to Kafka.  Do not loop success and you should be fine.

Thanks
Joe

On Wed, Mar 30, 2016 at 11:16 AM, Madhukar Thota
 wrote:
> I was able to construct the Json with right data type output from
> ExecuteScript and sending to Kafka directly. The problem i am seeing is if i
> send one record to kafka, kafka processor is writing the message again and
> again and not ending the loop. How can i send exactly once message? Any
> help.
>
> Here is what i am doing in my script:
>
> import simplejson as json
> from org.apache.nifi.processor.io import StreamCallback
> from user_agents import parse
>
>
> def num(s):
> try:
> return int(s)
> except ValueError:
> try:
> return float(s)
> except ValueError:
> raise ValueError('argument is not a string of number')
>
>
> class PyStreamCallback(StreamCallback):
> def __init__(self):
> pass
>
> def process(self, inputStream, outputStream):
> obj = {'browser':
> str(parse(flowFile.getAttribute('useragent')).browser.family),
>'browser_version':
> str(parse(flowFile.getAttribute('useragent')).browser.version_string),
>'os':
> str(parse(flowFile.getAttribute('useragent')).os.family),
>'os_version':
> str(parse(flowFile.getAttribute('useragent')).os.version_string),
>'client_ip': flowFile.getAttribute('clientip')}
>   if flowFile.getAttribute('http.param.t_resp') and
> flowFile.getAttribute('http.param.t_page') and
> flowFile.getAttribute('http.param.t_done'):
> obj['rt_firstbyte'] =
> num(flowFile.getAttribute('http.param.t_resp'))
> obj['rt_lastbyte'] =
> num(flowFile.getAttribute('http.param.t_page'))
> obj['rt_loadtime'] =
> num(flowFile.getAttribute('http.param.t_done'))
>  outputStream.write(bytearray(json.dumps(obj,
> indent=4).encode('utf-8')))
>
>
> flowFile = session.get()
> if (flowFile != None):
> flowFile = session.write(flowFile, PyStreamCallback())
> session.transfer(flowFile, REL_SUCCESS)
>
>
> Thanks
>
> On Tue, Mar 29, 2016 at 2:30 AM, Conrad Crampton
>  wrote:
>>
>> Hi,
>> Depending on the final destination of the data (json) you could use the
>> JsonToAvro -> ConvertAvroSchema -> AvroToJson, with the ConvertAvroSchema
>> doing the type conversion. I had to do this as I came across this behaviour
>> previously. I use the Avro directly (after the conversion) as that was my
>> final data format requirement, but I don’t see any reason if you want Json
>> back that this wouldn’t work. I haven’t tried this by the way, but the type
>> conversion certainly works for the final attributes in the Avro documents.
>> Conrad
>>
>> From: Madhukar Thota 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Friday, 25 March 2016 at 14:01
>> To: "users@nifi.apache.org" 
>> Subject: Re: String conversion to Int, float double
>>
>> Any Other ways to achieve this?
>>
>> On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende  wrote:
>>>
>>> I think the problem is that all attributes are actually Strings
>>> internally, even after calling toNumber() that is only temporary while the
>>> expression language is executing.
>>>
>>> So by the time it gets to AttributesToJson it doesn't have any
>>> information about the type of each attribute and they all end up as Strings.
>>> I think we would have to come up with a way to pass some type information
>>> along to AttributesToJson in order to get something other than Strings.
>>>
>>> -Bryan
>>>
>>>
>>> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota
>>>  wrote:

 Hi i am trying to convert string value to integer in UpdateAtrributes
 using toNumber like this


 ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"

 but when the fileattribute pushed to Attributetojson processor , i am
 stilling seeing it as string. Am i am doing something wrong? and also how
 can i convert string to float?




>>>
>>
>>
>>

Re: String conversion to Int, float double

2016-03-30 Thread Madhukar Thota
I was able to construct the JSON with the right data types output from
ExecuteScript and send it to Kafka directly. The problem I am seeing is that
if I send one record to Kafka, the Kafka processor writes the message again
and again and never ends the loop. How can I send the message exactly once?
Any help is appreciated.

Here is what I am doing in my script:

import simplejson as json
from org.apache.nifi.processor.io import StreamCallback
from user_agents import parse


def num(s):
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            raise ValueError('argument is not a string of number')


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        ua = parse(flowFile.getAttribute('useragent'))
        obj = {'browser': str(ua.browser.family),
               'browser_version': str(ua.browser.version_string),
               'os': str(ua.os.family),
               'os_version': str(ua.os.version_string),
               'client_ip': flowFile.getAttribute('clientip')}
        if (flowFile.getAttribute('http.param.t_resp') and
                flowFile.getAttribute('http.param.t_page') and
                flowFile.getAttribute('http.param.t_done')):
            obj['rt_firstbyte'] = num(flowFile.getAttribute('http.param.t_resp'))
            obj['rt_lastbyte'] = num(flowFile.getAttribute('http.param.t_page'))
            obj['rt_loadtime'] = num(flowFile.getAttribute('http.param.t_done'))
        outputStream.write(bytearray(json.dumps(obj, indent=4).encode('utf-8')))


flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    session.transfer(flowFile, REL_SUCCESS)


Thanks

On Tue, Mar 29, 2016 at 2:30 AM, Conrad Crampton <
conrad.cramp...@secdata.com> wrote:

> Hi,
> Depending on the final destination of the data (json) you could use the
> JsonToAvro -> ConvertAvroSchema -> AvroToJson, with the ConvertAvroSchema
> doing the type conversion. I had to do this as I came across this behaviour
> previously. I use the Avro directly (after the conversion) as that was my
> final data format requirement, but I don’t see any reason if you want Json
> back that this wouldn’t work. I haven’t tried this by the way, but the type
> conversion certainly works for the final attributes in the Avro documents.
> Conrad
>
> From: Madhukar Thota 
> Reply-To: "users@nifi.apache.org" 
> Date: Friday, 25 March 2016 at 14:01
> To: "users@nifi.apache.org" 
> Subject: Re: String conversion to Int, float double
>
> Any Other ways to achieve this?
>
> On Thu, Mar 24, 2016 at 4:48 PM, Bryan Bende  wrote:
>
>> I think the problem is that all attributes are actually Strings
>> internally, even after calling toNumber() that is only temporary while the
>> expression language is executing.
>>
>> So by the time it gets to AttributesToJson it doesn't have any
>> information about the type of each attribute and they all end up as
>> Strings. I think we would have to come up with a way to pass some type
>> information along to AttributesToJson in order to get something other than
>> Strings.
>>
>> -Bryan
>>
>>
>> On Thu, Mar 24, 2016 at 3:30 PM, Madhukar Thota > > wrote:
>>
>>> Hi i am trying to convert string value to integer in UpdateAtrributes
>>> using toNumber like this
>>>
>>>
>>> ${http.param.t_resp:toNumber()}  where http.param.t_resp = "132"
>>>
>>> but when the fileattribute pushed to Attributetojson processor , i am
>>> stilling seeing it as string. Am i am doing something wrong? and also how
>>> can i convert string to float?
>>>
>>>
>>>
>>>
>>>
>>
>
>


Re: Having on processor block while another one is running

2016-03-30 Thread Thad Guidry
Also to note, if you're not plugged in to the Data Management industry...

Work Flow Orchestration is also sometimes called Process Management, where
there are specific tools and frameworks to deal with that scope on multiple
levels. You may have heard of a specific kind of Process Management called
Business Process Management (BPM), and there are frameworks and tools that
help with that; BPMN2 is a standard within that space. I myself use a
framework called Activiti: http://activiti.org/

Thad
+ThadGuidry 


Re: Having on processor block while another one is running

2016-03-30 Thread Oleg Zhurakousky
Vincent

Sorry for the late reply, but here it is

Based on what you have described it appears you have a mix of two problems: 
Work Flow Orchestration and Data Flow.
The main issue is that at the surface it’s not always easy to tell the 
difference, but I’ll try.

Work Flow Orchestration allows one to orchestrate a single process by breaking 
it down into a set of individual components (primarily for simplicity and 
modularization) and then composing those components into one cohesive process.
Data Flow manages individual processes, their lifecycle, execution, input and 
output from a central command/control facility.

So with the above in mind, I'd say you have a mixed problem: a Data Flow 
consisting of simple and complex processors. It is those complex processors, 
the ones that need to invoke an MR job and then act on its result, that fall 
into the category of Work Flow Orchestration, where individual components 
within such a process must work with awareness of the overall process they 
represent.  For example:

GetFile (NiFi)
PutHDFS (NiFi)
Process (NiFi Custom Processor) - where you execute the MR job, react to its 
completion (success or failure), and possibly put something on the output queue 
in NiFi so the next element of the Data Flow can kick in.
. . .

So, the three elements of Data Flow above are individual NiFi processors, yet 
the third one internally represents a complex and orchestrated process. Now, 
orchestration is just a term, and without relying on outside frameworks that 
specifically address orchestration it would be just a lot of custom code. 
Thankfully NiFi provides support for a Spring Application Context container 
that allows you to implement your NiFi processor using work flow orchestration 
frameworks such as Spring Integration and/or Apache Camel.

I'd be more than willing to help you further with that if you're interested, 
but I wanted to see how you feel about the above architecture. I am also working 
on a blog post describing exactly that, and how Data Flow and Work Flow can 
complement one another.

Let me know
Cheers
Oleg

On Mar 29, 2016, at 9:26 AM, Oleg Zhurakousky 
> wrote:

Vincent

I do have a suggestion for you but need a bit more time to craft my response. 
Give me till tonight EST.

Cheers
Oleg
On Mar 29, 2016, at 8:55 AM, Vincent Russell 
> wrote:

Thanks Oleg and Joe,

I am not currently convinced that nifi is the solution as well, but it is a 
nice way for us to manage actions based on the result of a mapreduce job.

Our use cases is to have follow on processors that perform actions based on the 
results of the map reduce jobs.  One processor kicks off the M/R process and 
then the results are sent down the flow.

The problem with our current scenario is that we have two separate flows that 
utilize the same location as the output for the M/R locations.

One simple way might be to use mongo itself has a locking mechanism.

On Mon, Mar 28, 2016 at 7:07 PM, Oleg Zhurakousky 
> wrote:
Vincent

This sounds more like an architectural question and even outside of NiFi in 
order to achieve that especially in the distributed environment one would need 
some kind of coordination component. And while we can think of variety of way 
to accomplish that I am not entirely convinced that this is the right direction.
Would you mind sharing a bit more about your use case and perhaps we can 
jointly come up with a better and hopefully simpler solution?

Cheers
Oleg

On Mar 28, 2016, at 6:45 PM, Vincent Russell 
> wrote:


I have two processors (that aren't  part of the same flow) that write to the 
same resource (a mongo collection) via a map reduce job.

I don't want both to run at the same time.

On Mar 28, 2016 6:28 PM, "Joe Witt" 
> wrote:
Vincent,

Not really and that would largely be by design.  Can you describe the
use case more so we can suggest alternatives or perhaps understand the
motivation better?

Thanks
Joe

On Mon, Mar 28, 2016 at 4:00 PM, Vincent Russell
> wrote:
>
> Is it possible to have one processor block while another specified processor
> is running (within the onTrigger method).
>
> I can do this on a non-clustered nifi with a synchronized block I guess, but
> i wanted to know if there was a more idiomatic way of doing this.
>
> Thanks,
> Vincent






Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Joe Witt
Chase,

It is a self-driven subscribe and unsubscribe process.  Please see
here https://nifi.apache.org/mailing_lists.html

Thanks
Joe

On Wed, Mar 30, 2016 at 9:18 AM, Chase Cunningham  wrote:
> take me off this list...
>
> unsubscribe
>
>
> On 3/30/16 10:13 AM, Joe Witt wrote:
>>
>> Did you set the LD_LIBRARY_PATH as Burgress mentioned at the end?
>>
>> I am not in a good position to dig in at the moment so my apologies
>> for the half-help here.  The loading of native libs as I recall is a
>> pretty specific process.  I know a few folks are familiar with it in
>> the community so let's keep the thread alive and encourage them to
>> engage :-)
>>
>> On Wed, Mar 30, 2016 at 9:07 AM, Thad Guidry  wrote:
>>>
>>> Ah, that's really helpful.
>>>
>>> Looks like I did have the native library in the built .jar file
>>>
>>> \lz4-1.3-SNAPSHOT\net\jpountz\util\win32\amd64\liblz4-java.so
>>>
>>> but placing that .so file in my C:\Program
>>> Files\Java\jdk1.8.0_74\jre\lib\amd64 folder results in the same missing
>>> lz4
>>> native NiFi errors
>>>
>>> Ideas ?
>>>
>>> Thad
>>> +ThadGuidry
>>>
>
> --
> Dr. Chase C Cunningham
> CTRC (SW) USN Ret.
> The Cynja LLC Proprietary Business and Technical Information
> CONFIDENTIAL TREATMENT REQUIRED
>


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Chase Cunningham

take me off this list...

unsubscribe

On 3/30/16 10:13 AM, Joe Witt wrote:

Did you set the LD_LIBRARY_PATH as Burgress mentioned at the end?

I am not in a good position to dig in at the moment so my apologies
for the half-help here.  The loading of native libs as I recall is a
pretty specific process.  I know a few folks are familiar with it in
the community so let's keep the thread alive and encourage them to
engage :-)

On Wed, Mar 30, 2016 at 9:07 AM, Thad Guidry  wrote:

Ah, that's really helpful.

Looks like I did have the native library in the built .jar file

\lz4-1.3-SNAPSHOT\net\jpountz\util\win32\amd64\liblz4-java.so

but placing that .so file in my C:\Program
Files\Java\jdk1.8.0_74\jre\lib\amd64 folder results in the same missing lz4
native NiFi errors

Ideas ?

Thad
+ThadGuidry



--
Dr. Chase C Cunningham
CTRC (SW) USN Ret.
The Cynja LLC Proprietary Business and Technical Information
CONFIDENTIAL TREATMENT REQUIRED



Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Joe Witt
Did you set the LD_LIBRARY_PATH as Burgess mentioned at the end?

I am not in a good position to dig in at the moment so my apologies
for the half-help here.  The loading of native libs as I recall is a
pretty specific process.  I know a few folks are familiar with it in
the community so let's keep the thread alive and encourage them to
engage :-)

On Wed, Mar 30, 2016 at 9:07 AM, Thad Guidry  wrote:
> Ah, that's really helpful.
>
> Looks like I did have the native library in the built .jar file
>
> \lz4-1.3-SNAPSHOT\net\jpountz\util\win32\amd64\liblz4-java.so
>
> but placing that .so file in my C:\Program
> Files\Java\jdk1.8.0_74\jre\lib\amd64 folder results in the same missing lz4
> native NiFi errors
>
> Ideas ?
>
> Thad
> +ThadGuidry
>


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Ah, that's really helpful.

Looks like I did have the native library in the built .jar file

\lz4-1.3-SNAPSHOT\net\jpountz\util\win32\amd64\liblz4-java.so

but placing that .so file in my C:\Program
Files\Java\jdk1.8.0_74\jre\lib\amd64 folder results in the same missing lz4
native library errors in NiFi.

Ideas?

Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Joe Witt
Thad

This thread [1] seems related.  Take a look and see if that helps.
The basic gist as I understand it is we won't have access to that
native library unless it is pointed to somewhere or unless the Java
code that calls it knows how to set/find it for you.

[1] 
http://apache-nifi-developer-list.39713.n7.nabble.com/java-lang-UnsatisfiedLinkError-in-PutHDFS-with-snappy-compression-td7182.html
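
For reference, pointing the NiFi JVM at the directory holding the native library is usually done either with a java.library.path argument in conf/bootstrap.conf or (on Linux) with LD_LIBRARY_PATH set before starting NiFi. The index and directories below are placeholders, not values from this thread:

# conf/bootstrap.conf -- add the flag under any unused java.arg.N index
java.arg.15=-Djava.library.path=C:/native/libs

# or, on Linux, export the variable before bin/nifi.sh start
export LD_LIBRARY_PATH=/opt/native/libs:$LD_LIBRARY_PATH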

On Wed, Mar 30, 2016 at 8:42 AM, Thad Guidry  wrote:
> My badthere is... in the app log...
>
> 2016-03-30 09:39:27,709 INFO [Write-Ahead Local State Provider Maintenance]
> org.wali.MinimalLockingWriteAheadLog
> org.wali.MinimalLockingWriteAheadLog@7615666e checkpointed with 8 Records
> and 0 Swap Files in 69 milliseconds (Stop-the-world time = 6 milliseconds,
> Clear Edit Logs time = 4 millis), max Transaction ID 23
> 2016-03-30 09:39:31,979 INFO [pool-16-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile
> Repository
> 2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
> org.wali.MinimalLockingWriteAheadLog
> org.wali.MinimalLockingWriteAheadLog@174f0d06 checkpointed with 3 Records
> and 0 Swap Files in 400 milliseconds (Stop-the-world time = 273
> milliseconds, Clear Edit Logs time = 74 millis), max Transaction ID 9785
> 2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
> Repository with 3 records in 400 milliseconds
> 2016-03-30 09:39:32,523 ERROR [Timer-Driven Process Thread-9]
> o.apache.nifi.processors.hadoop.PutHDFS
> PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS due
> to java.lang.RuntimeException: native lz4 library not available:
> java.lang.RuntimeException: native lz4 library not available
> 2016-03-30 09:39:32,525 ERROR [Timer-Driven Process Thread-9]
> o.apache.nifi.processors.hadoop.PutHDFS
> java.lang.RuntimeException: native lz4 library not available
> at
> org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
> ~[hadoop-common-2.6.2.jar:na]
> at
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> ~[hadoop-common-2.6.2.jar:na]
> at
> org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
> ~[hadoop-common-2.6.2.jar:na]
> at
> org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
> ~[hadoop-common-2.6.2.jar:na]
> at org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
> ~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1807)
> ~[na:na]
> at
> org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1778)
> ~[na:na]
> at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:270)
> ~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> [nifi-api-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1057)
> [nifi-framework-core-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-0.6.0.jar:0.6.0]
> at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:123)
> [nifi-framework-core-0.6.0.jar:0.6.0]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_74]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_74]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_74]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [na:1.8.0_74]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_74]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_74]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
> 2016-03-30 09:39:34,273 ERROR [Timer-Driven Process Thread-5]
> o.apache.nifi.processors.hadoop.PutHDFS
> PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS due
> to java.lang.RuntimeException: native lz4 library not available:
> java.lang.RuntimeException: native lz4 library not available
> 2016-03-30 09:39:34,274 ERROR [Timer-Driven Process Thread-5]
> o.apache.nifi.processors.hadoop.PutHDFS
> java.lang.RuntimeException: native lz4 library not available
> at
> org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
> 

Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
My bad... there is more in the app log...

2016-03-30 09:39:27,709 INFO [Write-Ahead Local State Provider Maintenance]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@7615666e checkpointed with 8 Records
and 0 Swap Files in 69 milliseconds (Stop-the-world time = 6 milliseconds,
Clear Edit Logs time = 4 millis), max Transaction ID 23
2016-03-30 09:39:31,979 INFO [pool-16-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile
Repository
2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@174f0d06 checkpointed with 3 Records
and 0 Swap Files in 400 milliseconds (Stop-the-world time = 273
milliseconds, Clear Edit Logs time = 74 millis), max Transaction ID 9785
2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
Repository with 3 records in 400 milliseconds
2016-03-30 09:39:32,523 ERROR [Timer-Driven Process Thread-9]
o.apache.nifi.processors.hadoop.PutHDFS
PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
due to java.lang.RuntimeException: native lz4 library not available:
java.lang.RuntimeException: native lz4 library not available
2016-03-30 09:39:32,525 ERROR [Timer-Driven Process Thread-9]
o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: native lz4 library not available
at
org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1807)
~[na:na]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1778)
~[na:na]
at
org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:270)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
[nifi-api-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1057)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:123)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_74]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[na:1.8.0_74]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_74]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_74]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_74]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
2016-03-30 09:39:34,273 ERROR [Timer-Driven Process Thread-5]
o.apache.nifi.processors.hadoop.PutHDFS
PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
due to java.lang.RuntimeException: native lz4 library not available:
java.lang.RuntimeException: native lz4 library not available
2016-03-30 09:39:34,274 ERROR [Timer-Driven Process Thread-5]
o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: native lz4 library not available
at
org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1807)
~[na:na]
at

Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Joe,

There is no additional output, even when I set the log level to DEBUG.

09:36:32 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available

09:36:34 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available

09:36:35 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available


Thad
+ThadGuidry 


Having trouble with PutHDFS as Avro, possibly due to codec

2016-03-30 Thread Dmitry Goldenberg
Hi,

I've got a dataflow which successfully writes Avro into a directory in
HDFS.  Avro-tools is able to read the Avro files so that seems fine.

Now I'm trying to create a table in Hive using the imported files and my
schema but the CREATE statement's query is just hanging in the Hive query
editor (same via the 'hive' command line).

When compared against another import, I can see that the Avro is different
as far as the compression codec; my NiFi dataflow in ConvertJSONToAvro got
the codec set as 'snappy'.

Here's the create statement I'm trying:

CREATE EXTERNAL TABLE activations
STORED AS AVRO
LOCATION '/demo/xml-etl/activations'
TBLPROPERTIES ('avro.schema.url'=
'hdfs:/demo/xml-etl/activations-schema/activations-1.avsc',
'avro.output.codec'='snappy');

I've also tried setting the codec as

set avro.output.codec=snappy;

(originally I had tried running the CREATE statement with no codec
specification).

Has anyone else encountered this issue?  Is there any way to set a
different codec when converting to Avro in NiFi?

I see that ConvertJSONToAvro simply sets the codec like so:

writer.setCodec(CodecFactory.snappyCodec());

On the Hive/Hadoop side, I'm running Cloudera 2.6.0-cdh5.4.3.

Thanks,
- Dmitry


Re: Developing dataflows in the canvas editor

2016-03-30 Thread Dmitry Goldenberg
Thanks, Chris and Joe!

The only other thing I'd add is that I kept looking for an Undo capability
with Ctrl-Z or some such but that doesn't seem to be supported.

- Dmitry

On Tue, Mar 29, 2016 at 10:58 PM, Joe Witt  wrote:

> Dmitry these are great questions and Chris that was in my opinion a
> pretty excellent response - 'noob' or not.
>
> The only thing I'd add Dmitry is that some of what you're saying
> regarding templates themselves is very true.  We can do better and so
> much more than we are.  We have a feature proposal/discussion framing
> here [1,2] and please by all means help us shape how this evolves.
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
> [2] https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry
>
> Thanks
> Joe
>
> On Tue, Mar 29, 2016 at 1:59 PM, McDermott, Chris Kevin (MSDU -
> STaTS/StorefrontRemote)  wrote:
> > Dimitri,
> >
> > From one noob to another, welcome.
> >
> > All modifications to the canvas are automatically saved.  If you want to
> organize multiple flow instances look to process groups.  Drag a process
> group onto the canvas. Double click the process group to open it. Then drag
> a template onto the canvas.  Use the breadcrumbs to navigate back to the
> root process group (root of the canvas).  Create a second process group.
> Wash and repeat.  Process groups can be nested to your hearts content.
> Process groups themselves can be saved as templates.  You can also copy
> then paste in process groups.  And you can drag processors and process
> groups into other process groups, although I am not sure that you can do
> this with multi-select.  They are great for creating a high-level
> abstraction for a complex flow.
> >
> > I find its best to use the zoom controls.  For what its worth Google
> Maps uses the same paradigm for zooming.   I’m not sure these web-apps can
> really understand “gestures”, its just that the browser translates the
> gesture into scroll events which NiFi uses for zooming.
> >
> > Good luck,
> >
> > Chris
> >
> > Date: Tuesday, March 29, 2016 at 1:27 PM
> > To: "users@nifi.apache.org" <
> users@nifi.apache.org>
> > Subject: Developing dataflows in the canvas editor
> > From: Dmitry Goldenberg >
> > Reply-To: "users@nifi.apache.org" <
> users@nifi.apache.org>
> >
> > Hi,
> >
> > These may be just 'noob' impressions from someone who hasn't learned
> enough NiFi yet (I may be missing something obvious).
> > z
> > My first confusion is about dataflows vs. templates.  I've developed a
> couple of templates.  Now I want to drop a template into the canvas and
> treat that as a dataflow or an instance of a template.  But I don't see a
> way to save this instance into any persistent store, or any way to manage
> its lifecycle (edit, delete etc).  Is there something I'm missing or are
> there features in progress related to this?
> >
> > I mean, where does the dataflow go if I kill the browser? It seems to
> persist... but what happens when I want to create a slightly different
> rendition of the same flow?  Is there a namespaced persistence for
> dataflows with CRUD operations supported?  I keep looking for a File ->
> New, File -> Open, File -> Save type of metaphor.
> >
> > My second item is the mouse roller. Yes, the mouse roller which at least
> on the Mac causes the canvas to zoom in or zoom out.  Having used other
> products, the typical metaphor seems to be that rolling gets you a vertical
> scrollbar and you can scroll up and down, with a separate UI gesture that
> lets you zoom in/out.  I can't seem to get used to this zooming behavior.
> >
> > Thoughts?
> >
> > Thanks,
> > - Dmitry
>


Re: Sqoop Support in NIFI

2016-03-30 Thread Simon Ball
Are you planning to use something like Hive or Spark to query the data? Both 
will work fine with Avro-formatted data under a table. I'm not sure what you 
mean by "Table Structure" or whether you have a particular format in mind, but 
there is, I believe, talk of adding processors that will write directly to ORC 
format, so you could convert the Avro data to ORC within NiFi.

Simon

On 30 Mar 2016, at 07:06, prabhu Mahendran 
> wrote:

For Below reasons i have choose Sqoop in NIFI Processor is the best method to 
move data in Table Structure.

If once move the Table from oracle or sql server into HDFS then whole moved 
data which must be in Table format not in avro or json..etc.

For Example:Table Data from Oracle which is in form of Table Structure and 
using Execute SQL to move those data into HDFS  which is in avro or 
json format.but i need that data in Table Structure.

And I have try QueryDatabaseTable Processor in nifi-0.6.0 It can return the 
Table record in avro format but i need those data in Table Structure.

So anyone please help me to solve this.





On Tue, Mar 29, 2016 at 3:02 PM, Simon Ball 
> wrote:
Another processor that may be of interest to you is the QueryDatabaseTable 
processor, which has just been released in 0.6.0. This provides incremental 
load capabilities similar to sqoop.

If you’re looking for the schema type functionality, bear in mind that the 
ExecuteSQL (and new Query processor) preserve schema with Avro.

Sqoop also allows import to HBase, which you can do with PutHBaseJson (use the 
ConvertAvroToJson processor to feed this).

Distributed partitoned queries isn’t in there yet, but I believe is on the way, 
so sqoop may have the edge for that use case today.

Granted, NiFi doesn’t have much by way of HCatalog integration at the moment, 
but most of the functionality you’ll find in Sqoop is in NiFi. Unless you are 
looking to move terabytes at a time, then NiFi should be able to handle most of 
what you would use sqoop for, so it would be very interesting to hear more 
detail on your use case, and why you needed sqoop on top of NiFi.

Simon


On 29 Mar 2016, at 09:06, prabhu Mahendran 
> wrote:

Hi,

Yes, In my case i have created the Custom processor with Sqoop API which 
accommodates complete functionality of sqoop.
As per you concern we have able to move the data only from HDFS to SQl or Vice 
versa, But sqoop having more functionality which we can achieve it by 
Sqoop.RunTool() in org.apache.sqoop.sqoop. The Sqoop Java client will works 
well and Implement that API into new Sqoop NIFI processor Doesn't work!

On Tue, Mar 29, 2016 at 12:49 PM, Conrad Crampton 
> wrote:
Hi,
If you could explain exactly what you are trying to achieve I.e. What part of 
the data pipeline you are looking to use NiFi for and where you wish to retain 
Sqoop I could perhaps have a more informed input (although I have only been 
using NiFi myself for a few weeks). Sqoop obviously can move the data from RDBM 
systems through to HDFS (and vice versa) as can NiFi, not sure why you would 
want the mix (or at least I can’t see it from the description you have provided 
thus far).
I have limited knowledge of Sqoop, but either way, I am sure you could ‘drive’ 
Sqoop from a custom NiFi processor if you so choose, and you can ‘drive’ NiFi 
externally (using the REST api) - if Sqoop can consume it.
Regards
Conrad


From: prabhu Mahendran >
Reply-To: "users@nifi.apache.org" 
>
Date: Tuesday, 29 March 2016 at 07:55
To: "users@nifi.apache.org" 
>
Subject: Re: Sqoop Support in NIFI

Hi Conrad,

Thanks for Quick Response.

Yeah.Combination of Execute SQL and Put HDFS works well instead of Sqoop.But is 
there any possible to use Sqoop(client) to do like this?

Prabhu Mahendran

On Tue, Mar 29, 2016 at 12:04 PM, Conrad Crampton 
> wrote:
Hi,
Why use sqoop at all? Use a combination of ExecuteSQL [1] and PutHDFS [2].
I have just replace the use of Flume using a combination of ListenSyslog and 
PutHDFS which I guess is a similar architectural pattern.
HTH
Conrad


http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
 [1]
http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/index.html
 [2]

From: prabhu Mahendran >
Reply-To: "users@nifi.apache.org" 
>
Date: Tuesday, 29 March 

Re: Sqoop Support in NIFI

2016-03-30 Thread prabhu Mahendran
For the reasons below, I chose Sqoop in a NiFi processor as the best method
to move data in table structure.

Once a table is moved from Oracle or SQL Server into HDFS, the moved data
must stay in table format, not in Avro or JSON, etc.

For example: table data from Oracle is in table structure, but using
ExecuteSQL to move that data into HDFS produces Avro or JSON format, and I
need the data in table structure.

I have also tried the QueryDatabaseTable processor in nifi-0.6.0; it can
return the table records in Avro format, but I need the data in table
structure.

So can anyone please help me solve this?





On Tue, Mar 29, 2016 at 3:02 PM, Simon Ball  wrote:

> Another processor that may be of interest to you is the QueryDatabaseTable
> processor, which has just been released in 0.6.0. This provides incremental
> load capabilities similar to sqoop.
>
> If you’re looking for the schema type functionality, bear in mind that the
> ExecuteSQL (and new Query processor) preserve schema with Avro.
>
> Sqoop also allows import to HBase, which you can do with PutHBaseJson (use
> the ConvertAvroToJson processor to feed this).
>
> Distributed partitoned queries isn’t in there yet, but I believe is on the
> way, so sqoop may have the edge for that use case today.
>
> Granted, NiFi doesn’t have much by way of HCatalog integration at the
> moment, but most of the functionality you’ll find in Sqoop is in NiFi.
> Unless you are looking to move terabytes at a time, then NiFi should be
> able to handle most of what you would use sqoop for, so it would be very
> interesting to hear more detail on your use case, and why you needed sqoop
> on top of NiFi.
>
> Simon
>
>
> On 29 Mar 2016, at 09:06, prabhu Mahendran 
> wrote:
>
> Hi,
>
> Yes, In my case i have created the Custom processor with Sqoop API which
> accommodates complete functionality of sqoop.
> As per you concern we have able to move the data only from HDFS to SQl or
> Vice versa, But sqoop having more functionality which we can achieve it by
> Sqoop.RunTool() in org.apache.sqoop.sqoop. The Sqoop Java client will works
> well and Implement that API into new Sqoop NIFI processor Doesn't work!
>
> On Tue, Mar 29, 2016 at 12:49 PM, Conrad Crampton <
> conrad.cramp...@secdata.com> wrote:
>
>> Hi,
>> If you could explain exactly what you are trying to achieve I.e. What
>> part of the data pipeline you are looking to use NiFi for and where you
>> wish to retain Sqoop I could perhaps have a more informed input (although I
>> have only been using NiFi myself for a few weeks). Sqoop obviously can move
>> the data from RDBM systems through to HDFS (and vice versa) as can NiFi,
>> not sure why you would want the mix (or at least I can’t see it from the
>> description you have provided thus far).
>> I have limited knowledge of Sqoop, but either way, I am sure you could
>> ‘drive’ Sqoop from a custom NiFi processor if you so choose, and you can
>> ‘drive’ NiFi externally (using the REST api) - if Sqoop can consume it.
>> Regards
>> Conrad
>>
>>
>> From: prabhu Mahendran 
>> Reply-To: "users@nifi.apache.org" 
>> Date: Tuesday, 29 March 2016 at 07:55
>> To: "users@nifi.apache.org" 
>> Subject: Re: Sqoop Support in NIFI
>>
>> Hi Conrad,
>>
>> Thanks for Quick Response.
>>
>> Yeah.Combination of Execute SQL and Put HDFS works well instead
>> of Sqoop.But is there any possible to use Sqoop(client) to do like this?
>>
>> Prabhu Mahendran
>>
>> On Tue, Mar 29, 2016 at 12:04 PM, Conrad Crampton <
>> conrad.cramp...@secdata.com> wrote:
>>
>>> Hi,
>>> Why use sqoop at all? Use a combination of ExecuteSQL [1] and PutHDFS
>>> [2].
>>> I have just replace the use of Flume using a combination of ListenSyslog
>>> and PutHDFS which I guess is a similar architectural pattern.
>>> HTH
>>> Conrad
>>>
>>>
>>>
>>> http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ExecuteSQL/index.html
>>>  [1]
>>>
>>> http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.hadoop.PutHDFS/index.html
>>>  [2]
>>>
>>> From: prabhu Mahendran 
>>> Reply-To: "users@nifi.apache.org" 
>>> Date: Tuesday, 29 March 2016 at 07:27
>>> To: "users@nifi.apache.org" 
>>> Subject: Sqoop Support in NIFI
>>>
>>> Hi,
>>>
>>> I am new to nifi.
>>>
>>>I have to know that  "Is there is any Support for Sqoop with help
>>> of NIFI Processors?."
>>>
>>> And in which way to done the following case with help of Sqoop.
>>>
>>> Move data from oracle,SqlServer,MySql into HDFS and vice versa.
>>>
>>>
>>> Thanks,
>>> Prabhu Mahendran
>>>
>>>
>>>
>>>