Re: data flow from one s3 bucket to another

2016-11-18 Thread Gop Krr
Thanks, Koji. I was able to solve this problem, but what you have shared is
very useful.

On Wed, Nov 16, 2016 at 6:43 PM, Koji Kawamura 
wrote:

> Hello Gop,
>
> Have you already found how to move data around S3 buckets? I hope you have.
> But just in case you haven't yet, I wrote a simple NiFi flow and
> shared it in Gist:
> https://gist.github.com/ijokarumawak/26ff675039e252d177b1195f3576cf9a
>
> I misconfigured the region and got an error once, but after I set up the
> bucket name, region, and credentials correctly, it worked as expected.
> I'd recommend testing an S3-related flow with the ListS3 processor first,
> to see whether the credentials can properly access the target S3 bucket.
>
> Thanks,
> Koji
>
> On Fri, Oct 28, 2016 at 7:35 AM, Gop Krr  wrote:
> > Has anyone implemented data copy from one S3 bucket to another? I would
> > greatly appreciate it if you could share your sample processor
> > configuration.
> > Thanks
> > Rai
>
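
For reference, the credential check Koji suggests with ListS3 can also be done
outside NiFi. Below is a minimal boto3 sketch of the same idea; the bucket name,
region, and key values are placeholders, not values from this thread:

import boto3
from botocore.exceptions import ClientError

# Use the same access key/secret that the NiFi S3 processors are configured with.
session = boto3.Session(
    aws_access_key_id="AKIA...",          # placeholder
    aws_secret_access_key="...",          # placeholder
    region_name="us-west-2",              # must match the bucket's region
)
s3 = session.client("s3")

try:
    # Equivalent of what ListS3 does: list a few keys from the source bucket.
    resp = s3.list_objects_v2(Bucket="source-bucket", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])
except ClientError as e:
    # A 403 here means the credentials or bucket policy are wrong,
    # independent of anything NiFi is doing.
    print("Listing failed:", e.response["Error"]["Code"])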


Re: nifi is running out of memory

2016-10-31 Thread Gop Krr
Thanks, Joe, for checking. Yes, I got past it and was able to demo it to the
team successfully :) Now the next challenge is to get high throughput out of
NiFi.

On Mon, Oct 31, 2016 at 7:08 PM, Joe Witt  wrote:

> Krish,
>
> Did you ever get past this?
>
> Thanks
> Joe
>
> On Fri, Oct 28, 2016 at 2:36 PM, Gop Krr  wrote:
> > James, the permission issue got resolved, but I still don't see any writes.
> >
> > On Fri, Oct 28, 2016 at 10:34 AM, Gop Krr  wrote:
> >>
> >> Thanks, James. I am looking into the permission issue and will update
> >> the thread. I will also make the changes per your recommendation.
> >>
> >> On Fri, Oct 28, 2016 at 10:23 AM, James Wing  wrote:
> >>>
> >>> From the screenshot and the error message, I interpret the sequence of
> >>> events to be something like this:
> >>>
> >>> 1.) ListS3 succeeds and generates flowfiles with attributes referencing
> >>> S3 objects, but no content (0 bytes)
> >>> 2.) FetchS3Object fails to pull the S3 object content with an Access
> >>> Denied error, but the failed flowfiles are routed on to PutS3Object
> (35,179
> >>> files / 0 bytes in the "putconnector" queue)
> >>> 3.) PutS3Object is succeeding, writing the 0 byte content from ListS3
> >>>
> >>> I recommend a couple of things for FetchS3Object:
> >>>
> >>> * Only allow the "success" relationship to continue to PutS3Object.
> >>> Separate the "failure" relationship to either loop back to
> >>> FetchS3Object or go to a LogAttribute processor or another handling
> >>> path.
> >>> * It looks like the permissions aren't working; you might want to
> >>> double-check the access keys or try a sample file with the AWS CLI.
> >>>
> >>> Thanks,
> >>>
> >>> James
> >>>
> >>>
> >>> On Fri, Oct 28, 2016 at 10:01 AM, Gop Krr  wrote:
> >>>>
> >>>> This is how my NiFi flow looks.
> >>>>
> >>>> On Fri, Oct 28, 2016 at 9:57 AM, Gop Krr  wrote:
> >>>>>
> >>>>> Thanks Bryan, Joe, Adam and Pierre. I got past this issue by
> >>>>> switching to 0.7.1. Now it is able to list the files from the source
> >>>>> bucket and create those files in the other bucket, but the write is
> >>>>> not happening and I am getting a permission issue (attached below for
> >>>>> reference). Could this be the bucket settings, or does it have more
> >>>>> to do with the access key? All the files created in the new bucket
> >>>>> are 0 bytes.
> >>>>> Thanks
> >>>>> Rai
> >>>>>
> >>>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
> >>>>> o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x]
> Failed to
> >>>>> retrieve S3 Object for
> >>>>> StandardFlowFileRecord[uuid=y,claim=,offset=0,name=
> x.gz,size=0];
> >>>>> routing to failure: com.amazonaws.services.s3.
> model.AmazonS3Exception:
> >>>>> Access Denied (Service: Amazon S3; Status Code: 403; Error Code:
> >>>>> AccessDenied; Request ID: xxx), S3 Extended Request ID:
> >>>>> lu8tAqRxu+ouinnVvJleHkUUyK6J6rIQCTw0G8G6
> DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4=
> >>>>>
> >>>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
> >>>>> o.a.nifi.processors.aws.s3.FetchS3Object
> >>>>>
> >>>>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied
> >>>>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;
> Request ID:
> >>>>> 0F34E71C0697B1D8)
> >>>>>
> >>>>> at
> >>>>> com.amazonaws.http.AmazonHttpClient.handleErrorResponse(
> AmazonHttpClient.java:1219)
> >>>>> ~[aws-java-sdk-core-1.10.32.jar:na]
> >>>>>
> >>>>> at
> >>>>> com.amazonaws.http.AmazonHttpClient.executeOneRequest(
> AmazonHttpClient.java:803)
> >>>>> ~[aws-java-sdk-core-1.10.32.jar:na]
> >>>>>
> >>>>> at
> >>>>> com.amazonaws.http.AmazonHttpClient.executeHelper(
> AmazonHttpClient.java:505)
> >>>>> ~[aws-java

Re: nifi is running out of memory

2016-10-28 Thread Gop Krr
Thanks, James. I am looking into the permission issue and will update the
thread. I will also make the changes per your recommendation.

On Fri, Oct 28, 2016 at 10:23 AM, James Wing  wrote:

> From the screenshot and the error message, I interpret the sequence of
> events to be something like this:
>
> 1.) ListS3 succeeds and generates flowfiles with attributes referencing S3
> objects, but no content (0 bytes)
> 2.) FetchS3Object fails to pull the S3 object content with an Access
> Denied error, but the failed flowfiles are routed on to PutS3Object (35,179
> files / 0 bytes in the "putconnector" queue)
> 3.) PutS3Object is succeeding, writing the 0 byte content from ListS3
>
> I recommend a couple of things for FetchS3Object:
>
> * Only allow the "success" relationship to continue to PutS3Object.
> Separate the "failure" relationship to either loop back to FetchS3Object or
> go to a LogAttribute processor or another handling path.
> * It looks like the permissions aren't working; you might want to
> double-check the access keys or try a sample file with the AWS CLI.
>
> Thanks,
>
> James
>
>
> On Fri, Oct 28, 2016 at 10:01 AM, Gop Krr  wrote:
>
>> This is how my NiFi flow looks.
>>
>> On Fri, Oct 28, 2016 at 9:57 AM, Gop Krr  wrote:
>>
>>> Thanks Bryan, Joe, Adam and Pierre. I got past this issue by switching
>>> to 0.7.1. Now it is able to list the files from the source bucket and
>>> create those files in the other bucket, but the write is not happening
>>> and I am getting a permission issue (attached below for reference).
>>> Could this be the bucket settings, or does it have more to do with the
>>> access key? All the files created in the new bucket are 0 bytes.
>>> Thanks
>>> Rai
>>>
>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
>>> o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x] Failed
>>> to retrieve S3 Object for StandardFlowFileRecord[uuid=yy
>>> yyy,claim=,offset=0,name=x.gz,size=0]; routing to failure:
>>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied
>>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;
>>> Request ID: xxx), S3 Extended Request ID: lu8tAqRxu+ouinnVvJleHkUUyK6J6r
>>> IQCTw0G8G6DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4=
>>>
>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
>>> o.a.nifi.processors.aws.s3.FetchS3Object
>>>
>>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied
>>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied;
>>> Request ID: 0F34E71C0697B1D8)
>>>
>>> at 
>>> com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1219)
>>> ~[aws-java-sdk-core-1.10.32.jar:na]
>>>
>>> at 
>>> com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:803)
>>> ~[aws-java-sdk-core-1.10.32.jar:na]
>>>
>>> at 
>>> com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:505)
>>> ~[aws-java-sdk-core-1.10.32.jar:na]
>>>
>>> at 
>>> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:317)
>>> ~[aws-java-sdk-core-1.10.32.jar:na]
>>>
>>> at 
>>> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3595)
>>> ~[aws-java-sdk-s3-1.10.32.jar:na]
>>>
>>> at 
>>> com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1116)
>>> ~[aws-java-sdk-s3-1.10.32.jar:na]
>>>
>>> at 
>>> org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:106)
>>> ~[nifi-aws-processors-0.7.1.jar:0.7.1]
>>>
>>> at 
>>> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
>>> [nifi-api-0.7.1.jar:0.7.1]
>>>
>>> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(S
>>> tandardProcessorNode.java:1054) [nifi-framework-core-0.7.1.jar:0.7.1]
>>>
>>> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask
>>> .call(ContinuallyRunProcessorTask.java:136)
>>> [nifi-framework-core-0.7.1.jar:0.7.1]
>>>
>>> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask
>>> .call(ContinuallyRunProcessorTask.java:47)
>>> [nifi-framework-core-0.7.1.jar:0.7.1]
>>>
>>> at org.apache.nifi.controller.schedulin
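
As a follow-up to James's suggestion, the same check can be scripted with boto3
instead of the AWS CLI. This is only a sketch with placeholder bucket and key
names; it mirrors what FetchS3Object and PutS3Object do, so a 403 on the first
call points at read permission on the source bucket rather than at NiFi:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")  # assumes the same credentials NiFi uses

try:
    # What FetchS3Object does: read one object's content from the source bucket.
    body = s3.get_object(Bucket="source-bucket", Key="sample/object.gz")["Body"].read()
    print("Fetched", len(body), "bytes")

    # What PutS3Object does: write that content into the destination bucket.
    s3.put_object(Bucket="destination-bucket", Key="sample/object.gz", Body=body)
    print("Put succeeded")
except ClientError as e:
    print("S3 call failed:", e.response["Error"]["Code"])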

Re: nifi is running out of memory

2016-10-28 Thread Gop Krr
Thanks Bryan, Joe, Adam and Pierre. I got past this issue by switching to
0.7.1. Now it is able to list the files from the source bucket and create
those files in the other bucket, but the write is not happening and I am
getting a permission issue (attached below for reference). Could this be
the bucket settings, or does it have more to do with the access key? All
the files created in the new bucket are 0 bytes.
Thanks
Rai

2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x] Failed to
retrieve S3 Object for
StandardFlowFileRecord[uuid=y,claim=,offset=0,name=x.gz,size=0];
routing to failure: com.amazonaws.services.s3.model.AmazonS3Exception:
Access Denied (Service: Amazon S3; Status Code: 403; Error Code:
AccessDenied; Request ID: xxx), S3 Extended Request ID:
lu8tAqRxu+ouinnVvJleHkUUyK6J6rIQCTw0G8G6DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4=

2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3]
o.a.nifi.processors.aws.s3.FetchS3Object

com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service:
Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID:
0F34E71C0697B1D8)

at 
com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1219)
~[aws-java-sdk-core-1.10.32.jar:na]

at
com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:803)
~[aws-java-sdk-core-1.10.32.jar:na]

at
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:505)
~[aws-java-sdk-core-1.10.32.jar:na]

at
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:317)
~[aws-java-sdk-core-1.10.32.jar:na]

at
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3595)
~[aws-java-sdk-s3-1.10.32.jar:na]

at
com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1116)
~[aws-java-sdk-s3-1.10.32.jar:na]

at
org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:106)
~[nifi-aws-processors-0.7.1.jar:0.7.1]

at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
[nifi-api-0.7.1.jar:0.7.1]

at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054)
[nifi-framework-core-0.7.1.jar:0.7.1]

at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
[nifi-framework-core-0.7.1.jar:0.7.1]

at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
[nifi-framework-core-0.7.1.jar:0.7.1]

at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127)
[nifi-framework-core-0.7.1.jar:0.7.1]

at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_101]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[na:1.8.0_101]

at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_101]

at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_101]

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_101]

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_101]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]

On Fri, Oct 28, 2016 at 6:31 AM, Pierre Villard  wrote:

> Quick remark: the fix has also been merged in master and will be in
> release 1.1.0.
>
> Pierre
>
> 2016-10-28 15:22 GMT+02:00 Gop Krr :
>
>> Thanks Adam. I will try 0.7.1 and update the community on the outcome. If
>> it works, I can create a patch for 1.x.
>> Thanks
>> Rai
>>
>> On Thu, Oct 27, 2016 at 7:41 PM, Adam Lamar  wrote:
>>
>>> Hey All,
>>>
>>> I believe OP is running into a bug fixed here:
>>> https://issues.apache.org/jira/browse/NIFI-2631
>>>
>>> Basically, ListS3 attempts to commit all the files it finds
>>> (potentially 100k+) at once, rather than in batches. NIFI-2631
>>> addresses the issue. Looks like the fix is out in 0.7.1 but not yet in
>>> a 1.x release.
>>>
>>> Cheers,
>>> Adam
>>>
>>>
>>> On Thu, Oct 27, 2016 at 7:59 PM, Joe Witt  wrote:
>>> > Looking at this line [1] makes me think the FetchS3 processor is
>>> > properly streaming the bytes directly to the content repository.
>>> >
>>> > Looking at the screenshot showing nothing out of the ListS3 processor
>>> > makes me think the bucket has so many things in it that the processor
>>> > or associat
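
To illustrate the batching idea behind NIFI-2631, here is a sketch of the
concept in boto3 (not the processor's actual code): each page of the listing is
handled and released before the next page is requested, so memory use stays
bounded by the page size rather than by the size of the bucket. Bucket and
prefix names are placeholders.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each page holds at most ~1000 keys; handle it, then let it be garbage collected.
for page in paginator.paginate(Bucket="source-bucket", Prefix="data/"):
    keys = [obj["Key"] for obj in page.get("Contents", [])]
    # In NiFi terms, this is roughly "commit a batch of flowfiles per page".
    print("processing batch of", len(keys), "keys")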

Re: nifi is running out of memory

2016-10-28 Thread Gop Krr
Thanks Adam. I will try 0.7.1 and update the community on the outcome. If
it works, I can create a patch for 1.x.
Thanks
Rai

On Thu, Oct 27, 2016 at 7:41 PM, Adam Lamar  wrote:

> Hey All,
>
> I believe OP is running into a bug fixed here:
> https://issues.apache.org/jira/browse/NIFI-2631
>
> Basically, ListS3 attempts to commit all the files it finds
> (potentially 100k+) at once, rather than in batches. NIFI-2631
> addresses the issue. Looks like the fix is out in 0.7.1 but not yet in
> a 1.x release.
>
> Cheers,
> Adam
>
>
> On Thu, Oct 27, 2016 at 7:59 PM, Joe Witt  wrote:
> > Looking at this line [1] makes me think the FetchS3 processor is
> > properly streaming the bytes directly to the content repository.
> >
> > Looking at the screenshot showing nothing out of the ListS3 processor
> > makes me think the bucket has so many things in it that the processor
> > or associated library isn't handling that well and is just listing
> > everything with no mechanism of max buffer size.  Krish please try
> > with the largest heap you can and let us know what you see.
> >
> > [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/FetchS3Object.java#L107
> >
> > On Thu, Oct 27, 2016 at 9:37 PM, Joe Witt  wrote:
> >> moving dev to bcc
> >>
> >> Yes, I believe the issue here is that FetchS3 doesn't do chunked
> >> transfers and so is loading everything into memory.  I've not verified
> >> this in the code yet, but it seems quite likely.  Krish, if you can
> >> verify that going with a larger heap gets you in the game, can you
> >> please file a JIRA.
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Oct 27, 2016 at 9:34 PM, Bryan Bende  wrote:
> >>> Hello,
> >>>
> >>> Are you running with all of the default settings?
> >>>
> >>> If so, you would probably want to try increasing the memory settings in
> >>> conf/bootstrap.conf.
> >>>
> >>> They default to 512 MB; you may want to try bumping them up to 1024 MB.
> >>>
> >>> -Bryan
> >>>
> >>> On Thu, Oct 27, 2016 at 5:46 PM, Gop Krr  wrote:
> >>>>
> >>>> Hi All,
> >>>>
> >>>> I have a very simple data flow, where I need to move S3 data from one
> >>>> bucket in one account to another bucket in another account. I have
> >>>> attached my processor configuration.
> >>>>
> >>>>
> >>>> 2016-10-27 20:09:57,626 ERROR [Flow Service Tasks Thread-2]
> >>>> org.apache.nifi.NiFi An Unknown Error Occurred in Thread Thread[Flow
> Service
> >>>> Tasks Thread-2,5,main]: java.lang.OutOfMemoryError: Java heap space
> >>>>
> >>>> I am very new to NiFi and trying to get a few use cases going. I need
> >>>> help from the community.
> >>>>
> >>>> Thanks again
> >>>>
> >>>> Rai
> >>>>
> >>>>
> >>>>
> >>>
>
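
For reference, the heap settings Bryan mentions live in conf/bootstrap.conf.
A minimal sketch of the change is below; the java.arg numbering can vary
between NiFi versions, so match it to what your file already contains:

# conf/bootstrap.conf -- JVM memory settings
# the defaults ship as 512m; raise both for large listings
java.arg.2=-Xms1024m
java.arg.3=-Xmx1024m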


data flow from one s3 bucket to another

2016-10-27 Thread Gop Krr
Has anyone implemented data copy from one S3 bucket to another? I would
greatly appreciate it if you could share your sample processor
configuration.
Thanks
Rai


Re: PutDynamoDB processor

2016-10-18 Thread Gop Krr
Hi James,
I have started exploring the option of building a Scan processor for
DynamoDB.
I will let you know how it goes.
Thanks
Rai

On Fri, Oct 14, 2016 at 11:42 AM, James Wing  wrote:

> Correct, but I'm afraid I'm no expert on DynamoDB.  It is my understanding
> that you have to iterate through the keys in the source table one-by-one,
> then put each key's content into the destination table.  You can speed this
> up by using multiple iterators, each covering a distinct portion of the key
> range.
>
> Amazon does provide tools as part of AWS Data Pipeline that might help
> automate this, and if all you want is an identical export and import, that
> is probably easier than NiFi.  But I believe the underlying process is
> very similar; Amazon just uses an Elastic MapReduce cluster instead of
> NiFi.  A key point is that the export and import operations count against
> your provisioned throughput; Amazon provides no shortcut around paying for
> the I/O.  But this might work now, today, without any custom code.
>
> Cross-Region Export and Import of DynamoDB Tables
> https://aws.amazon.com/blogs/aws/cross-region-import-and-export-of-dynamodb-tables/
>
> AWS Data Pipeline - Export DynamoDB Table to S3
> http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-exportddbtos3.html
>
> AWS Data Pipeline - Import DynamoDB Backup Data from S3
> http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-exports3toddb.html
>
> Thanks,
>
> James
>
>
> On Fri, Oct 14, 2016 at 10:58 AM, Gop Krr  wrote:
>
>> Thanks James. I would be happy to contribute the scan processor for
>> DynamoDB. Just to clarify, based on your comment: we can't take all the
>> rows of the DynamoDB table and put them into another table at once; we
>> have to do it one record at a time?
>>
>> On Fri, Oct 14, 2016 at 10:50 AM, James Wing  wrote:
>>
>>> NiFi's GetDynamoDB processor uses the underlying BatchGetItem API, which
>>> requires item keys as inputs.  Iterating over the keys in a table would
>>> require the Scan API, but NiFi does not have a processor to scan a DynamoDB
>>> table.
>>>
>>> This would be a great addition to NiFi.  If you have any interest in
>>> working on a scan processor, please open a JIRA ticket at
>>> https://issues.apache.org/jira/browse/NIFI.
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> On Thu, Oct 13, 2016 at 2:12 PM, Gop Krr  wrote:
>>>
>>>> Thanks James. I am looking to iterate through the table so that it
>>>> takes hash key values one by one. Do I achieve that through the
>>>> Expression Language? If I write a script to do that, how do I pass it
>>>> to my processor?
>>>> Thanks
>>>> Niraj
>>>>
>>>> On Thu, Oct 13, 2016 at 1:42 PM, James Wing  wrote:
>>>>
>>>>> Rai,
>>>>>
>>>>> The GetDynamoDB processor requires a hash key value to look up an item
>>>>> in the table.  The default setting is an Expression Language statement 
>>>>> that
>>>>> reads the hash key value from a flowfile attribute,
>>>>> dynamodb.item.hash.key.value.  But this is not required.  You can change 
>>>>> it
>>>>> to any attribute expression ${my.hash.key}, or even hard-code a single key
>>>>> "item123" if you wish.
>>>>>
>>>>> Does that help?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> James
>>>>>
>>>>> On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr  wrote:
>>>>>
>>>>>> Hi All,
> >>>>>> I have been trying to use the get and put processors for DynamoDB
> >>>>>> and I am almost there. I am able to run the get processor and I can
> >>>>>> see data flowing :)
>>>>>> But I see the following error in my nifi-app.log file:
>>>>>>
>>>>>> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9]
>>>>>> o.a.n.p.aws.dynamodb.GetDynamoDB 
>>>>>> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580]
>>>>>> Hash key value '' is required for flow file 
>>>>>> StandardFlowFileRecord[uuid=44
>>>>>> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim
>>>>>> [resourceClaim=StandardResourceClaim[id=1476381755460-37287,
>>>>>> container=default, section=423], offset=0, length=1048576],offset=0,name=
>>>>>> 2503473718684086,size=1048576]
>>>>>>
>>>>>>
> >>>>>> I understand that it's looking for the hash key value, but I am not
> >>>>>> sure how to pass it.  In the settings tab, NiFi automatically populates
> >>>>>> this: ${dynamodb.item.hash.key.value}, but it looks like this is not the
>>>>>> right way to do it. Can I get some guidance on this? Thanks for all the
>>>>>> help.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Rai
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
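
Since there is no Scan-based processor yet, here is a minimal boto3 sketch of
the table-to-table copy James describes. The table names are placeholders, and
a real copy would also need retry/backoff and, for speed, parallel scan
segments (Segment/TotalSegments on the Scan call):

import boto3

dynamodb = boto3.resource("dynamodb")
source = dynamodb.Table("source-table")            # placeholder
destination = dynamodb.Table("destination-table")  # placeholder

# Scan the source table page by page and write each item to the destination.
# batch_writer() buffers items into BatchWriteItem calls of up to 25 items.
start_key = None
with destination.batch_writer() as writer:
    while True:
        kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
        page = source.scan(**kwargs)
        for item in page["Items"]:
            writer.put_item(Item=item)
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break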


Re: PutDynamoDB processor

2016-10-14 Thread Gop Krr
Thanks James. I would be happy to contribute the scan processor for
DynamoDB. Just to clarify, based on your comment: we can't take all the
rows of the DynamoDB table and put them into another table at once; we have
to do it one record at a time?

On Fri, Oct 14, 2016 at 10:50 AM, James Wing  wrote:

> NiFi's GetDynamoDB processor uses the underlying BatchGetItem API, which
> requires item keys as inputs.  Iterating over the keys in a table would
> require the Scan API, but NiFi does not have a processor to scan a DynamoDB
> table.
>
> This would be a great addition to NiFi.  If you have any interest in
> working on a scan processor, please open a JIRA ticket at
> https://issues.apache.org/jira/browse/NIFI.
>
> Thanks,
>
> James
>
> On Thu, Oct 13, 2016 at 2:12 PM, Gop Krr  wrote:
>
>> Thanks James. I am looking to iterate through the table so that it takes
>> hash key values one by one. Do I achieve that through the Expression
>> Language? If I write a script to do that, how do I pass it to my processor?
>> Thanks
>> Niraj
>>
>> On Thu, Oct 13, 2016 at 1:42 PM, James Wing  wrote:
>>
>>> Rai,
>>>
>>> The GetDynamoDB processor requires a hash key value to look up an item
>>> in the table.  The default setting is an Expression Language statement that
>>> reads the hash key value from a flowfile attribute,
>>> dynamodb.item.hash.key.value.  But this is not required.  You can change it
>>> to any attribute expression ${my.hash.key}, or even hard-code a single key
>>> "item123" if you wish.
>>>
>>> Does that help?
>>>
>>> Thanks,
>>>
>>> James
>>>
>>> On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr  wrote:
>>>
>>>> Hi All,
>>>> I have been trying to use the get and put processors for DynamoDB and
>>>> I am almost there. I am able to run the get processor and I can see
>>>> data flowing :)
>>>> But I see the following error in my nifi-app.log file:
>>>>
>>>> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9]
>>>> o.a.n.p.aws.dynamodb.GetDynamoDB 
>>>> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580]
>>>> Hash key value '' is required for flow file StandardFlowFileRecord[uuid=44
>>>> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim
>>>> [resourceClaim=StandardResourceClaim[id=1476381755460-37287,
>>>> container=default, section=423], offset=0, length=1048576],offset=0,name=
>>>> 2503473718684086,size=1048576]
>>>>
>>>>
>>>> I understand that it's looking for the hash key value, but I am not
>>>> sure how to pass it.  In the settings tab, NiFi automatically populates
>>>> this: ${dynamodb.item.hash.key.value}, but it looks like this is not
>>>> the right way to do it. Can I get some guidance on this? Thanks for all the
>>>> help.
>>>>
>>>> Best,
>>>>
>>>> Rai
>>>>
>>>
>>>
>>
>


Re: PutDynamoDB processor

2016-10-13 Thread Gop Krr
Thanks James. I am looking to iterate through the table so that it takes
hash key values one by one. Do I achieve that through the Expression
Language? If I write a script to do that, how do I pass it to my processor?
Thanks
Niraj

On Thu, Oct 13, 2016 at 1:42 PM, James Wing  wrote:

> Rai,
>
> The GetDynamoDB processor requires a hash key value to look up an item in
> the table.  The default setting is an Expression Language statement that
> reads the hash key value from a flowfile attribute,
> dynamodb.item.hash.key.value.  But this is not required.  You can change it
> to any attribute expression ${my.hash.key}, or even hard-code a single key
> "item123" if you wish.
>
> Does that help?
>
> Thanks,
>
> James
>
> On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr  wrote:
>
>> Hi All,
>> I have been trying to use the get and put processors for DynamoDB and I
>> am almost there. I am able to run the get processor and I can see data
>> flowing :)
>> But I see the following error in my nifi-app.log file:
>>
>> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9]
>> o.a.n.p.aws.dynamodb.GetDynamoDB 
>> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580]
>> Hash key value '' is required for flow file StandardFlowFileRecord[uuid=44
>> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim
>> [resourceClaim=StandardResourceClaim[id=1476381755460-37287,
>> container=default, section=423], offset=0, length=1048576],offset=0,name=
>> 2503473718684086,size=1048576]
>>
>>
>> I understand that it's looking for the hash key value, but I am not sure
>> how to pass it.  In the settings tab, NiFi automatically populates
>> this: ${dynamodb.item.hash.key.value}, but it looks like this is not the
>> right way to do it. Can I get some guidance on this? Thanks for all the
>> help.
>>
>> Best,
>>
>> Rai
>>
>
>
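
For context on why GetDynamoDB insists on a hash key value: as James notes, it
uses BatchGetItem underneath, which can only fetch items whose keys you already
supply. A minimal boto3 sketch of that call (the table name and hash key
attribute below are placeholders):

import boto3

client = boto3.client("dynamodb")

# BatchGetItem needs explicit keys -- there is no "give me everything" form,
# which is why an empty hash key value makes the processor fail.
response = client.batch_get_item(
    RequestItems={
        "my-table": {                      # placeholder table name
            "Keys": [
                {"id": {"S": "item123"}},  # placeholder hash key attribute/value
            ]
        }
    }
)
print(response["Responses"]["my-table"])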


PutDynamoDB processor

2016-10-13 Thread Gop Krr
Hi All,
I have been trying to use the get and put processors for DynamoDB and I am
almost there. I am able to run the get processor and I can see data flowing
:)
But I see the following error in my nifi-app.log file:

2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9]
o.a.n.p.aws.dynamodb.GetDynamoDB
GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580] Hash key value '' is
required for flow file
StandardFlowFileRecord[uuid=44554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1476381755460-37287,
container=default, section=423], offset=0,
length=1048576],offset=0,name=2503473718684086,size=1048576]


I understand that it's looking for the hash key value, but I am not sure
how to pass it.  In the settings tab, NiFi automatically populates
this: ${dynamodb.item.hash.key.value}, but it looks like this is not the
right way to do it. Can I get some guidance on this? Thanks for all the help.

Best,

Rai


Re: Book/Training for NiFi

2016-10-13 Thread Gop Krr
Thanks Andy. Appreciate your guidance.

On Thu, Oct 13, 2016 at 10:39 AM, Andy LoPresto 
wrote:

> Hi Rai,
>
> There are some excellent documents on the Apache NiFi site [1] to help you
> learn. There is an Administrator Guide [2], a User Guide [3], a Developer
> Guide [4], a NiFi In-Depth document [5], an Expression Language Guide [6]
> and processor and component documentation [7] as well. Currently, we are
> unaware of any official “book” resources.
>
> Any corporate offerings are separate from the Apache project and should be
> investigated with said company.
>
> [1] https://nifi.apache.org/
> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> [3] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html
> [4] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html
> [5] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
> [6] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
> [7] https://nifi.apache.org/docs.html
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Oct 13, 2016, at 10:28 AM, Gop Krr  wrote:
>
> Hi All,
> Is there any book for Apache NiFi?
> Also, does Hortonworks conduct training for NiFi?
> Thanks
> Rai
>
>
>


Book/Training for NiFi

2016-10-13 Thread Gop Krr
Hi All,
Is there any book for Apache NiFi?
Also, does Hortonworks conduct training for NiFi?
Thanks
Rai


Re: NiFi for backup solution

2016-10-13 Thread Gop Krr
Thanks Joe and Matt.
@Joe, based on your comment, I need to use NiFi as a producer that puts the
data on a Kafka queue and then have a NiFi consumer that writes the data to
the destination. Is my understanding correct?

@Matt, my use case is for DynamoDB. I will look into whether incremental
copy is supported for DynamoDB.
Thanks again; it felt so good to see such a vibrant community. I got my
questions answered within five minutes. Kudos to the NiFi community.

On Thu, Oct 13, 2016 at 8:17 AM, Matt Burgess  wrote:

> Rai,
>
> There are incremental data movement processors in NiFi depending on
> your source/target. For example, if your sources are files, you can
> use ListFile in combination with FetchFile, the former will keep track
> of which files it has found thus far, so if you put new files into the
> location (or update existing ones), only those new/updated files will
> be processed the next time.
>
> For database (RDBMS) sources, there are the QueryDatabaseTable and
> GenerateTableFetch processors, which support the idea of "maximum
> value columns", such that for each of said columns, the processor(s)
> will keep track of the maximum value observed in that column, then for
> future executions of the processor, only rows whose values in those
> columns exceed the currently-observed maximum will be retrieved, then
> the maximum will be updated, and so forth.
>
> The Usage documentation for these processors can be found at
> https://nifi.apache.org/docs.html (left-hand side under Processors).
>
> Regards,
> Matt
>
> On Thu, Oct 13, 2016 at 11:05 AM, Gop Krr  wrote:
> > Hi All,
> > I am learning NiFi as well as trying to deploy it in production for a
> > few use cases. One use case is ETL and another is using NiFi as a backup
> > solution, where it takes data from one source and moves it to another
> > database or file. Is anyone using NiFi for this purpose? Does NiFi
> > support incremental data movement?
> > It would be awesome if someone could point me to the right documentation.
> > Thanks
> > Rai
>
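
To make the "maximum value column" idea Matt describes concrete, here is a
small self-contained sketch of the pattern QueryDatabaseTable implements. It
uses sqlite3 from the Python standard library purely as a stand-in database;
the table and column names are invented for illustration:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",), ("c",)])

last_max_id = 0  # state the processor persists between runs

def fetch_incremental(conn, last_max):
    # Only pull rows whose max-value column exceeds what we saw last time.
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_max,)
    ).fetchall()
    new_max = rows[-1][0] if rows else last_max
    return rows, new_max

rows, last_max_id = fetch_incremental(conn, last_max_id)   # first run: 3 rows
conn.execute("INSERT INTO events (payload) VALUES ('d')")
rows, last_max_id = fetch_incremental(conn, last_max_id)   # next run: only the new row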


NiFi for backup solution

2016-10-13 Thread Gop Krr
Hi All,
I am learning NiFi as well as trying to deploy it in production for a few
use cases. One use case is ETL and another is using NiFi as a backup
solution, where it takes data from one source and moves it to another
database or file. Is anyone using NiFi for this purpose? Does NiFi support
incremental data movement?
It would be awesome if someone could point me to the right documentation.
Thanks
Rai


upstream system is invalid

2016-10-07 Thread Gop Krr
Hi, I am getting the following error:
" The upstream system is invalid because it needs an upstream system and
there is none"

I am trying to move data from one DynamoDB table to another without doing
any transformation, and I have configured two processors: one for get and
another for put. Do I need to do anything else?
Thanks for the help.
Regards
Niraj