Re: data flow from one s3 bucket to another
Thanks Koji. I was able to solve this problem but what you have attached is very useful. On Wed, Nov 16, 2016 at 6:43 PM, Koji Kawamura wrote: > Hello Gop, > > Have you already found how to move data around S3 buckets? I hope you do. > But just in case if you haven't yet, I wrote a simple NiFi flow and > shared it in Gist: > https://gist.github.com/ijokarumawak/26ff675039e252d177b1195f3576cf9a > > I misconfigured region and got an error once, but after I setup bucket > name, region and credential correctly, it worked as expected. > I'd recommend to test a S3 related flow using ListS3 processor, to see > if the credential can properly access the target S3 bucket. > > Thanks, > Koji > > On Fri, Oct 28, 2016 at 7:35 AM, Gop Krr wrote: > > Has anyone implemented data copy from one s3 bucket to another. i would > > greatly appreciate if you can share with me your sample processors > > configuration. > > Thanks > > Rai >
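For reference, the same check Koji describes (confirm the credentials can list the source bucket, then move objects across) can be sketched outside NiFi with boto3. This is only a minimal illustration, assuming placeholder bucket names, a placeholder region, and an access key that can read the source bucket and write the destination bucket:

import boto3

s3 = boto3.client("s3", region_name="us-west-2")  # region is a placeholder

SRC = "source-bucket-name"        # placeholder
DST = "destination-bucket-name"   # placeholder

# Listing first is the equivalent of the ListS3 sanity check: if this
# fails, the credentials or region are wrong before any copy is attempted.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=SRC):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Server-side copy: S3 moves the bytes, they never pass through
        # the machine running this script.
        s3.copy_object(Bucket=DST, Key=key, CopySource={"Bucket": SRC, "Key": key})
        print("copied", key)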
Re: nifi is running out of memory
Thanks Joe for checking. Yes, I got past it and I was successfully able to demo it to the team :) Now, the next challenge is to drive the performance out of nifi for the high throughput. On Mon, Oct 31, 2016 at 7:08 PM, Joe Witt wrote: > Krish, > > Did you ever get past this? > > Thanks > Joe > > On Fri, Oct 28, 2016 at 2:36 PM, Gop Krr wrote: > > James, permission issue got resolved. I still don't see any write. > > > > On Fri, Oct 28, 2016 at 10:34 AM, Gop Krr wrote: > >> > >> Thanks James.. I am looking into permission issue and update the > thread. I > >> will also make the changes as you per your recommendation. > >> > >> On Fri, Oct 28, 2016 at 10:23 AM, James Wing wrote: > >>> > >>> From the screenshot and the error message, I interpret the sequence of > >>> events to be something like this: > >>> > >>> 1.) ListS3 succeeds and generates flowfiles with attributes referencing > >>> S3 objects, but no content (0 bytes) > >>> 2.) FetchS3Object fails to pull the S3 object content with an Access > >>> Denied error, but the failed flowfiles are routed on to PutS3Object > (35,179 > >>> files / 0 bytes in the "putconnector" queue) > >>> 3.) PutS3Object is succeeding, writing the 0 byte content from ListS3 > >>> > >>> I recommend a couple thing for FetchS3Object: > >>> > >>> * Only allow the "success" relationship to continue to PutS3Object. > >>> Separate the "failure" relationship to either loop back to > FetchS3Object or > >>> go to a LogAttibute processor, or other handling path. > >>> * It looks like the permissions aren't working, you might want to > >>> double-check the access keys or try a sample file with the AWS CLI. > >>> > >>> Thanks, > >>> > >>> James > >>> > >>> > >>> On Fri, Oct 28, 2016 at 10:01 AM, Gop Krr wrote: > >>>> > >>>> This is how my nifi flow looks like. > >>>> > >>>> On Fri, Oct 28, 2016 at 9:57 AM, Gop Krr wrote: > >>>>> > >>>>> Thanks Bryan, Joe, Adam and Pierre. I went past this issue by > switching > >>>>> to 0.71. Now it is able to list the files from buckets and create > those > >>>>> files in the another bucket. But write is not happening and I am > getting the > >>>>> permission issue ( I have attached below for the reference) Could > this be > >>>>> the setting of the buckets or it has more to do with the access key. > All the > >>>>> files which are creaetd in the new bucket are of 0 byte. > >>>>> Thanks > >>>>> Rai > >>>>> > >>>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] > >>>>> o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x] > Failed to > >>>>> retrieve S3 Object for > >>>>> StandardFlowFileRecord[uuid=y,claim=,offset=0,name= > x.gz,size=0]; > >>>>> routing to failure: com.amazonaws.services.s3. 
> model.AmazonS3Exception: > >>>>> Access Denied (Service: Amazon S3; Status Code: 403; Error Code: > >>>>> AccessDenied; Request ID: xxx), S3 Extended Request ID: > >>>>> lu8tAqRxu+ouinnVvJleHkUUyK6J6rIQCTw0G8G6 > DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4= > >>>>> > >>>>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] > >>>>> o.a.nifi.processors.aws.s3.FetchS3Object > >>>>> > >>>>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied > >>>>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; > Request ID: > >>>>> 0F34E71C0697B1D8) > >>>>> > >>>>> at > >>>>> com.amazonaws.http.AmazonHttpClient.handleErrorResponse( > AmazonHttpClient.java:1219) > >>>>> ~[aws-java-sdk-core-1.10.32.jar:na] > >>>>> > >>>>> at > >>>>> com.amazonaws.http.AmazonHttpClient.executeOneRequest( > AmazonHttpClient.java:803) > >>>>> ~[aws-java-sdk-core-1.10.32.jar:na] > >>>>> > >>>>> at > >>>>> com.amazonaws.http.AmazonHttpClient.executeHelper( > AmazonHttpClient.java:505) > >>>>> ~[aws-java
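James's suggestion to test a sample file outside NiFi can also be done with a short boto3 script instead of the AWS CLI; a minimal sketch, assuming a placeholder bucket, key and access key pair (the same pair configured in FetchS3Object):

import boto3
from botocore.exceptions import ClientError

# Placeholders: use the same access key pair that FetchS3Object is configured with.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA_PLACEHOLDER",
    aws_secret_access_key="SECRET_PLACEHOLDER",
    region_name="us-west-2",
)

try:
    resp = s3.get_object(Bucket="source-bucket-name", Key="path/to/sample.gz")
    print("fetched", resp["ContentLength"], "bytes")
except ClientError as e:
    # An "AccessDenied" code here reproduces the 403 seen in the NiFi log,
    # which points at the bucket policy or key permissions rather than the flow.
    print("failed:", e.response["Error"]["Code"])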
Re: nifi is running out of memory
Thanks James. I am looking into the permission issue and will update the thread. I will also make the changes per your recommendation. On Fri, Oct 28, 2016 at 10:23 AM, James Wing wrote: > From the screenshot and the error message, I interpret the sequence of > events to be something like this: > > 1.) ListS3 succeeds and generates flowfiles with attributes referencing S3 > objects, but no content (0 bytes) > 2.) FetchS3Object fails to pull the S3 object content with an Access > Denied error, but the failed flowfiles are routed on to PutS3Object (35,179 > files / 0 bytes in the "putconnector" queue) > 3.) PutS3Object is succeeding, writing the 0 byte content from ListS3 > > I recommend a couple thing for FetchS3Object: > > * Only allow the "success" relationship to continue to PutS3Object. > Separate the "failure" relationship to either loop back to FetchS3Object or > go to a LogAttibute processor, or other handling path. > * It looks like the permissions aren't working, you might want to > double-check the access keys or try a sample file with the AWS CLI. > > Thanks, > > James > > > On Fri, Oct 28, 2016 at 10:01 AM, Gop Krr wrote: > >> This is how my nifi flow looks like. >> >> On Fri, Oct 28, 2016 at 9:57 AM, Gop Krr wrote: > >> >>> Thanks Bryan, Joe, Adam and Pierre. I went past this issue by switching >>> to 0.71. Now it is able to list the files from buckets and create those >>> files in the another bucket. But write is not happening and I am getting >>> the permission issue ( I have attached below for the reference) Could this >>> be the setting of the buckets or it has more to do with the access key. All >>> the files which are creaetd in the new bucket are of 0 byte. >>> Thanks >>> Rai >>> >>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] >>> o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x] Failed >>> to retrieve S3 Object for StandardFlowFileRecord[uuid=yy >>> yyy,claim=,offset=0,name=x.gz,size=0]; routing to failure: >>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied >>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; >>> Request ID: xxx), S3 Extended Request ID: lu8tAqRxu+ouinnVvJleHkUUyK6J6r >>> IQCTw0G8G6DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4= >>> >>> 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] >>> o.a.nifi.processors.aws.s3.FetchS3Object >>> >>> com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied >>> (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; >>> Request ID: 0F34E71C0697B1D8) >>> >>> at >>> com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1219) >>> ~[aws-java-sdk-core-1.10.32.jar:na] >>> >>> at >>> com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:803) >>> ~[aws-java-sdk-core-1.10.32.jar:na] >>> >>> at >>> com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:505) >>> ~[aws-java-sdk-core-1.10.32.jar:na] >>> >>> at >>> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:317) >>> ~[aws-java-sdk-core-1.10.32.jar:na] >>> >>> at >>> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3595) >>> ~[aws-java-sdk-s3-1.10.32.jar:na] >>> >>> at >>> com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1116) >>> ~[aws-java-sdk-s3-1.10.32.jar:na] >>> >>> at >>> org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:106) >>> ~[nifi-aws-processors-0.7.1.jar:0.7.1] >>> >>> at >>>
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) >>> [nifi-api-0.7.1.jar:0.7.1] >>> >>> at org.apache.nifi.controller.StandardProcessorNode.onTrigger(S >>> tandardProcessorNode.java:1054) [nifi-framework-core-0.7.1.jar:0.7.1] >>> >>> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask >>> .call(ContinuallyRunProcessorTask.java:136) >>> [nifi-framework-core-0.7.1.jar:0.7.1] >>> >>> at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask >>> .call(ContinuallyRunProcessorTask.java:47) >>> [nifi-framework-core-0.7.1.jar:0.7.1] >>> >>> at org.apache.nifi.controller.schedulin
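To make the three steps in James's analysis concrete, here is a minimal list/fetch/put sketch in boto3 (bucket names are placeholders). The content written to the destination comes from the fetch step, which is why routing FetchS3Object's failure relationship straight into PutS3Object ends up writing the 0-byte flowfiles produced by ListS3:

import boto3

s3 = boto3.client("s3")
SRC = "source-bucket-name"        # placeholder
DST = "destination-bucket-name"   # placeholder

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=SRC):   # "ListS3"
    for obj in page.get("Contents", []):
        key = obj["Key"]
        body = s3.get_object(Bucket=SRC, Key=key)["Body"].read()        # "FetchS3Object"
        # Only write when the fetch succeeded; on AccessDenied the
        # get_object call raises and nothing (not even 0 bytes) is put.
        s3.put_object(Bucket=DST, Key=key, Body=body)                    # "PutS3Object"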
Re: nifi is running out of memory
Thanks Bryan, Joe, Adam and Pierre. I got past this issue by switching to 0.7.1. Now it is able to list the files from the bucket and create those files in the other bucket, but the write is not happening and I am getting a permission error (attached below for reference). Could this be a setting on the buckets, or does it have more to do with the access key? All the files which are created in the new bucket are 0 bytes. Thanks Rai 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] o.a.nifi.processors.aws.s3.FetchS3Object FetchS3Object[id=x] Failed to retrieve S3 Object for StandardFlowFileRecord[uuid=y,claim=,offset=0,name=x.gz,size=0]; routing to failure: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: xxx), S3 Extended Request ID: lu8tAqRxu+ouinnVvJleHkUUyK6J6rIQCTw0G8G6DB6NOPGec0D1KB6cfUPsj08IQXI8idtiTp4= 2016-10-28 16:45:25,438 ERROR [Timer-Driven Process Thread-3] o.a.nifi.processors.aws.s3.FetchS3Object com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 0F34E71C0697B1D8) at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1219) ~[aws-java-sdk-core-1.10.32.jar:na] at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:803) ~[aws-java-sdk-core-1.10.32.jar:na] at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:505) ~[aws-java-sdk-core-1.10.32.jar:na] at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:317) ~[aws-java-sdk-core-1.10.32.jar:na] at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3595) ~[aws-java-sdk-s3-1.10.32.jar:na] at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1116) ~[aws-java-sdk-s3-1.10.32.jar:na] at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:106) ~[nifi-aws-processors-0.7.1.jar:0.7.1] at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) [nifi-api-0.7.1.jar:0.7.1] at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1054) [nifi-framework-core-0.7.1.jar:0.7.1] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) [nifi-framework-core-0.7.1.jar:0.7.1] at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) [nifi-framework-core-0.7.1.jar:0.7.1] at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:127) [nifi-framework-core-0.7.1.jar:0.7.1] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_101] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_101] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] On Fri, Oct 28, 2016 at 6:31 AM, Pierre Villard wrote: > Quick remark: the fix has also been merged in master and will be in > release 1.1.0.
> > Pierre > > 2016-10-28 15:22 GMT+02:00 Gop Krr : > >> Thanks Adam. I will try 0.7.1 and update the community on the outcome. If >> it works then I can create a patch for 1.x >> Thanks >> Rai >> >> On Thu, Oct 27, 2016 at 7:41 PM, Adam Lamar wrote: >> >>> Hey All, >>> >>> I believe OP is running into a bug fixed here: >>> https://issues.apache.org/jira/browse/NIFI-2631 >>> >>> Basically, ListS3 attempts to commit all the files it finds >>> (potentially 100k+) at once, rather than in batches. NIFI-2631 >>> addresses the issue. Looks like the fix is out in 0.7.1 but not yet in >>> a 1.x release. >>> >>> Cheers, >>> Adam >>> >>> >>> On Thu, Oct 27, 2016 at 7:59 PM, Joe Witt wrote: >>> > Looking at this line [1] makes me think the FetchS3 processor is >>> > properly streaming the bytes directly to the content repository. >>> > >>> > Looking at the screenshot showing nothing out of the ListS3 processor >>> > makes me think the bucket has so many things in it that the processor >>> > or associat
Re: nifi is running out of memory
Thanks Adam. I will try 0.7.1 and update the community on the outcome. If it works then I can create a patch for 1.x Thanks Rai On Thu, Oct 27, 2016 at 7:41 PM, Adam Lamar wrote: > Hey All, > > I believe OP is running into a bug fixed here: > https://issues.apache.org/jira/browse/NIFI-2631 > > Basically, ListS3 attempts to commit all the files it finds > (potentially 100k+) at once, rather than in batches. NIFI-2631 > addresses the issue. Looks like the fix is out in 0.7.1 but not yet in > a 1.x release. > > Cheers, > Adam > > > On Thu, Oct 27, 2016 at 7:59 PM, Joe Witt wrote: > > Looking at this line [1] makes me think the FetchS3 processor is > > properly streaming the bytes directly to the content repository. > > > > Looking at the screenshot showing nothing out of the ListS3 processor > > makes me think the bucket has so many things in it that the processor > > or associated library isn't handling that well and is just listing > > everything with no mechanism of max buffer size. Krish please try > > with the largest heap you can and let us know what you see. > > > > [1] https://github.com/apache/nifi/blob/master/nifi-nar- > bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/ > org/apache/nifi/processors/aws/s3/FetchS3Object.java#L107 > > > > On Thu, Oct 27, 2016 at 9:37 PM, Joe Witt wrote: > >> moving dev to bcc > >> > >> Yes I believe the issue here is that FetchS3 doesn't do chunked > >> transfers and so is loading all into memory. I've not verified this > >> in the code yet but it seems quite likely. Krish if you can verify > >> that going with a larger heap gets you in the game can you please file > >> a JIRA. > >> > >> Thanks > >> Joe > >> > >> On Thu, Oct 27, 2016 at 9:34 PM, Bryan Bende wrote: > >>> Hello, > >>> > >>> Are you running with all of the default settings? > >>> > >>> If so you would probably want to try increasing the memory settings in > >>> conf/bootstrap.conf. > >>> > >>> They default to 512mb, you may want to try bumping it up to 1024mb. > >>> > >>> -Bryan > >>> > >>> On Thu, Oct 27, 2016 at 5:46 PM, Gop Krr wrote: > >>>> > >>>> Hi All, > >>>> > >>>> I have very simple data flow, where I need to move s3 data from one > bucket > >>>> in one account to another bucket under another account. I have > attached my > >>>> processor configuration. > >>>> > >>>> > >>>> 2016-10-27 20:09:57,626 ERROR [Flow Service Tasks Thread-2] > >>>> org.apache.nifi.NiFi An Unknown Error Occurred in Thread Thread[Flow > Service > >>>> Tasks Thread-2,5,main]: java.lang.OutOfMemoryError: Java heap space > >>>> > >>>> I am very new to NiFi and trying ot get few of the use cases going. I > need > >>>> help from the community. > >>>> > >>>> Thanks again > >>>> > >>>> Rai > >>>> > >>>> > >>>> > >>> >
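The batching behaviour Adam describes can be pictured with a plain paginated listing; a minimal sketch (bucket name is a placeholder) in which each page of at most 1000 keys is handled before the next one is requested, so memory stays bounded no matter how large the bucket is:

import boto3

s3 = boto3.client("s3")

total = 0
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="very-large-bucket"):
    keys = [obj["Key"] for obj in page.get("Contents", [])]
    total += len(keys)
    # Hand each batch downstream here instead of accumulating every key
    # from the bucket in one in-memory list.
    print("batch of", len(keys), "keys,", total, "listed so far")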
data flow from one s3 bucket to another
Has anyone implemented data copy from one S3 bucket to another? I would greatly appreciate it if you could share your sample processor configurations with me. Thanks Rai
Re: PutDynamoDB processor
Hi James, I have started exploring the option of building the scan operator for the DynamoDB. I will let you know, how is it going. Thanks Rai On Fri, Oct 14, 2016 at 11:42 AM, James Wing wrote: > Correct, but I'm afraid I'm no expert on DynamoDB. It is my understanding > that you have to iterate through the keys in the source table one-by-one, > then put each key's content into the destination table. You can speed this > up by using multiple iterators, each covering a distinct portion of the key > range. > > Amazon does provide tools as part of AWS Data Pipeline that might help > automate this, and if all you want is an identical export and import, that > is probably easier than NiFi. But I believe the underlying process is very > similar, just that Amazon using an ElasticMapReduce cluster instead of > NiFi. A key point being that the export and import operations count > against your provisioned throughput, Amazon provides no shortcut around > paying for the I/O. But this might work now, today, without any custom > code. > > Cross-Region Export and Import of DynamoDB Tables > https://aws.amazon.com/blogs/aws/cross-region-import-and- > export-of-dynamodb-tables/ > > AWS Data Pipeline - Export DynamoDB Table to S3 > http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template- > exportddbtos3.html > > AWS Data Pipeline - Import DynamoDB Backup Data from S3 > http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template- > exports3toddb.html > > Thanks, > > James > > > On Fri, Oct 14, 2016 at 10:58 AM, Gop Krr wrote: > >> Thanks James. I would be happy to contribute the scan processor for >> DynamoDB. Just to clarify, based on your comment, we can't take all the >> rows of the DynamoDB table and put it into another table. We have to do it >> for one record at a time? >> >> On Fri, Oct 14, 2016 at 10:50 AM, James Wing wrote: >> >>> NiFi's GetDynamoDB processor uses the underlying BatchGetItem API, which >>> requires item keys as inputs. Iterating over the keys in a table would >>> require the Scan API, but NiFi does not have a processor to scan a DynamoDB >>> table. >>> >>> This would be a great addition to NiFi. If you have any interest in >>> working on a scan processor, please open a JIRA ticket at >>> https://issues.apache.org/jira/browse/NIFI. >>> >>> Thanks, >>> >>> James >>> >>> On Thu, Oct 13, 2016 at 2:12 PM, Gop Krr wrote: >>> >>>> Thanks James. I am looking to iterate through the table so that it >>>> takes hash key values one by one. Do I achieve it through the expression >>>> language? if I write an script to do that, how do I pass it to my >>>> processor? >>>> Thanks >>>> Niraj >>>> >>>> On Thu, Oct 13, 2016 at 1:42 PM, James Wing wrote: >>>> >>>>> Rai, >>>>> >>>>> The GetDynamoDB processor requires a hash key value to look up an item >>>>> in the table. The default setting is an Expression Language statement >>>>> that >>>>> reads the hash key value from a flowfile attribute, >>>>> dynamodb.item.hash.key.value. But this is not required. You can change >>>>> it >>>>> to any attribute expression ${my.hash.key}, or even hard-code a single key >>>>> "item123" if you wish. >>>>> >>>>> Does that help? >>>>> >>>>> Thanks, >>>>> >>>>> James >>>>> >>>>> On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr wrote: >>>>> >>>>>> Hi All, >>>>>> I have been trying to use get and load processor for the dynamodb and >>>>>> I am almost there. 
I am able to run the get processor and I see, data is >>>>>> flowing :) >>>>>> But I see the following error in my nifi-app.log file: >>>>>> >>>>>> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9] >>>>>> o.a.n.p.aws.dynamodb.GetDynamoDB >>>>>> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580] >>>>>> Hash key value '' is required for flow file >>>>>> StandardFlowFileRecord[uuid=44 >>>>>> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim >>>>>> [resourceClaim=StandardResourceClaim[id=1476381755460-37287, >>>>>> container=default, section=423], offset=0, length=1048576],offset=0,name= >>>>>> 2503473718684086,size=1048576] >>>>>> >>>>>> >>>>>> I understand that, its looking for the Hash Key Value but I am not >>>>>> sure, how do I pass it. In the setting tab, nifi automatically populates >>>>>> this: ${dynamodb.item.hash.key.value} but looks like this is not the >>>>>> right way to do it. Can I get some guidance on this? Thanks for all the >>>>>> help. >>>>>> >>>>>> Best, >>>>>> >>>>>> Rai >>>>>> >>>>> >>>>> >>>> >>> >> >
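For anyone picking up the scan-processor idea, the underlying operation James describes (iterate the source table, optionally with several parallel iterators, and put each item into the destination) can be sketched with boto3's segmented Scan. Table names, region and segment count are placeholders, and both the scan and the writes count against provisioned throughput:

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-west-2")
src = dynamodb.Table("source-table")        # placeholder
dst = dynamodb.Table("destination-table")   # placeholder

TOTAL_SEGMENTS = 4  # one segment per worker gives the "multiple iterators" speed-up

def copy_segment(segment):
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    with dst.batch_writer() as batch:
        while True:
            page = src.scan(**kwargs)
            for item in page["Items"]:
                batch.put_item(Item=item)
            if "LastEvaluatedKey" not in page:
                break  # this segment is fully scanned
            kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

for seg in range(TOTAL_SEGMENTS):
    # Run sequentially here; spread segments across threads or processes
    # to scan in parallel.
    copy_segment(seg)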
Re: PutDynamoDB processor
Thanks James. I would be happy to contribute the scan processor for DynamoDB. Just to clarify, based on your comment, we can't take all the rows of the DynamoDB table and put it into another table. We have to do it for one record at a time? On Fri, Oct 14, 2016 at 10:50 AM, James Wing wrote: > NiFi's GetDynamoDB processor uses the underlying BatchGetItem API, which > requires item keys as inputs. Iterating over the keys in a table would > require the Scan API, but NiFi does not have a processor to scan a DynamoDB > table. > > This would be a great addition to NiFi. If you have any interest in > working on a scan processor, please open a JIRA ticket at > https://issues.apache.org/jira/browse/NIFI. > > Thanks, > > James > > On Thu, Oct 13, 2016 at 2:12 PM, Gop Krr wrote: > >> Thanks James. I am looking to iterate through the table so that it takes >> hash key values one by one. Do I achieve it through the expression >> language? if I write an script to do that, how do I pass it to my processor? >> Thanks >> Niraj >> >> On Thu, Oct 13, 2016 at 1:42 PM, James Wing wrote: >> >>> Rai, >>> >>> The GetDynamoDB processor requires a hash key value to look up an item >>> in the table. The default setting is an Expression Language statement that >>> reads the hash key value from a flowfile attribute, >>> dynamodb.item.hash.key.value. But this is not required. You can change it >>> to any attribute expression ${my.hash.key}, or even hard-code a single key >>> "item123" if you wish. >>> >>> Does that help? >>> >>> Thanks, >>> >>> James >>> >>> On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr wrote: >>> >>>> Hi All, >>>> I have been trying to use get and load processor for the dynamodb and I >>>> am almost there. I am able to run the get processor and I see, data is >>>> flowing :) >>>> But I see the following error in my nifi-app.log file: >>>> >>>> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9] >>>> o.a.n.p.aws.dynamodb.GetDynamoDB >>>> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580] >>>> Hash key value '' is required for flow file StandardFlowFileRecord[uuid=44 >>>> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim >>>> [resourceClaim=StandardResourceClaim[id=1476381755460-37287, >>>> container=default, section=423], offset=0, length=1048576],offset=0,name= >>>> 2503473718684086,size=1048576] >>>> >>>> >>>> I understand that, its looking for the Hash Key Value but I am not >>>> sure, how do I pass it. In the setting tab, nifi automatically populates >>>> this: ${dynamodb.item.hash.key.value} but looks like this is not the >>>> right way to do it. Can I get some guidance on this? Thanks for all the >>>> help. >>>> >>>> Best, >>>> >>>> Rai >>>> >>> >>> >> >
Re: PutDynamoDB processor
Thanks James. I am looking to iterate through the table so that it takes hash key values one by one. Do I achieve it through the Expression Language? If I write a script to do that, how do I pass it to my processor? Thanks Niraj On Thu, Oct 13, 2016 at 1:42 PM, James Wing wrote: > Rai, > > The GetDynamoDB processor requires a hash key value to look up an item in > the table. The default setting is an Expression Language statement that > reads the hash key value from a flowfile attribute, > dynamodb.item.hash.key.value. But this is not required. You can change it > to any attribute expression ${my.hash.key}, or even hard-code a single key > "item123" if you wish. > > Does that help? > > Thanks, > > James > > On Thu, Oct 13, 2016 at 12:17 PM, Gop Krr wrote: > >> Hi All, >> I have been trying to use get and load processor for the dynamodb and I >> am almost there. I am able to run the get processor and I see, data is >> flowing :) >> But I see the following error in my nifi-app.log file: >> >> 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9] >> o.a.n.p.aws.dynamodb.GetDynamoDB >> GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580] >> Hash key value '' is required for flow file StandardFlowFileRecord[uuid=44 >> 554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim >> [resourceClaim=StandardResourceClaim[id=1476381755460-37287, >> container=default, section=423], offset=0, length=1048576],offset=0,name= >> 2503473718684086,size=1048576] >> >> >> I understand that, its looking for the Hash Key Value but I am not sure, >> how do I pass it. In the setting tab, nifi automatically populates >> this: ${dynamodb.item.hash.key.value} but looks like this is not the >> right way to do it. Can I get some guidance on this? Thanks for all the >> help. >> >> Best, >> >> Rai >> > >
PutDynamoDB processor
Hi All, I have been trying to use the get and put processors for DynamoDB and I am almost there. I am able to run the get processor and I can see data is flowing :) But I see the following error in my nifi-app.log file: 2016-10-13 18:02:38,823 ERROR [Timer-Driven Process Thread-9] o.a.n.p.aws.dynamodb.GetDynamoDB GetDynamoDB[id=7d906337-0157-1000-5868-479d0e0e3580] Hash key value '' is required for flow file StandardFlowFileRecord[uuid=44554c23-1618-47db-b46e-04ffd737748e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1476381755460-37287, container=default, section=423], offset=0, length=1048576],offset=0,name=2503473718684086,size=1048576] I understand that it's looking for the Hash Key Value, but I am not sure how to pass it. In the settings tab, NiFi automatically populates this: ${dynamodb.item.hash.key.value}, but it looks like this is not the right way to do it. Can I get some guidance on this? Thanks for all the help. Best, Rai
Re: Book/Training for NiFi
Thanks Andy. Appreciate your guidance. On Thu, Oct 13, 2016 at 10:39 AM, Andy LoPresto wrote: > Hi Rai, > > There are some excellent documents on the Apache NiFi site [1] to help you > learn. There is an Administrator Guide [2], a User Guide [3], a Developer > Guide [4], a NiFi In-Depth document [5], an Expression Language Guide [6] > and processor and component documentation [7] as well. Currently, we are > unaware of any official “book” resources. > > Any corporate offerings are separate from the Apache project and should be > investigated with said company. > > [1] https://nifi.apache.org/ > [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html > [3] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html > [4] https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html > [5] https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html > [6] https://nifi.apache.org/docs/nifi-docs/html/ > expression-language-guide.html > [7] https://nifi.apache.org/docs.html > > Andy LoPresto > alopre...@apache.org > *alopresto.apa...@gmail.com * > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On Oct 13, 2016, at 10:28 AM, Gop Krr wrote: > > Hi All, > Is there any book for apache NiFi? > Also, does Hortonworks conducts training for NiFi? > Thanks > Rai > > >
Book/Training for NiFi
Hi All, Is there any book for Apache NiFi? Also, does Hortonworks conduct training for NiFi? Thanks Rai
Re: NiFi for backup solution
Thanks Joe and Matt. @Joe, based on your comment, I need to use NiFi as a producer that puts the data on a Kafka queue and then have a NiFi consumer that writes the data back to the destination. Is my understanding correct? @Matt, my use case is for DynamoDB; I will look into whether incremental copy is supported for DynamoDB. Thanks again; it felt great to see such a vibrant community. I got my questions answered within five minutes. Kudos to the NiFi community. On Thu, Oct 13, 2016 at 8:17 AM, Matt Burgess wrote: > Rai, > > There are incremental data movement processors in NiFi depending on > your source/target. For example, if your sources are files, you can > use ListFile in combination with FetchFile, the former will keep track > of which files it has found thus far, so if you put new files into the > location (or update existing ones), only those new/updated files will > be processed the next time. > > For database (RDBMS) sources, there are the QueryDatabaseTable and > GenerateTableFetch processors, which support the idea of "maximum > value columns", such that for each of said columns, the processor(s) > will keep track of the maximum value observed in that column, then for > future executions of the processor, only rows whose values in those > columns exceed the currently-observed maximum will be retrieved, then > the maximum will be updated, and so forth. > > The Usage documentation for these processors can be found at > https://nifi.apache.org/docs.html (left-hand side under Processors). > > Regards, > Matt > > On Thu, Oct 13, 2016 at 11:05 AM, Gop Krr wrote: > > Hi All, > > I am learning NiFi as well as trying to deploy it in production for few > use > > cases. One of the use case is ETL and another use case is, using NiFi as > a > > backup solution, where it takes the data from one source and moves to > > another database|file. Is anyone using NiFi for this purpose? Does NiFi > > support incremental data move? > > It would be awesome if someone can point me to right documentation. > > Thanks > > Rai >
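Matt's "maximum value column" pattern is easy to picture with a small standalone example; a sketch using an in-memory SQLite table as a stand-in source (NiFi keeps the equivalent of last_max in processor state):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])

last_max = 0  # the "maximum value observed" for the id column so far

def incremental_fetch():
    global last_max
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (last_max,)).fetchall()
    if rows:
        last_max = rows[-1][0]  # advance the watermark
    return rows

print(incremental_fetch())  # first run: all three rows
conn.execute("INSERT INTO events (payload) VALUES ('d')")
print(incremental_fetch())  # second run: only the newly added row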
NiFi for backup solution
Hi All, I am learning NiFi as well as trying to deploy it in production for a few use cases. One of the use cases is ETL and another is using NiFi as a backup solution, where it takes the data from one source and moves it to another database or file. Is anyone using NiFi for this purpose? Does NiFi support incremental data movement? It would be awesome if someone could point me to the right documentation. Thanks Rai
upstream system is invalid
Hi, I am getting the following error: "The upstream system is invalid because it needs an upstream system and there is none." I am trying to move data from one DynamoDB table to another without doing any transformation, and I have configured two processors, one for get and another for put. Do I need to do anything else? Thanks for the help. Regards Niraj