Anup,
In the 0.1.0 release that we are working on right now, there are two new 
processors, ListHDFS and FetchHDFS, that are able to keep state about what has 
been pulled from HDFS. This way you can keep the data in HDFS and still pull 
in only the new data. Will this help?
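
To illustrate what "keeping state" buys you, here is a minimal sketch in 
Python. It is not the ListHDFS implementation, just the general idea: remember 
the newest modification time seen so far and list only files newer than that. 
The function name, the "last_mtime" key, and the plain dict standing in for 
persistent state are all made up for this example, and it runs against a local 
directory rather than HDFS.

```python
import os
import tempfile


def list_new_files(directory, state):
    """Return names of files modified after the stored timestamp, then advance it.

    `state` is a plain dict standing in for a persistent store; the key
    "last_mtime" is a hypothetical name used only in this sketch.
    """
    last_seen = state.get("last_mtime", 0.0)
    newest = last_seen
    new_files = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        if not entry.is_file():
            continue
        mtime = entry.stat().st_mtime
        if mtime > last_seen:
            new_files.append(entry.name)
            newest = max(newest, mtime)
    # Source files are never deleted or moved; only the remembered timestamp changes.
    state["last_mtime"] = newest
    return new_files


def demo():
    state = {}
    with tempfile.TemporaryDirectory() as d:
        # Two files with fixed, distinct modification times.
        for i, name in enumerate(["a.log", "b.log"]):
            path = os.path.join(d, name)
            open(path, "w").close()
            os.utime(path, (1000 + i, 1000 + i))
        first = list_new_files(d, state)   # both files are new
        second = list_new_files(d, state)  # nothing changed since the last listing
        path = os.path.join(d, "c.log")
        open(path, "w").close()
        os.utime(path, (2000, 2000))
        third = list_new_files(d, state)   # only the newly arrived file
    return first, second, third
```

Because the decision depends on what was already listed rather than on a 
min/max age window, the result does not change with how often the listing 
runs. One limitation of this simple version: a file whose modification time 
equals the stored timestamp exactly would be skipped.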
Thanks
-Mark

> From: [email protected]
> To: [email protected]
> Subject: RE: Fetch change list
> Date: Tue, 5 May 2015 15:32:07 +0000
> 
> Thanks, Corey, for that info. But the major problem I'm facing is that I am 
> backing up a large set of data into HDFS (with a GetHDFS, source retained as 
> true) and then trying to fetch the delta from it (getting only the files that 
> have arrived recently by using the min age and max age). But I'm unable to get 
> the exact delta if I have 'keep source file' set to true.
> I played around a lot with the schedule time and the min and max age, but that 
> didn't help.
> 
> -----Original Message-----
> From: Corey Flowers [mailto:[email protected]]
> Sent: Tuesday, May 05, 2015 5:35 PM
> To: [email protected]
> Subject: Re: Fetch change list
> 
> OK, the GetFile that is running is basically causing a race condition 
> between all of the servers in your cluster. That is why you are seeing the 
> "NoSuchFile" error. If you change the scheduling strategy on that processor 
> to "On Primary node", then the only system that will try to pick up data from 
> that mount point is the server you have designated as the "primary node".
> This should fix that issue.
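
The race Corey describes can be sketched in a few lines of Python. This is 
only an illustration under assumptions, not NiFi code: two "nodes" take a 
listing of the same shared directory, and by the time the second one acts on 
its listing the file is already gone. The function and directory names are 
made up, and the exact sequence in the real flow may differ.

```python
import os
import tempfile


def pick_up(shared_dir, listing, dest_dir):
    """Move every file named in an earlier listing; return (moved, missing)."""
    moved, missing = [], []
    for name in listing:
        try:
            os.replace(os.path.join(shared_dir, name), os.path.join(dest_dir, name))
            moved.append(name)
        except FileNotFoundError:
            # Python's counterpart of java.nio.file.NoSuchFileException.
            missing.append(name)
    return moved, missing


def demo():
    with tempfile.TemporaryDirectory() as base:
        shared = os.path.join(base, "shared")
        out_a = os.path.join(base, "node_a")
        out_b = os.path.join(base, "node_b")
        for d in (shared, out_a, out_b):
            os.mkdir(d)
        open(os.path.join(shared, "file1.log"), "w").close()

        # Both nodes list the shared mount before either one has acted.
        listing_a = os.listdir(shared)
        listing_b = os.listdir(shared)

        result_a = pick_up(shared, listing_a, out_a)  # node A wins the race
        result_b = pick_up(shared, listing_b, out_b)  # node B finds the file gone
    return result_a, result_b
```

With only one node picking up from the mount point, there is a single listing 
and a single consumer, so the second failing call never happens.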
> 
> On Mon, May 4, 2015 at 11:30 PM, Sethuram, Anup <[email protected]>
> wrote:
> 
> > Yes Corey, right now the pickup directory is on a network share
> > mount point. The data is picked up from one location and transferred
> > to the other. I'm using site-to-site communication.
> >
> > -----Original Message-----
> > From: Corey Flowers [mailto:[email protected]]
> > Sent: Monday, May 04, 2015 7:57 PM
> > To: [email protected]
> > Subject: Re: Fetch change list
> >
> > Good morning Anup!
> >
> >          Is the pickup directory coming from a network share mount point?
> >
> > On Mon, May 4, 2015 at 10:11 AM, Sethuram, Anup
> > <[email protected]
> > >
> > wrote:
> >
> > > Hi,
> > > I'm trying to fetch a set of files which have recently changed in a
> > > "filesystem". I'm also supposed to keep the original copy as it is.
> > > To obtain the latest files that have changed, I'm using a PutFile
> > > with the "replace" strategy piped to a GetFile with a minimum age of
> > > 5 sec, a max file age of 30 sec, and Keep source file set to true.
> > >
> > > I'm also running it in clustered mode. I'm seeing the issues below:
> > >
> > > - The queue starts growing if there's an error.
> > > - Continuous errors with 'NoSuchFileException'
> > > - Penalizing StandardFlowFileErrors
> > >
> > >
> > >
> > >
> > > ERROR
> > >
> > > 0ab3b920-1f05-4f24-b861-4fded3d5d826
> > >
> > > 161.91.234.248:7087
> > >
> > > GetFile[id=0ab3b920-1f05-4f24-b861-4fded3d5d826] Failed to retrieve
> > > files due to
> > > org.apache.nifi.processor.exception.FlowFileAccessException: Failed
> > > to import data from /nifi/UNZ/log201403230000.log for
> > > StandardFlowFileRecord[uuid=f29bda59-8611-427c-b4d7-c921ee5e74b8,cla
> > > im =,offset=0,name=6908587554457536,size=0]
> > > due to java.nio.file.NoSuchFileException:
> > > /nifi/UNZ/log201403230000.log
> > >
> > > 18:45:56 IST
> > >
> > >
> > >
> > > 10:54:50 IST
> > >
> > > ERROR
> > >
> > > c552b5bc-f627-3cc3-b3d0-545c519eafd9
> > >
> > > 161.91.234.248:6087
> > >
> > > PutFile[id=c552b5bc-f627-3cc3-b3d0-545c519eafd9] Penalizing
> > > StandardFlowFileRecord[uuid=876e51f7-9a3d-4bf9-9d11-9073a5c950ad,cla
> > > im =1430717088883-73580,offset=0,name=file1.log,size=29314779]
> > > and transferring to failure due to
> > > org.apache.nifi.processor.exception.ProcessException: Could not
> > > rename
> > > /nifi/UNZ/.file1.log:
> > > org.apache.nifi.processor.exception.ProcessException:
> > > Could not rename: /nifi/UNZ/.file1.log
> > >
> > > 10:54:56 IST
> > >
> > > ERROR
> > >
> > > 60662bb3-490a-3b47-9371-e11c12cdfa1a
> > >
> > > 161.91.234.248:7087
> > >
> > > PutFile[id=60662bb3-490a-3b47-9371-e11c12cdfa1a] Penalizing
> > > StandardFlowFileRecord[uuid=522a2401-8269-4f0f-aff5-152d25cdcefa,cla
> > > im =1430717094668-73059,offset=1533296,name=file2.log,size=28014262]
> > > and transferring to failure due to
> > > org.apache.nifi.processor.exception.ProcessException: Could not rename:
> > > /data/softwares/RS/nifi/OUT/.file2.log:
> > > org.apache.nifi.processor.exception.ProcessException: Could not rename:
> > > /nifi/OUT/.file2.log
> > >
> > >
> > >
> > > Do I have to tweak the Run schedule or keep the same minimum file
> > > age and maximum file age to overcome this issue?
> > > What might be an elegant solution in NiFi?
> > >
> > >
> > > Thanks,
> > > anup
> > >
> > > ________________________________
> > > The information contained in this message may be confidential and
> > > legally protected under applicable law. The message is intended
> > > solely for the addressee(s). If you are not the intended recipient,
> > > you are hereby notified that any use, forwarding, dissemination, or
> > > reproduction of this message is strictly prohibited and may be
> > > unlawful. If you are not the intended recipient, please contact the
> > > sender by return e-mail and destroy all copies of the original message.
> > >
> >
> >
> >
> > --
> > Corey Flowers
> > Vice President, Onyx Point, Inc
> > (410) 541-6699
> > [email protected]
> >
> > -- This account not approved for unencrypted proprietary information
> > --
> >
> >
> 
> 
> 
> --
> Corey Flowers
> Vice President, Onyx Point, Inc
> (410) 541-6699
> [email protected]
> 
> -- This account not approved for unencrypted proprietary information --
> 
                                          
