Re: Review Request 39711: Lifecycle does not allow feed with frequency greater than days(1)

Ajay Yadava Wed, 28 Oct 2015 21:40:43 -0700


> On Oct. 28, 2015, 11:46 p.m., Sowmya Ramesh wrote:
> > common/src/main/java/org/apache/falcon/entity/FeedHelper.java, line 813
> > <https://reviews.apache.org/r/39711/diff/4/?file=1111629#file1111629line813>
> >
> >     Sorry for the multiple comments. I didn't review the Lifecycle feature, 
> > so I didn't have the complete picture.
> >     
> >     Frequency in the retention stage is not mandatory, and if the frequency 
> > is not set by the user then
> >     1> If feed frequency < 6 hrs it's set to 6 hrs
> >     2> If it's > 6 hrs it's set to feed frequency
> >     
> >     Shouldn't it fall back to the current behavior for retaining the data, 
> > i.e. < 6 hrs set to 6 hrs and > 6 hrs set to 1 day?
> >     
> >     This is required for 2 reasons
> >     1> The current understanding of users is that if feed frequency > 6 hrs, 
> > the retention job will run every day. We shouldn't deviate from this.
> >     
> >     2> I also spoke with Venkatesh about why it was set to 1 day. He 
> > mentioned that in case retention fails and the reruns fail too, we don't 
> > want to keep the data till it runs next time if feed frequency is used. 
> > This can cause an SEC retention violation and also cause memory issues if 
> > feed frequency is, say, one year. If the job runs every day it catches up 
> > for the scenario mentioned above.
> >     
> >     Any specific reason to change the old behavior?

Sowmya and I had an offline discussion to address this. Updating the gist here.

We tried to fall back to the old behaviour as much as possible, but it fails 
the extra validations in lifecycle retention. The current behaviour is to 
retain the old behaviour as far as possible within the new constraints 
(specifically, retention shouldn't be more frequent than data availability).
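
To make the two defaults concrete, here is a minimal sketch of the rules as 
described in this thread. It is not the code from FeedHelper.java: the class 
and method names are hypothetical, java.time.Duration stands in for Falcon's 
own frequency type, and treating a feed frequency of exactly 6 hours as "not 
less than 6 hours" is an assumption, since the thread only mentions < and >.

```java
import java.time.Duration;

// Hypothetical illustration only; none of these names come from Falcon.
final class RetentionFrequencySketch {
    static final Duration SIX_HOURS = Duration.ofHours(6);
    static final Duration ONE_DAY = Duration.ofDays(1);

    // Old default as described in the thread:
    // feed frequency < 6 hrs -> 6 hrs, otherwise -> 1 day.
    static Duration legacyDefault(Duration feedFrequency) {
        return feedFrequency.compareTo(SIX_HOURS) < 0 ? SIX_HOURS : ONE_DAY;
    }

    // Lifecycle default as described in the thread:
    // feed frequency < 6 hrs -> 6 hrs, otherwise -> the feed frequency itself.
    static Duration lifecycleDefault(Duration feedFrequency) {
        return feedFrequency.compareTo(SIX_HOURS) < 0 ? SIX_HOURS : feedFrequency;
    }

    // Lifecycle constraint mentioned above: retention shouldn't be
    // more frequent than data availability (the feed frequency).
    static boolean satisfiesLifecycleConstraint(Duration retention, Duration feed) {
        return retention.compareTo(feed) >= 0;
    }

    public static void main(String[] args) {
        Duration weekly = Duration.ofDays(7);
        Duration legacy = legacyDefault(weekly);
        Duration lifecycle = lifecycleDefault(weekly);
        System.out.println("legacy default: " + legacy
                + ", valid: " + satisfiesLifecycleConstraint(legacy, weekly));
        System.out.println("lifecycle default: " + lifecycle
                + ", valid: " + satisfiesLifecycleConstraint(lifecycle, weekly));
    }
}
```

Under these assumptions, a weekly feed gets a 1 day default from the old rule, 
which runs more often than data arrives and so fails the constraint, while the 
lifecycle rule returns the feed frequency and passes.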

Keeping retention frequency as a fallback to retries is not the best thing to 
do in such scenarios. If it fails all retries, there is no guarantee that it 
will succeed next time either. It means the system is not able to recover on 
its own and needs manual intervention. The best way to deal with such 
scenarios is to have appropriate monitoring and alerting (e.g. users can now 
have email alerts on failure of the retention workflow).

Such a setup also fails for the majority of frequencies: by the reasoning 
mentioned, minutely, hourly and daily feeds (everything apart from roll-ups 
like monthly) do not get the above guarantee. So the guarantee is already 
broken, if it was ever the intent.

Also, the above behaviour wastes resources 99% of the time to solve for that 
rare 1% case. Coordinators will run and they will have nothing to do.

- Ajay

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39711/#review104372
-----------------------------------------------------------

On Oct. 28, 2015, 6:04 p.m., Ajay Yadava wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39711/
> -----------------------------------------------------------
> 
> (Updated Oct. 28, 2015, 6:04 p.m.)
> 
> 
> Review request for Falcon.
> 
> 
> Bugs: FALCON-1560
>     https://issues.apache.org/jira/browse/FALCON-1560
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> Lifecycle does not allow feed with frequency greater than days(1)
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/entity/FeedHelper.java 5c252a8 
>   common/src/test/java/org/apache/falcon/entity/FeedHelperTest.java 4020d36 
>   common/src/test/java/org/apache/falcon/entity/parser/FeedEntityParserTest.java 905be68 
> 
> Diff: https://reviews.apache.org/r/39711/diff/
> 
> 
> Testing
> -------
> 
> Added unit test for the scenarios.
> 
> 
> Thanks,
> 
> Ajay Yadava
> 
>
