Hi John,

To avoid reading each line/record of the input, we usually partition the
data by date, e.g. all of a day's data in one file. That way retention can
drop a whole date without scanning the data for any other date. This sort
of modelling is usually a good idea for general data processing as well,
since consumers typically read data for a time range. Sometimes it is not
possible to *produce* data in that fashion, and we have to write aggregator
processes to batch the data by date. If it is not possible to divide data
by date for your use case, then there is no way to delete data for a
particular date without reading each line/record of the input file, with
or without Falcon.
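
To make the point concrete, here is a minimal sketch in plain Python of
retention over a date-partitioned layout. This is not Falcon's retention
implementation; the feed path, the YYYY-MM-DD directory naming, and the
90-day window are assumptions for illustration, and it runs against the
local filesystem rather than HDFS.

    import shutil
    from datetime import date, timedelta
    from pathlib import Path

    # Hypothetical feed location and retention window.
    BASE = Path("/data/feeds/clicks")
    RETENTION_DAYS = 90

    def expire_old_partitions(today: date) -> None:
        """Drop date partitions that fall outside the retention window."""
        if not BASE.exists():
            return
        cutoff = today - timedelta(days=RETENTION_DAYS)
        for partition in BASE.iterdir():   # e.g. .../clicks/2016-01-25
            if not partition.is_dir():
                continue
            try:
                partition_date = date.fromisoformat(partition.name)
            except ValueError:
                continue  # not a date-named directory; leave it alone
            if partition_date < cutoff:
                # Whole-partition delete: no record inside is ever read.
                shutil.rmtree(partition)

    expire_old_partitions(date.today())

Because eviction here is a directory delete, its cost depends only on the
number of date partitions, not the number of records; without the date
partitioning, the same job would have to read and rewrite every record to
filter out the expired ones.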



On Mon, Jan 25, 2016 at 5:03 AM, John Smith <[email protected]> wrote:

> Ok,
> but in general, to process that kind of requirement there is no other
> way than to read each line/record of the input file.
>
>
>
>
> On Mon, Jan 25, 2016 at 12:23 AM, Venkat Ramachandran
> <[email protected]> wrote:
> > It's a good idea to open a JIRA with your requirements.
> > You can either implement a custom Pig job that reads and removes the
> > expired rows, or you can leverage the new Lifecycle feature introduced in
> > Falcon 0.8, which allows you to provide your own plugin for the retention
> > implementation.
>
