Hi John,

To avoid reading each line/record of the input, we usually partition the data by date, e.g. all data for a day in one file (or directory). That way, retention can drop an entire date's data without scanning data for all dates. This sort of modelling is usually a good idea for general processing as well, since consumers typically consume data for a time range. Sometimes it is not possible to *produce* data in such a fashion, and we have to write aggregator processes to batch the data by date. If partitioning the data by date is not possible for your use case, then there is no way to delete data for a particular date without reading each line/record of the input file, with or without Falcon.
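For example, with a layout like /data/feed/2016-01-25/..., retention becomes a cheap directory delete. Here is a rough sketch using the Hadoop FileSystem API; the feed path, the directory naming, and the 30-day window are all made up for illustration:

    import java.io.IOException;
    import java.time.LocalDate;
    import java.time.format.DateTimeParseException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DatePartitionRetention {
        public static void main(String[] args) throws IOException {
            // Hypothetical layout: one directory per day, e.g. /data/feed/2016-01-25
            Path feedRoot = new Path("/data/feed");
            LocalDate cutoff = LocalDate.now().minusDays(30); // assumed retention window

            FileSystem fs = FileSystem.get(new Configuration());
            for (FileStatus status : fs.listStatus(feedRoot)) {
                if (!status.isDirectory()) continue;
                try {
                    LocalDate partitionDate = LocalDate.parse(status.getPath().getName());
                    if (partitionDate.isBefore(cutoff)) {
                        // Drop the whole day's partition without reading a single record
                        fs.delete(status.getPath(), true);
                    }
                } catch (DateTimeParseException e) {
                    // Skip directories that are not date partitions
                }
            }
        }
    }

This is essentially what Falcon's own feed retention does when the feed location carries a date pattern such as ${YEAR}/${MONTH}/${DAY}: it evaluates the pattern against the retention limit and evicts expired instances as whole paths.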
On Mon, Jan 25, 2016 at 5:03 AM, John Smith <[email protected]> wrote:
> Ok,
> but in general, to execute/process that kind of requirement there is
> no other way than to read each line/record of the input file.
>
> On Mon, Jan 25, 2016 at 12:23 AM, Venkat Ramachandran
> <[email protected]> wrote:
> > It's a good idea to open a JIRA with your requirements.
> > You can either implement a custom Pig job that reads and removes the
> > expired rows, or you can leverage the new Lifecycle feature introduced in
> > Falcon 0.8 that allows you to provide your own plugin for the retention
> > implementation.
