Hi John,

Just to add to Ajay's point about splitting the data: this can easily be done
in Pig using MultiStorage. For more details, please refer to:

https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
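
To make the idea concrete, here is a rough sketch in plain Python (not Pig or
Falcon code). It assumes each record carries an ISO date and that one file per
day is acceptable; the function names and the <date>.txt file naming are made
up for illustration only. Once the data is split this way, retention just
removes whole files by name and never has to read a record.

```python
import os
import tempfile
from collections import defaultdict
from datetime import date, timedelta


def split_by_date(records, out_dir):
    """Append each (iso_date, payload) record to out_dir/<iso_date>.txt."""
    by_day = defaultdict(list)
    for day, payload in records:
        by_day[day].append(payload)
    for day, payloads in by_day.items():
        with open(os.path.join(out_dir, day + ".txt"), "a") as f:
            for payload in payloads:
                f.write(payload + "\n")


def apply_retention(out_dir, today, keep_days):
    """Delete per-day files older than keep_days without opening any of them."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for name in sorted(os.listdir(out_dir)):
        if not name.endswith(".txt"):
            continue  # ignore anything that is not a per-day file
        day = date.fromisoformat(name[:-len(".txt")])
        if day < cutoff:
            os.remove(os.path.join(out_dir, name))
            removed.append(name)
    return removed
```

For example, calling split_by_date() on a mixed batch and then
apply_retention(out_dir, date.today(), keep_days=7) would drop every per-day
file older than a week and leave the rest untouched.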

Thanks
Praveen

On Mon, Jan 25, 2016 at 10:33 AM, Ajay Yadav <[email protected]> wrote:

> Hi John,
>
> To avoid reading each line/record of the input we usually divide the data
> by date, e.g. all data for a day in one file. This way you can avoid
> scanning data for all dates during retention. Usually this sort of
> modelling is a good idea for general processing of data also as consumers
> typically consume data for a time range. Sometimes it is not possible to
> *produce* data in such fashion and we have to write aggregator processes to
> batch data. If it is not possible to divide the data by date for your use
> case, then there is no way to delete data for a particular date without
> reading each line/record of the input file, with or without Falcon.
>
>
>
> On Mon, Jan 25, 2016 at 5:03 AM, John Smith <[email protected]> wrote:
>
> > Ok,
> > but in general, to process that kind of requirement there is
> > no other way than to read each line/record of the input file.
> >
> >
> >
> >
> > On Mon, Jan 25, 2016 at 12:23 AM, Venkat Ramachandran
> > <[email protected]> wrote:
> > > It's a good idea to open a JIRA with your requirements.
> > > You can either implement a custom pig job that reads and removes the
> > > expired rows or you can leverage the new Lifecycle feature introduced in
> > > Falcon 0.8 that allows you to provide your own plugin for retention
> > > implementation.
> >
>

