TimePartition tells you when the data showed up. I think SeqID + StreamName is the right thing to match on. If the data is re-collected later but it's the same data, then yes, you want to treat it as a duplicate.
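
As a rough sketch of what I mean (plain Python, not Chukwa code; ChunkKey and drop_duplicates are made-up names standing in for the archive key and the dedup pass, and the field names just follow this thread):

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class ChunkKey(NamedTuple):
    # Hypothetical stand-in for an archive key, not a Chukwa class.
    time_partition: int
    data_type: str
    stream_name: str
    seq_id: int


def drop_duplicates(
    chunks: Iterable[Tuple[ChunkKey, str]]
) -> Iterator[Tuple[ChunkKey, str]]:
    """Yield each chunk once, matching on (stream_name, seq_id) only."""
    seen = set()
    for key, payload in chunks:
        # time_partition is deliberately left out of the identity, so a
        # re-collected copy of the same data counts as a duplicate.
        ident = (key.stream_name, key.seq_id)
        if ident in seen:
            continue
        seen.add(ident)
        yield key, payload
```

This keeps whichever copy it sees first. If the logs in the duplicate chunks really are identical, it shouldn't matter which one survives.
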

On Wed, Mar 28, 2012 at 12:56 AM, IvyTang <[email protected]> wrote:
> Thanks to the simple archiver, we do remove almost all the duplicate
> chunks.
>
> But we found that there are still a few, very few, duplicate chunks left.
>
> And strangely, these chunks' keys aren't the same. The DataType, StreamName
> and SeqId are the same, but the TimePartitions are different. The logs in
> these chunks are the same.
>
> Could we just distinguish the duplicate chunks using the DataType, StreamName
> and SeqId? What does the TimePartition mean?
>
> Thanks!
>
> --
> Best regards,
>
> Ivy Tang

--
Ari Rabkin [email protected]
UC Berkeley Computer Science Department
