Hi Tim,

This does look like a bug. I'll file a JIRA to fix it.
MOB hfiles that are still being flushed can be missed by the snapshot.
As a temporary workaround, you can run 'flush tablename' before running
'snapshot tablename snapshotname', which avoids the issue. Thanks again for
reporting your findings.
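
The ordering matters here: the flush has to come before the snapshot. As a
minimal sketch only, the small Python helper below prints the two HBase shell
commands in that order. The helper name snapshot_workaround_commands is
hypothetical and is not part of HBase; the table and snapshot names are the
ones used earlier in this thread.

```python
def snapshot_workaround_commands(table, snapshot_name):
    """Return HBase shell commands that flush a table, then snapshot it.

    Hypothetical helper for illustration: it only builds command strings
    and does not talk to a cluster.
    """
    return [
        # Flush first so that in-flight MOB hfiles are written out.
        f"flush '{table}'",
        # Only then take the snapshot.
        f"snapshot '{table}', '{snapshot_name}'",
    ]


if __name__ == "__main__":
    for command in snapshot_workaround_commands("tim_test", "tim_test-snapshot"):
        print(command)
```

The printed lines can be pasted into the HBase shell as they are.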

Regards,
Jingcheng

-----Original Message-----
From: Tim Robertson [mailto:timrobertson...@gmail.com] 
Sent: Friday, October 14, 2016 1:54 PM
To: dev@hbase.apache.org
Subject: Re: Data loss in MOB snapshot and clone?

Thanks for trying that Jingcheng

I'll get time to do some testing next week on this and see if I can come up 
with a reproducible test.
I can confirm that for non-MOB tables it is all fine, and fields below the
MOB threshold were not lost in the original process.

Cheers,
Tim

On Thu, Oct 13, 2016 at 5:31 PM, Du, Jingcheng <jingcheng...@intel.com>
wrote:

> Hi Tim,
>
> Normally after the snapshot is cloned/restored, there will be a .link
> directory (the format is .link-{hfileName}) in the archive directory
> of the table for both mob and non-mob tables, and the hfile named
> {hfileName} will be archived to the same directory as the .link directory.
> The hfile won't be deleted by the file cleaner while the .link directory
> is not empty, which means this hfile is still referenced by others. The
> HFileLinkCleaner and SnapshotHFileCleaner cleaners guarantee this.
>
> I did the same test based on the code in HBase master for both mob and 
> non-mob tables, and data are not lost.
>
> Tim, would you mind trying the steps for normal tables to see if the 
> data will be lost? Just one row is enough for the table. Thanks a lot.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson...@gmail.com]
> Sent: Thursday, October 13, 2016 4:48 PM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks Jingcheng
>
> Yes, it just references the source MOB data until MOB compaction.
>
> Based on that, I think this really is a critical bug.  It allowed the
> MOBs to be deleted before compaction happened, which led to broken
> references and data loss.  Or am I misunderstanding you?
>
>
>
> On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng 
> <jingcheng...@intel.com>
> wrote:
>
> > Hi Tim,
> >
> > > was this running a background task to copy the MOB data when the
> > > snapshot was cloned and I just deleted the source before the copy
> > > was complete?
> > The MOB data can be copied when mob compaction happens. But the MOB 
> > files should not be deleted even if they are not copied and after 
> > the source table is deleted. The archive cleaner should keep them 
> > until all the references are gone. Let me check the code again.
> >
> > > when running "snapshot and clone" it just references the source
> > > MOB data until a (?) change?
> > Yes, it just references the source MOB data until MOB compaction.
> >
> > > snapshot and clone just doesn't support MOB?
> > Yes, it does support MOB.
> >
> > Regards,
> > Jingcheng
> >
> > -----Original Message-----
> > From: Tim Robertson [mailto:timrobertson...@gmail.com]
> > Sent: Thursday, October 13, 2016 1:56 AM
> > To: dev@hbase.apache.org
> > Subject: Re: Data loss in MOB snapshot and clone?
> >
> > Thanks - well it is now on the CDH community forum too.
> >
> > Jonathan Hsieh pretty much described what I see in his comment on
> > HBASE-12332:
> > https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
> >
> >
> >
> > On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <h...@cloudera.com> wrote:
> >
> > > Hi Tim,
> > >
> > > I just read more of the details; it may not be related to the issue
> > > we fixed (mob compaction related).
> > > I am doing a similar test to see if I can reproduce it.
> > >
> > > Thanks,
> > > Huaxiang
> > > > On Oct 12, 2016, at 10:29 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> > > >
> > > > Thanks Ted, Huaxiang
> > > >
> > > > I'll move this to a Cloudera forum and comment back here if it 
> > > > appears unrelated.
> > > >
> > > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <h...@cloudera.com> wrote:
> > > >
> > > >> By the way, I forgot the forum link:
> > > >> http://community.cloudera.com/
> > > >>
> > > >> Thanks,
> > > >> Huaxiang
> > > >>
> > > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <h...@cloudera.com> wrote:
> > > >>>
> > > >>> Hi Tim,
> > > >>>
> > > >>>   I believe that it runs into an issue, specific to the Cloudera
> > > >>> release, that we fixed recently. For details, could you discuss it
> > > >>> in the CDH forum? Copy me (h...@cloudera.com) in the forum so I
> > > >>> can explain more there.
> > > >>>
> > > >>>   Thanks,
> > > >>>   Huaxiang
> > > >>>
> > > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >>>>
> > > >>>> Have you looked at HBASE-16578 ?
> > > >>>>
> > > >>>> Cheers
> > > >>>>
> > > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Hi devs,
> > > >>>>> [Had a quick chat with Lars G. about this and before opening 
> > > >>>>> a Jira I thought I'd raise it here first]
> > > >>>>>
> > > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > > >>>>>
> > > >>>>> Before I dig into this further, I'd like to just ask if
> > > >>>>> anyone has seen this before?
> > > >>>>>
> > > >>>>> The initial state was a table (tim_test) built with MOB
> > > >>>>> support and a few tens of millions of rows and tens of
> > > >>>>> billions of cells.
> > > >>>>>
> > > >>>>> I wanted to rename the table to get this into production and
> > > >>>>> did so as follows:
> > > >>>>>
> > > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > > >>>>>
> > > >>>>> At this stage the application all looked good, and so I
> > > >>>>> continued with:
> > > >>>>>
> > > >>>>> delete_snapshot 'tim_test-snapshot'
> > > >>>>> disable 'tim_test'
> > > >>>>> drop 'tim_test'
> > > >>>>>
> > > >>>>> Then things went... awry and data just started dropping out
> > > >>>>> in the app. Before long, all MOB data was seemingly gone.
> > > >>>>>
> > > >>>>> The references in the new table MOB folder appear to point
> > > >>>>> to the source table (e.g.
> > > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > > >>>>>
> > > >>>>> The RS logs are full of ERRORs like:
> > > >>>>>
> > > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> > > >>>>> The mob file
> > > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> > > >>>>> could not be found in the locations
> > > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> > > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > > >>>>>
> > > >>>>> What I don't know is:
> > > >>>>> 1) was this running a background task to copy the MOB data
> > > >>>>> when the snapshot was cloned and I just deleted the source
> > > >>>>> before the copy was complete?
> > > >>>>> - or
> > > >>>>> 2) when running "snapshot and clone" it just references the
> > > >>>>> source MOB data until a (?) change?
> > > >>>>> 3) snapshot and clone just doesn't support MOB?
> > > >>>>>
> > > >>>>> Can anyone shed some light on this easily before I dig into
> > > >>>>> it please?
> > > >>>>>
> > > >>>>> While this situation exists (at least in 1.0.0) might it be
> > > >>>>> good to get info about data loss for MOB tables into the
> > > >>>>> snapshot clone docs?
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Tim
> > >
> > >
> >
>
