Thanks Jingcheng.

You are probably better placed to describe the true problem than I am, so please do create the issue. I'll try to find time next week to offer a unit test unless someone gets to it first.
On Fri, Oct 14, 2016 at 12:47 PM, Du, Jingcheng <jingcheng...@intel.com> wrote:

> Hi Tim,
>
> This should be an issue. I'll file a jira to fix this.
> Some MOB hfiles that are still being flushed are missed in snapshotting.
> For the temporary solution, you can run 'flush tablename' before running
> 'snapshot tablename snapshotname'. This can avoid this issue. Thanks again
> for your findings.
>
> Regards,
> Jingcheng
>
> -----Original Message-----
> From: Tim Robertson [mailto:timrobertson...@gmail.com]
> Sent: Friday, October 14, 2016 1:54 PM
> To: dev@hbase.apache.org
> Subject: Re: Data loss in MOB snapshot and clone?
>
> Thanks for trying that Jingcheng
>
> I'll get time to do some testing next week on this and see if I can come
> up with a reproducible test.
> I can confirm for non-MOB is it all fine, and fields below the MOB
> threshold were not lost in the original process.
>
> Cheers,
> Tim
>
> On Thu, Oct 13, 2016 at 5:31 PM, Du, Jingcheng <jingcheng...@intel.com>
> wrote:
>
> > Hi Tim,
> >
> > Normally after the snapshot is cloned/restored, there will be an .link
> > directory (the format is .link-{hfileName}) in the archive directory
> > of the table for both mob and non-mob tables, and the hfile of
> > {hfileName} will be archived to the same directory with the .link
> > directory.
> > The hfile won't be deleted by the file cleaner if the .link directory
> > is not empty which means this hfile is still referenced by others. And
> > the cleaners of HFileLinkCleaner and SnapshotHFileCleaner can guarantee
> > this.
> >
> > I did the same test based on the code in HBase master for both mob and
> > non-mob tables, and data are not lost.
> >
> > Tim, would you mind trying the steps for normal tables to see if the
> > data will be lost? Just one row is enough for the table. Thanks a lot.
> >
> > Regards,
> > Jingcheng
> >
> > -----Original Message-----
> > From: Tim Robertson [mailto:timrobertson...@gmail.com]
> > Sent: Thursday, October 13, 2016 4:48 PM
> > To: dev@hbase.apache.org
> > Subject: Re: Data loss in MOB snapshot and clone?
> >
> > Thanks Jingcheng
> >
> > Yes, it just references the source MOB data until MOB compaction.
> >
> > Based on that, I think this really is a critical bug. It allowed the
> > MOBs to be deleted before that happened, and thus broken references
> > and data loss. Or am I misunderstanding you please?
> >
> > On Thu, Oct 13, 2016 at 9:45 AM, Du, Jingcheng <jingcheng...@intel.com>
> > wrote:
> >
> > > Hi Tim,
> > >
> > > > was this running a background task to copy the MOB data when the
> > > > snapshot was cloned and I just deleted the source before the copy
> > > > was complete?
> > > The MOB data can be copied when mob compaction happens. But the MOB
> > > files should not be deleted even if they are not copied and after
> > > the source table is deleted. The archive cleaner should keep them
> > > until all the references are gone. Let me check the code again.
> > >
> > > > when running "snapshot and clone" it just references the source
> > > > MOB data until a (?) change?
> > > Yes, it just references the source MOB data until MOB compaction.
> > >
> > > > snapshot and clone just doesn't support MOB?
> > > It supports.
> > >
> > > Regards,
> > > Jingcheng
> > >
> > > -----Original Message-----
> > > From: Tim Robertson [mailto:timrobertson...@gmail.com]
> > > Sent: Thursday, October 13, 2016 1:56 AM
> > > To: dev@hbase.apache.org
> > > Subject: Re: Data loss in MOB snapshot and clone?
> > >
> > > Thanks - well it is now on the CDH community forum too.
> > >
> > > Jonathan Hsieh pretty much described what I see in his comment on
> > > HBASE-12332
> > > https://issues.apache.org/jira/browse/HBASE-12332?focusedCommentId=14241478&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14241478
> > >
> > > On Wed, Oct 12, 2016 at 7:51 PM, Huaxiang Sun <h...@cloudera.com>
> > > wrote:
> > >
> > > > Hi Tim,
> > > >
> > > > Just read more details, it may not be related with the issue we
> > > > fixed (mob compaction related).
> > > > I am doing a similar test to see if I can reproduce it.
> > > >
> > > > Thanks,
> > > > Huaxiang
> > > > > On Oct 12, 2016, at 10:29 AM, Tim Robertson
> > > > > <timrobertson...@gmail.com> wrote:
> > > > >
> > > > > Thanks Ted, Huaxiang
> > > > >
> > > > > I'll move this to a Cloudera forum and comment back here if it
> > > > > appears unrelated.
> > > > >
> > > > > On Wed, Oct 12, 2016 at 7:24 PM, Huaxiang Sun <h...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > >> By the way, I forgot the forum link:
> > > > >> http://community.cloudera.com
> > > > >>
> > > > >> Thanks,
> > > > >> Huaxiang
> > > > >>
> > > > >>> On Oct 12, 2016, at 10:10 AM, Huaxiang Sun <h...@cloudera.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> Hi Tim,
> > > > >>>
> > > > >>> I believe that it runs into an issue which is specific to
> > > > >>> cloudera release we fixed recently. For details, could you
> > > > >>> discuss it in cdh forum?
> > > > >>> Copy me(h...@cloudera.com) in the forum so I can explain more
> > > > >>> there.
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Huaxiang
> > > > >>>
> > > > >>>> On Oct 12, 2016, at 8:13 AM, Ted Yu <yuzhih...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> Have you looked at HBASE-16578 ?
> > > > >>>>
> > > > >>>> Cheers
> > > > >>>>
> > > > >>>>> On Oct 12, 2016, at 8:02 AM, Tim Robertson
> > > > >>>>> <timrobertson...@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>> Hi devs,
> > > > >>>>> [Had a quick chat with Lars G. about this and before opening
> > > > >>>>> a Jira I thought I'd raise it here first]
> > > > >>>>>
> > > > >>>>> We have just experienced data loss in HBase 1.0.0-cdh5.4.10.
> > > > >>>>>
> > > > >>>>> Before I dig into this further, I'd like to just ask if
> > > > >>>>> anyone has seen this before?
> > > > >>>>>
> > > > >>>>> The initial state was a table (tim_test) built with MOB
> > > > >>>>> support and a few 10's million rows and 10's billions of cells.
> > > > >>>>>
> > > > >>>>> I wanted to rename the table to get this into production and
> > > > >>>>> did so as follows:
> > > > >>>>>
> > > > >>>>> snapshot 'tim_test', 'tim_test-snapshot'
> > > > >>>>> clone_snapshot 'tim_test-snapshot', 'prod_b_map'
> > > > >>>>>
> > > > >>>>> At this stage the application all looked good, and so I
> > > > >>>>> continued with:
> > > > >>>>>
> > > > >>>>> delete_snapshot 'tim_test-snapshot'
> > > > >>>>> disable 'tim_test'
> > > > >>>>> drop 'tim_test'
> > > > >>>>>
> > > > >>>>> Then things went... awry and data just started dropping out
> > > > >>>>> in the app.
> > > > >>>>> Before long, all MOB data seemingly is gone.
> > > > >>>>>
> > > > >>>>> The references in the new table MOB folder appear to point
> > > > >>>>> to the source table (e.g.
> > > > >>>>> /hbase/mobdir/data/default/prod_b_map/ba42a2e8e9b669d9fc85bdfeed2f5f2a/EPSG_4326/tim_test=14bf5f1737ac65c34615ed97c0b7de06-d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2).
> > > > >>>>>
> > > > >>>>> The RS logs full of ERROR like:
> > > > >>>>>
> > > > >>>>> 2016-10-12 15:19:14,640 ERROR org.apache.hadoop.hbase.regionserver.HStore:
> > > > >>>>> The mob file
> > > > >>>>> d41d8cd98f00b204e9800998ecf8427e20161006b59865f80e604781a79ebfa2ddd66b48
> > > > >>>>> could not be found in the locations
> > > > >>>>> [hdfs://ha-nn/hbase/mobdir/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326,
> > > > >>>>> hdfs://ha-nn/hbase/archive/data/default/tim_test/14bf5f1737ac65c34615ed97c0b7de06/EPSG_4326]
> > > > >>>>>
> > > > >>>>> What I don't know is:
> > > > >>>>> 1) was this running a background task to copy the MOB data
> > > > >>>>> when the snapshot was cloned and I just deleted the source
> > > > >>>>> before the copy was complete?
> > > > >>>>> - or
> > > > >>>>> 2) when running "snapshot and clone" it just references the
> > > > >>>>> source MOB data until a (?) change?
> > > > >>>>> 3) snapshot and clone just doesn't support MOB?
> > > > >>>>>
> > > > >>>>> Can anyone shed some light on this easily before I dig into
> > > > >>>>> it please?
> > > > >>>>>
> > > > >>>>> While this situation exists (at least in 1.0.0) might it be
> > > > >>>>> good to get info about data loss for MOB tables into the
> > > > >>>>> snapshot clone docs?
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Tim
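
For anyone following the thread, the reference name quoted in the original report can be pulled apart to see which table it still points at. This is only a minimal Python sketch, assuming the name follows the `<table>=<region>-<file>` pattern visible in that path and that the table lives in the `default` namespace. The helpers `parse_mob_reference` and `candidate_locations` are hypothetical names made up for illustration; they are not part of HBase.

```python
from typing import List, NamedTuple


class MobReference(NamedTuple):
    table: str
    region: str
    mob_file: str


def parse_mob_reference(name: str) -> MobReference:
    """Split a '<table>=<region>-<file>' reference name into its parts.

    Assumption: no namespace prefix, as in the example from the thread.
    """
    table, sep, rest = name.partition("=")
    if not sep:
        raise ValueError(f"no '=' in reference name: {name!r}")
    region, sep, mob_file = rest.partition("-")
    if not sep:
        raise ValueError(f"no '-' after the region in: {name!r}")
    return MobReference(table, region, mob_file)


def candidate_locations(ref: MobReference, family: str,
                        root: str = "/hbase",
                        namespace: str = "default") -> List[str]:
    """Build the two directories of the kind listed in the RS error message.

    The layout is copied from the paths in the log line, not from HBase code.
    """
    tail = f"data/{namespace}/{ref.table}/{ref.region}/{family}"
    return [f"{root}/mobdir/{tail}", f"{root}/archive/{tail}"]


# Reference name taken from the path quoted in the original report.
ref = parse_mob_reference(
    "tim_test=14bf5f1737ac65c34615ed97c0b7de06-"
    "d41d8cd98f00b204e9800998ecf8427e20161006ff8baa70d21f408caefe8ae6318dfba2"
)
print(ref.table)
for path in candidate_locations(ref, "EPSG_4326"):
    print(path)
```

Both directories sit under the source table `tim_test`, which matches the locations named in the region server error and is consistent with the clone depending on files that belonged to the dropped table.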