Yes, I am planning on filing a JIRA for that shortly.

-Aleks S.

On Mon, Jan 14, 2013 at 11:01 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for the write up.
>
> Would the new tests be sub-tasks of HBASE-7290 ?
>
> Cheers
>
> On Mon, Jan 14, 2013 at 10:32 AM, Aleksandr Shulman <al...@cloudera.com> wrote:
>
> > Hi everyone,
> >
> > I'd like to start a thread about Cloudera's testing efforts on the upcoming
> > snapshots feature. This is a new feature and it's important that we explain
> > our testing efforts and get the community's opinion on what we'd all like
> > to see tested. My hope is that from this discussion, we can get more ideas
> > about what needs to be tested and gain confidence in the testing we have in
> > place.
> >
> > Before I begin, I'd like to introduce myself. I'm Aleks Shulman. I'm a
> > software engineer at Cloudera, working primarily on HBase. Within HBase, I
> > am focusing on the quality side of things. What this means to me is a
> > conversation unto itself, but in brief, I will be writing tests and test
> > frameworks. I will also be an advocate for the user experience, with
> > particular focus on API compatibility and ease-of-use.
> >
> > So let's discuss snapshots:
> > There are two main areas that should be tested, and they correspond nicely
> > to what can be done as unit tests and what is better left as a Jenkins job
> > or some other automation: unit testing and non-unit testing. We've been
> > working on this for a bit, so there is already some progress in these
> > areas:
> >
> > Unit testing - In progress or completed:
> >
> > 1. HBase Snapshots Repeatability and Idempotency Test:
> > This test class verifies proper behavior with regard to performing
> > restore/clone operations on tables that were themselves created as a clone
> > or restored from a snapshot. This is an interesting set of cases because of
> > the way snapshots work: they point to the original HFiles.
> > We can use these tests to verify correctness in the file system and test
> > closure under deletion of the original table.
> >
> > 2. HBase Snapshots HTable Descriptor Test
> > This test class verifies proper behavior with regard to changes to the
> > information about the table itself, made before and after snapshotting, in
> > both the 'before' table and the 'after' table.
> >
> > 3. HBase Snapshots HFileLink Test
> > This test class inspects the correctness of the HFileLink files. It looks
> > into their permissioning, the naming convention, and how they respond to
> > events. Events may include an HFile being deleted or moved.
> >
> > 4. HBase Snapshots Table Dimensions Test
> > This test class inspects operations on tables that are empty, have only one
> > row, have one or two CFs, etc. Basically, it covers any edge scenario in
> > what the table looks like that may affect the way it is snapshotted or
> > restored/cloned.
> >
> > 5. HBase Snapshots Independence Test
> > This test should verify that all aspects of table independence are
> > guaranteed between the original table and the restored snapshot/clone.
> > This includes things like data mutations, compactions, splits, etc. It also
> > includes metadata changes.
> >
> > 6. HBase Snapshots Aborted or Failed Snapshot Cleanup
> > Verifies that no cruft is left over after an attempt to snapshot a table
> > fails or is aborted. We should be able to account for every file in the
> > file system before and after.
> >
> > 7. HBase Snapshots HFile Archive Test
> > This test task is to fill in any gaps in testing of archiving as it relates
> > to snapshots. The snapshots feature relies on the HFileArchiver/LogArchiver
> > with two new cleaners (SnapshotHFile/SnapshotLog Cleaners), so we'd need to
> > go through and find out what needs to be tested between them.
> >
> > 8. HBase Snapshots Export Test
> > This test should verify that export of a snapshot to another cluster works
> > properly.
> > Implemented as: mvn clean test -PlocalTests
> > -Dtest=org.apache.hadoop.hbase.snapshot.TestExportSnapshot
> > However, we need to add more tests around chmod, chown, and checksums.
> >
> > 9. HBase Snapshots Concurrent Snapshots Test
> > This test class will enforce proper behavior in situations where race
> > conditions can occur. For example, if one process attempts to restore a
> > table and another one tries to do so simultaneously, what happens? We need
> > to know how dangerous this could be and whether it is possible for data to
> > be lost.
> > Covered in HBASE-7536.
> >
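
The "account for every file before and after" idea in item 6 can be sketched as a simple listing diff. This is only a minimal illustration, assuming a local temporary directory stands in for the HBase root on HDFS; the class name FileAccounting, its helper methods, and the ".snapshot/.tmp" working directory are hypothetical and are not part of HBase or of the tests described above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: not an HBase class. Uses the local file system
// (java.nio.file) as a stand-in for HDFS.
public class FileAccounting {

    /** Returns the relative paths of all regular files under root, sorted. */
    static Set<String> listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .map(p -> root.relativize(p).toString())
                       .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    /** Files present in 'after' but not in 'before': candidate cruft. */
    static Set<String> leftovers(Set<String> before, Set<String> after) {
        Set<String> extra = new TreeSet<>(after);
        extra.removeAll(before);
        return extra;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny fake directory layout (names are illustrative only).
        Path root = Files.createTempDirectory("fake-hbase-root");
        Files.createDirectories(root.resolve("table1/region1/cf"));
        Files.createFile(root.resolve("table1/region1/cf/hfile1"));
        Set<String> before = listFiles(root);

        // Simulate a failed snapshot attempt that did not clean up after itself.
        Path tmp = root.resolve(".snapshot/.tmp/snap1");
        Files.createDirectories(tmp);
        Files.createFile(tmp.resolve("partial"));

        // A cleanup test would assert that this set is empty.
        System.out.println("Leftover files: " + leftovers(before, listFiles(root)));
    }
}
```

A real test would presumably take the listings through the Hadoop FileSystem API against a mini cluster and assert that the leftover set is empty after the failed or aborted snapshot.
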
> > Unit testing - Lightly tested so far, or tests we are hoping to write soon:
> >
> > 1. HBase Snapshots File System Correctness Tests -
> >
> > This test class verifies proper behavior with regard to what the file
> > system looks like. What the file system contains should be predictable
> > after certain events, both snapshot-specific and environment-specific.
> > For example, after a snapshot, we should expect there to be files in the
> > /hbase/.snapshot/ folder. Also, after a split occurs on the base table and
> > the underlying HFiles go through flux, we should be able to know beforehand
> > where files move. In particular, this is important to test after repeated
> > deletions and modifications. Also, we want to make sure no cruft remains
> > after various operations occur.
> >
> >
> > 2. HBase Snapshots (Re)Naming Test [Note: Renaming snapshots is not
> > supported yet!]
> >
> > These tests should verify valid/invalid names for snapshots. In particular,
> > they should use the rename_snapshot command to attempt to rename to a table
> > that already exists, or to a snapshot that already exists (or had existed
> > but was deleted).
> > Things like special characters or semantically-meaningful characters are
> > important as well. Other things that need to be tested are what happens if
> > a snapshot is created, deleted, the underlying table is modified, and then
> > another snapshot is taken. The snapshot should contain the most recent
> > data.
> >
> >
> > 3. Snapshots logline test:
> > Verifies that the proper loglines are generated for events.
> > Manual testing for this might include making sure that spurious,
> > misleading, or unnecessary log lines are not present.
> >
> > 4. HBase Snapshots Aborted or Failed Clone or Restore
> >
> > Verifies that no cruft is left over after an attempt to restore or clone a
> > snapshotted table fails or is aborted and that further snapshots can take
> > place. This may be tricky and could require writing some additional
> > utilities.
> >
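
The create, delete, modify, re-snapshot sequence from item 2 can be written down as a small in-memory model that states the expected outcome. This is a sketch only: SnapshotNameModel is a hypothetical toy class, not HBase code, and it assumes that a deleted snapshot's name may be reused and that a name currently in use is rejected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of one table and its snapshots; not an HBase API.
public class SnapshotNameModel {
    private final Map<String, String> table = new TreeMap<>();   // row -> value
    private final Map<String, Map<String, String>> snapshots = new HashMap<>();

    void put(String row, String value) {
        table.put(row, value);
    }

    /** Takes a snapshot; rejects a name that is currently in use (assumed rule). */
    void snapshot(String name) {
        if (snapshots.containsKey(name)) {
            throw new IllegalArgumentException("snapshot already exists: " + name);
        }
        snapshots.put(name, new TreeMap<>(table)); // copy = point-in-time view
    }

    void deleteSnapshot(String name) {
        snapshots.remove(name);
    }

    Map<String, String> contents(String name) {
        return snapshots.get(name);
    }

    public static void main(String[] args) {
        SnapshotNameModel m = new SnapshotNameModel();
        m.put("r1", "v1");
        m.snapshot("s");        // snapshot is created
        m.deleteSnapshot("s");  // ... deleted
        m.put("r1", "v2");      // ... the underlying table is modified
        m.snapshot("s");        // ... and another snapshot is taken under the same name
        System.out.println(m.contents("s")); // prints {r1=v2}
    }
}
```

The point of the model is the final check: the second snapshot named "s" holds the most recent data rather than anything left over from the deleted one.
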
> > Non-unit testing:
> >
> > This area of testing is less straightforward and more exploratory in
> > nature. It's open-ended but with some direction. In particular, we want to
> > test a lot of "what if this happens when we do something related to
> > snapshots". By "this happens", I mean compactions, splits, processes dying,
> > master failing over to backup master, etc. By "something related to
> > snapshots", I mean taking a snapshot, restoring a snapshot, or
> > cloning a snapshot, among other things. In addition, we can see what
> > happens as scaling factors (e.g. the number of regions, amount of data per
> > node, duration of test, and frequency of compactions/splits) increase.
> > Finally, we should benchmark the time it takes to take/restore/clone a
> > snapshot and see how it changes with scale factors.
> >
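
Benchmarking how snapshot time changes with scale factors could start from a harness along these lines. It is a rough sketch under stated assumptions: ScaleBenchmark and timeAcrossScales are hypothetical names, the "operation" is a placeholder workload rather than a real snapshot call, and a single wall-clock measurement per scale is far too crude for real conclusions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical timing harness; the operation passed in is a stand-in
// for taking/restoring/cloning a snapshot at a given scale factor.
public class ScaleBenchmark {

    /** Runs the operation once per scale factor and records elapsed nanoseconds. */
    static Map<Integer, Long> timeAcrossScales(int[] scales, IntConsumer operation) {
        Map<Integer, Long> results = new LinkedHashMap<>(); // keeps input order
        for (int scale : scales) {
            long start = System.nanoTime();
            operation.accept(scale);
            results.put(scale, System.nanoTime() - start);
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder workload: cost grows with the "number of regions".
        IntConsumer fakeSnapshot = regions -> {
            StringBuilder manifest = new StringBuilder();
            for (int i = 0; i < regions * 1000; i++) {
                manifest.append("region-").append(i).append('\n');
            }
        };

        Map<Integer, Long> timings = timeAcrossScales(new int[] {1, 10, 100}, fakeSnapshot);
        timings.forEach((scale, nanos) ->
            System.out.println("scale=" + scale + " elapsedNanos=" + nanos));
    }
}
```

In practice each scale point would be repeated several times, with compactions and splits running in the background, so that the trend rather than a single measurement is compared.
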
> > We are testing some of these combinations internally. When we see something
> > go awry, we fix it and rerun the trial, with the expectation that the
> > feature becomes more stable and reliable.
> >
> > Some of the things we have tried:
> > -Long running tests: Run repeated snapshots while verifying that all is
> > well.
> >
> > -Meanness tests:
> > 1. Killing the master
> > 2. Performing a compaction
> > 3. Table enable/disable
> >
> > Feel free to follow-up with questions.
> >
> > --
> > Best Regards,
> >
> > Aleks Shulman
> > 847.814.5804
> >  Cloudera
> >
>



-- 
Best Regards,

Aleks Shulman
847.814.5804
Cloudera
