[ 
https://issues.apache.org/jira/browse/HBASE-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vasu Mariyala updated HBASE-9490:
---------------------------------

    Attachment: 0.96-trunk-Independent-Test-Execution.patch
                0.94-Independent-Test-Execution.patch
    
> Provide independent execution environment for small tests
> ---------------------------------------------------------
>
>                 Key: HBASE-9490
>                 URL: https://issues.apache.org/jira/browse/HBASE-9490
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vasu Mariyala
>            Assignee: Vasu Mariyala
>         Attachments: 0.94-Independent-Test-Execution.patch, 
> 0.96-trunk-Independent-Test-Execution.patch
>
>
> Some of the state related to schema metrics is stored in static variables. 
> Because the small tests all run in a single JVM, that shared state makes 
> the test results depend on execution order and appear random.
> An example is the set of test failures in HBASE-8930.
> {code}
>     for (SchemaMetrics cfm : tableAndFamilyToMetrics.values()) {
>         if (metricName.startsWith(CF_PREFIX + CF_PREFIX)) {
>           throw new AssertionError("Column family prefix used twice: " +
>               metricName);
>         }
> {code}
> The above code throws an error when the metric name starts with "cf.cf.". It 
> would be helpful if anyone could shed some light on the reason for checking 
> for "cf.cf.".
> A metric name starts with "cf.cf." when both of the following hold (see the 
> generateSchemaMetricsPrefix method of SchemaMetrics):
> a) The column family name is "cf"
> AND
> b) Either the table name is "" or useTableNameGlobally (a variable of 
> SchemaMetrics) is false.
> The table name is empty only in the case of ALL_SCHEMA_METRICS, which has the 
> column family "". So we can rule out the possibility of the table name being 
> empty.
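
The two conditions above can be illustrated with a simplified model of the prefix logic. This is only a sketch, not the SchemaMetrics source: the class name SchemaPrefixSketch, the method signature and the "tbl." table prefix are assumptions made for illustration. Only the "cf." prefix and the "cf.cf." check come from the description.

```java
// Simplified, hypothetical model of schema metric prefix generation.
// Not the HBase implementation; names and the "tbl." prefix are assumed.
final class SchemaPrefixSketch {
    static final String TABLE_PREFIX = "tbl."; // assumption for illustration
    static final String CF_PREFIX = "cf.";     // prefix named in the issue

    // Builds the prefix from the table name (only when table names are
    // used globally and the name is non-empty) and the column family.
    static String prefix(String table, String family, boolean useTableNameGlobally) {
        StringBuilder sb = new StringBuilder();
        if (useTableNameGlobally && !table.isEmpty()) {
            sb.append(TABLE_PREFIX).append(table).append('.');
        }
        if (!family.isEmpty()) {
            sb.append(CF_PREFIX).append(family).append('.');
        }
        return sb.toString();
    }

    // Mirrors the check quoted above: a name starting with "cf.cf." is rejected.
    static boolean usesCfPrefixTwice(String metricName) {
        return metricName.startsWith(CF_PREFIX + CF_PREFIX);
    }

    public static void main(String[] args) {
        // Table name included: the name does not start with "cf.cf."
        System.out.println(prefix("testtable", "cf", true));
        // Table name dropped and family "cf": the name starts with "cf.cf."
        System.out.println(prefix("testtable", "cf", false));
    }
}
```

Under this model, a family named "cf" trips the check only when the table name is left out of the prefix, which matches conditions (a) and (b).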
> Also to note, the variables "useTableNameGlobally" and 
> "tableAndFamilyToMetrics" of SchemaMetrics are static and are shared across 
> all the tests that run in a single jvm. In our case, the profile runAllTests 
> has the below configuration
> {code}
>         <surefire.firstPartForkMode>once</surefire.firstPartForkMode>
>         <surefire.firstPartParallel>none</surefire.firstPartParallel>
>         <surefire.firstPartThreadCount>1</surefire.firstPartThreadCount>
>         <surefire.firstPartGroups>org.apache.hadoop.hbase.SmallTests</surefire.firstPartGroups>
> {code}
> Hence all of our small tests run in a single jvm and share the above 
> variables "useTableNameGlobally" and "tableAndFamilyToMetrics".
> The order of execution of the tests caused this failure for the following 
> reasons:
> a) Several small tests, such as TestMemStore and TestSchemaConfigured, set 
> useTableNameGlobally to false. But these tests don't create tables that have 
> the column family name "cf".
> b) If the tests in step (a) run before the tests that create tables/regions 
> with column family 'cf', metric names start with "cf.cf.".
> c) If any other tests, such as the failed tests (TestScannerSelectionUsingTTL, 
> TestHFileReaderV1, TestScannerSelectionUsingKeyRange), then validate schema 
> metrics, they fail because the metric names start with "cf.cf.".
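
Steps (a) to (c) can be sketched as a small simulation of static state shared by tests in one JVM. This is a hypothetical model, not HBase code: the class SharedStateOrderSketch and its methods are invented for illustration, and reset() stands in for starting a fresh JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of order-dependent tests sharing static state.
final class SharedStateOrderSketch {
    // Static, so every simulated test in the same JVM sees the same values.
    static boolean useTableNameGlobally = true;
    static final List<String> metricNames = new ArrayList<>();

    // Models a fresh JVM, which is what forking per test class would give.
    static void reset() {
        useTableNameGlobally = true;
        metricNames.clear();
    }

    // Step (a): a test that turns off table names but creates no "cf" family.
    static void testThatDisablesTableNames() {
        useTableNameGlobally = false;
    }

    // Step (b): a test that creates a region with column family "cf".
    static void testThatCreatesFamilyCf() {
        String prefix = (useTableNameGlobally ? "tbl.testtable." : "") + "cf.cf.";
        metricNames.add(prefix + "bt.Data.fsReadnumops");
    }

    // Step (c): a test that validates all metric names registered so far.
    static boolean validationPasses() {
        for (String name : metricNames) {
            if (name.startsWith("cf.cf.")) {
                return false; // "Column family prefix used twice"
            }
        }
        return true;
    }
}
```

Running the "cf" test before the flag is cleared leaves validation passing; running it afterwards makes validation fail, even though none of the three simulated tests changed.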
> On my local machine, I tried to re-create the failure scenario by 
> changing the surefire test configuration and creating a simple test 
> (TestSimple) which just creates a region for the table 'testtable' and 
> column family 'cf'.
> {code}
> TestSimple.java
> ------------------------------------------------------------------
>   @Before
>   public void setUp() throws Exception {
>     HTableDescriptor htd = new HTableDescriptor(TABLE_NAME_BYTES);
>     htd.addFamily(new HColumnDescriptor(FAMILY_NAME_BYTES));
>     HRegionInfo info = new HRegionInfo(TABLE_NAME_BYTES, null, null, false);
>     this.region = HRegion.createHRegion(info, TEST_UTIL.getDataTestDir(),
>         TEST_UTIL.getConfiguration(), htd);
>     Put put = new Put(ROW_BYTES);
>     for (int i = 0; i < 10; i += 2) {
>       // puts 0, 2, 4, 6 and 8
>       put.add(FAMILY_NAME_BYTES, Bytes.toBytes(QUALIFIER_PREFIX + i), i,
>           Bytes.toBytes(VALUE_PREFIX + i));
>     }
>     this.region.put(put);
>     this.region.flushcache();
>   }
>   @Test
>   public void testFilterInvocation() throws Exception {
>     System.out.println("testing");
>   }
>   @After
>   public void tearDown() throws Exception {
>     HLog hlog = region.getLog();
>     region.close();
>     hlog.closeAndDelete();
>   }
> Successful run:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> 2013-09-09 15:38:03.478 java[46562:db03] Unable to load realm mapping info from SCDynamicStore
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.085 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.217 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.618 sec
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.542 sec
> Results :
> Tests run: 35, Failures: 0, Errors: 0, Skipped: 0
> ------------------------------------------------------------------
> Failed run order:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> 2013-09-09 15:43:21.466 java[46890:db03] Unable to load realm mapping info from SCDynamicStore
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.967 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.659 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.15 sec
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.031 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 24.883 sec <<< FAILURE!
> Results :
> Failed tests:   testReadingExistingVersion1HFile(org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1): Column family prefix used twice: cf.cf.bt.Data.fsReadnumops
> Tests run: 35, Failures: 1, Errors: 0, Skipped: 0
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> {code}
> In the failed scenario, the following happened:
> a) TestMemStore sets useTableNameGlobally to false.
> b) TestScannerSelectionUsingTTL and TestScannerSelectionUsingKeyRange 
> succeed as they don't create a table with column family name "cf".
> c) TestSimple creates a region for table 'testtable' and column family 'cf'. 
> Since useTableNameGlobally is false, it creates metric names that 
> start with "cf.cf.".
> d) TestHFileReaderV1 fails while validating metrics because the metric names 
> start with "cf.cf.".
> This patch exposed the issue because TestSimple stands in for 
> TestInvocationRecordFilter. Builds 1136, 1137 and 1138, which ran after 
> this patch, executed the tests in a different order from the failed builds 
> 1139 and 1140.
> One simple fix would have been to change the column family name from "cf" 
> to "mycf" in TestInvocationRecordFilter. But to avoid future occurrences of 
> such issues, I suggest setting "surefire.firstPartForkMode" to "always", 
> similar to the settings we use while running localTests and the medium and 
> large tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
