[ https://issues.apache.org/jira/browse/HBASE-9490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vasu Mariyala updated HBASE-9490:
---------------------------------
    Attachment: 0.96-trunk-Independent-Test-Execution.patch
                0.94-Independent-Test-Execution.patch

> Provide independent execution environment for small tests
> ---------------------------------------------------------
>
>          Key: HBASE-9490
>          URL: https://issues.apache.org/jira/browse/HBASE-9490
>      Project: HBase
>   Issue Type: Improvement
>     Reporter: Vasu Mariyala
>     Assignee: Vasu Mariyala
>  Attachments: 0.94-Independent-Test-Execution.patch,
>               0.96-trunk-Independent-Test-Execution.patch
>
>
> Some of the state related to schema metrics is stored in static variables, and
> since the small test cases run in a single JVM, this causes random
> behavior in the output of the tests.
> An example scenario is the test case failures in HBASE-8930:
> {code}
> for (SchemaMetrics cfm : tableAndFamilyToMetrics.values()) {
>     if (metricName.startsWith(CF_PREFIX + CF_PREFIX)) {
>       throw new AssertionError("Column family prefix used twice: " +
>           metricName);
>     }
> {code}
> The above code throws an error when the metric name starts with "cf.cf.". It
> would be helpful if anyone could shed some light on the reason behind checking
> for "cf.cf."
> A metric name starts with "cf.cf." when both of the following hold
> (see the generateSchemaMetricsPrefix method of SchemaMetrics):
> a) The column family name is "cf"
> AND
> b) The table name is "" or the useTableNameGlobally variable of
> SchemaMetrics is false.
> The table name is empty only in the case of ALL_SCHEMA_METRICS, which has the
> column family "". So we can rule out the possibility of the table name
> being empty.
> Also note that the variables "useTableNameGlobally" and
> "tableAndFamilyToMetrics" of SchemaMetrics are static and are shared across
> all the tests that run in a single JVM.
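The two conditions above can be illustrated with a small sketch. This is a simplified, hypothetical model of the prefix logic and not the HBase source: the class name `SchemaPrefixSketch`, the method `generatePrefix`, the "tbl." table prefix and the initial value of the flag are all assumptions made for illustration.

```java
// Simplified, hypothetical model of the prefix logic described above.
// It is NOT the HBase SchemaMetrics source; names and defaults are assumptions.
final class SchemaPrefixSketch {
    static final String TABLE_PREFIX = "tbl."; // assumed value
    static final String CF_PREFIX = "cf.";

    // Static, so every test running in the same JVM sees the last value written.
    static boolean useTableNameGlobally = true; // assumed starting value

    static String generatePrefix(String tableName, String cfName) {
        // The table part is dropped when the name is empty or the global flag is off.
        boolean includeTable = useTableNameGlobally && !tableName.isEmpty();
        String tablePart = includeTable ? TABLE_PREFIX + tableName + "." : "";
        // An empty family name stands for "all families" and adds no family part.
        String cfPart = cfName.isEmpty() ? "" : CF_PREFIX + cfName + ".";
        return tablePart + cfPart;
    }

    public static void main(String[] args) {
        useTableNameGlobally = true;
        System.out.println(generatePrefix("testtable", "cf")); // tbl.testtable.cf.cf.
        useTableNameGlobally = false;
        System.out.println(generatePrefix("testtable", "cf")); // cf.cf.
    }
}
```

In this model the doubled prefix appears only when the family is literally named "cf" and the table part is omitted, which matches conditions (a) and (b).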
> In our case, the profile runAllTests has the below configuration:
> {code}
> <surefire.firstPartForkMode>once</surefire.firstPartForkMode>
> <surefire.firstPartParallel>none</surefire.firstPartParallel>
> <surefire.firstPartThreadCount>1</surefire.firstPartThreadCount>
> <surefire.firstPartGroups>org.apache.hadoop.hbase.SmallTests</surefire.firstPartGroups>
> {code}
> Hence all of our small tests run in a single JVM and share the above
> variables "useTableNameGlobally" and "tableAndFamilyToMetrics".
> The order of execution of the tests caused this failure for the following
> reasons:
> a) A bunch of small tests, like TestMemStore and TestSchemaConfigured, set
> useTableNameGlobally to false. But these tests don't create tables that have
> the column family name "cf".
> b) If the tests in step (a) run before the tests which create tables/regions
> with column family 'cf', metric names would start with "cf.cf."
> c) If any of the other tests, like the failed tests
> (TestScannerSelectionUsingTTL, TestHFileReaderV1,
> TestScannerSelectionUsingKeyRange), validate schema metrics, they would fail
> as the metric names start with "cf.cf."
> On my local machine, I tried to re-create the failure scenario by
> changing the surefire test configuration and creating a simple test
> (TestSimple) which just creates a region for the table 'testtable' and column
> family 'cf'.
> {code}
> TestSimple.java
> ------------------------------------------------------------------
>   @Before
>   public void setUp() throws Exception {
>     HTableDescriptor htd = new HTableDescriptor(TABLE_NAME_BYTES);
>     htd.addFamily(new HColumnDescriptor(FAMILY_NAME_BYTES));
>     HRegionInfo info = new HRegionInfo(TABLE_NAME_BYTES, null, null, false);
>     this.region = HRegion.createHRegion(info, TEST_UTIL.getDataTestDir(),
>         TEST_UTIL.getConfiguration(), htd);
>     Put put = new Put(ROW_BYTES);
>     for (int i = 0; i < 10; i += 2) {
>       // puts 0, 2, 4, 6 and 8
>       put.add(FAMILY_NAME_BYTES, Bytes.toBytes(QUALIFIER_PREFIX + i), i,
>           Bytes.toBytes(VALUE_PREFIX + i));
>     }
>     this.region.put(put);
>     this.region.flushcache();
>   }
>
>   @Test
>   public void testFilterInvocation() throws Exception {
>     System.out.println("testing");
>   }
>
>   @After
>   public void tearDown() throws Exception {
>     HLog hlog = region.getLog();
>     region.close();
>     hlog.closeAndDelete();
>   }
>
> Successful run:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> 2013-09-09 15:38:03.478 java[46562:db03] Unable to load realm mapping info from SCDynamicStore
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.085 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.217 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.618 sec
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 26.542 sec
> Results :
> Tests run: 35, Failures: 0, Errors: 0, Skipped: 0
> ------------------------------------------------------------------
> Failed run order:
> -------------------------------------------------------
>  T E S T S
> -------------------------------------------------------
> 2013-09-09 15:43:21.466 java[46890:db03] Unable to load realm mapping info from SCDynamicStore
> Running org.apache.hadoop.hbase.regionserver.TestMemStore
> Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.967 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingTTL
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.659 sec
> Running org.apache.hadoop.hbase.io.hfile.TestScannerSelectionUsingKeyRange
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.15 sec
> Running org.apache.hadoop.hbase.filter.TestSimple
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.031 sec
> Running org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 24.883 sec <<< FAILURE!
> Results :
> Failed tests:
> testReadingExistingVersion1HFile(org.apache.hadoop.hbase.io.hfile.TestHFileReaderV1): Column family prefix used twice: cf.cf.bt.Data.fsReadnumops
> Tests run: 35, Failures: 1, Errors: 0, Skipped: 0
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
> {code}
> In the failed scenario, the following happened:
> a) TestMemStore sets useTableNameGlobally to false.
> b) TestScannerSelectionUsingTTL and TestScannerSelectionUsingKeyRange are
> successful as they don't create a table with column family name "cf".
> c) TestSimple creates a region for table 'testtable' and column family 'cf'.
> Since useTableNameGlobally is set to false, it creates metric names that
> start with "cf.cf."
> d) TestHFileReaderV1 fails while validating metrics, as the metric names
> start with "cf.cf."
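Steps (a) to (d) can be replayed with a toy simulation. The sketch below only imitates the behaviour described in this report; the class `TestOrderSketch`, its methods, the "tbl." table prefix and the assumed starting value of the flag are hypothetical and are not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the ordering problem. These methods only imitate the
// behaviour described in this report and are not HBase code.
final class TestOrderSketch {
    // Shared static state, standing in for the SchemaMetrics statics.
    static boolean useTableNameGlobally;
    static final List<String> registeredPrefixes = new ArrayList<>();

    // Models the clean state a freshly started JVM would have.
    static void reset() {
        useTableNameGlobally = true; // assumed starting value
        registeredPrefixes.clear();
    }

    // Step (a): flips the global flag and never restores it.
    static void testMemStore() {
        useTableNameGlobally = false;
    }

    // Step (c): registers metrics for table 'testtable', family 'cf'.
    static void testSimple() {
        String tablePart = useTableNameGlobally ? "tbl.testtable." : "";
        registeredPrefixes.add(tablePart + "cf.cf.");
    }

    // Step (d): validates every prefix registered so far in this JVM.
    static void testHFileReaderV1() {
        for (String prefix : registeredPrefixes) {
            if (prefix.startsWith("cf.cf.")) {
                throw new AssertionError("Column family prefix used twice: " + prefix);
            }
        }
    }

    // Runs the given steps in order inside one simulated JVM; true if all pass.
    static boolean runInOneJvm(Runnable... tests) {
        reset();
        try {
            for (Runnable test : tests) {
                test.run();
            }
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        boolean simpleFirst = runInOneJvm(TestOrderSketch::testSimple,
            TestOrderSketch::testHFileReaderV1, TestOrderSketch::testMemStore);
        boolean memStoreFirst = runInOneJvm(TestOrderSketch::testMemStore,
            TestOrderSketch::testSimple, TestOrderSketch::testHFileReaderV1);
        System.out.println("TestSimple first: " + simpleFirst);     // true
        System.out.println("TestMemStore first: " + memStoreFirst); // false
    }
}
```

Calling `runInOneJvm` once per test method models giving each test class its own JVM: every call starts from the reset state, so the outcome no longer depends on the order.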
> This patch exposed the issue because TestSimple stands in for
> TestInvocationRecordFilter. Builds 1136, 1137 and 1138, which ran after this
> patch, executed the tests in a different order than the failed builds 1139
> and 1140.
> One simple fix to address the issue would have been to change the column
> family name from "cf" to "mycf" in TestInvocationRecordFilter. But to
> avoid future occurrences of these issues, I would suggest setting
> "surefire.firstPartForkMode" to "always", similar to the settings we use while
> running localTests, medium and large tests.