[ 
https://issues.apache.org/jira/browse/HADOOP-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15408591#comment-15408591
 ] 

Chen Liang edited comment on HADOOP-13439 at 8/4/16 10:35 PM:
--------------------------------------------------------------

Thanks for such detailed feedbacks [~iwasakims], it helped a lot!

Given what you mentioned here, I did some more investigation, and I think I 
verified that it is possible for one test to read config created by another. 

The two classes both have this line:
 {code}.save(TestMetricsConfig.getTestFilename("hadoop-metrics2-test"));{code}
Which writes the config to the exactly same file 
"hadoop-metrics2-test.properties" first. Then both classes have the call:
{code}ms.start();{code}
Which reads the file. So there can be race condition from the write to the 
read. This error can be reproduced by adding delay between .save() and 
ms.start() call, and run the tests at the same time.

Another problem with this race condition is the missing metric error of 
checkMetrics() as mentioned in HADOOP-12588. For example, one class might have 
the following in its config
{code}.add("test.source.s1.metric.filter.exclude", "X*"){code}
which configures to remove Xxx from the metric sources. If the other class is 
expecting this metric, we would end up getting this error:
{code}java.lang.AssertionError: Missing metrics: test.s1rec.Xxx{code}
One solution I'm thinking of is to separate the config file they are writing to 
by introducing a unique prefix for each test case, such that each test uses a 
different config file, similar to 
RollingFileSystemSinkTestBase#initMetricsSystem. The pros is that we are still 
able to run test cases in parallel without race condition, and the cons is that 
there might be a couple tens more .properties generated. What do you think 
about this approach? Or do you have any suggestions?


was (Author: vagarychen):
Thanks for such detailed feedbacks [~iwasakims], it helped a lot!

Given what you mentioned here, I did some more investigation, and I think I 
verified that it is possible for one test to read config created by another. 

The two classes both have this line:
 {code}.save(TestMetricsConfig.getTestFilename("hadoop-metrics2-test"));{code}
Which writes the config to the exactly same file 
"hadoop-metrics2-test.properties" first. Then both classes have the call:
{code}ms.start();{code}
Which reads the file. So there can be race condition from the write to the 
read. This error can be reproduced by adding delay between .save() and 
ms.start() call, and run the tests at the same time.

Another problem with this race condition is the missing metric error of 
checkMetrics() as mentioned in HADOOP-12588. For example, one class might have 
the following in its config
{code}.add("test.source.s1.metric.filter.exclude", "X*"){code}
which configures to remove Xxx from the metric sources. If the other class is 
expecting this metric, we would end up getting this error:
{code}java.lang.AssertionError: Missing metrics: test.s1rec.Xxx{code}
One solution I'm thinking of is to separate the config file they are writing to 
by introducing a unique prefix for each test case, such as in 
RollingFileSystemSinkTestBase#initMetricsSystem. The pros is that we are still 
able to run test cases in parallel without race condition, and the cons is that 
there might be a couple tens more .properties generated. What do you think 
about this approach? Or do you have any suggestions?

> Fix race between TestMetricsSystemImpl and TestGangliaMetrics
> -------------------------------------------------------------
>
>                 Key: HADOOP-13439
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13439
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: test
>            Reporter: Masatake Iwasaki
>            Assignee: Chen Liang
>            Priority: Minor
>
> TestGangliaMetrics#testGangliaMetrics2 set *.period to 120 but 8 was used.
> {noformat}
> 2016-06-27 15:21:31,480 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:startTimer(375)) - Scheduled snapshot period at 8 
> second(s).
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to