[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-11 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-757950935


   > Thanks for the update, @amahussein. +1.
   
   Thanks for the review @iwasakims  and @goiri 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756533357


   I did an experiment on the original `TestDistributedShell`
   
   if you run
   ```bash
   mvn test -Dtest=TestDistributedShell#testDSShellWithoutDomainV1_5;
   mvn test -Dtest=TestDistributedShell#testDSShellWithoutDomainV2
   ```
   
   Then the testDSShellWithoutDomainV2 will fail with the same exception. It 
seems that there was an old bug and not introduced by my changes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756531850


   @iwasakims and @goiri Do you have insights what could the problem be ?
   When you run TestDSTimelineV10 -> then TestDSTimelineV20. things will be 
successful.
   Only TestDSTimelineV20 fails to initialize the RM if it runs immediately 
after TestDSTimelineV15



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756530407


   > The latest commit seemed to introduce errors.
   > 
   > ```
   > $ mvn test -Dtest='TestDS*'
   > ...
   > [ERROR] Errors:
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [ERROR]   
TestDSTimelineV20>DistributedShellBaseTest.setup:159->DistributedShellBaseTest.setupInternal:476
 » YarnRuntime
   > [INFO]
   > [ERROR] Tests run: 40, Failures: 0, Errors: 6, Skipped: 0
   > ```
   
   Hey @iwasakims!
   I was looking into this failure. TestDSTimelineV20 fails when it runs after 
TestDSTimelineV15.
   The exception is thrown as follows
   ```
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.mkdirsWithRetries(FileSystemTimelineWriterImpl.java:214)
at 
org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.serviceStart(FileSystemTimelineWriterImpl.java:188)
   ```
   
   This could be something that missing in the configurations of the unitTest 
or a bug in either timelineserviceV1.5 or timelineserviceV2.0
   I have verified that all the configurations are maintained when the code is 
split. 
   The reason this failure appears only now could be due to the fact that 
having all the tests in one file did not let V1.5 tests run immediately before 
V2.0 tests.
   
   I haven't figured it yet, but probably this would need filing another jira.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756136730


   > Thank for working on this, @amahussein. LGTM overall pending some nits.
   > 
   > While it is too late here, it is hard to follow which part of the 
TestDistributedShell is modified if splitting the class and refactoring are 
mixed up in a single commit. Doing one thing in one PR makes reviewing and 
cherry-picking easier.
   
   Thanks for the review @iwasakims and @goiri !
   Just quick point. Can you please use "start review" feature instead of 
sending separate comments? single comments do not indicate whether the reviewer 
is done with his reviews or not. Therefore, I could start a new commit before 
receiving all the feedback.
   Thanks again guys. I understand that the PR was not straightforward to 
review.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756134654


   > > * The implementation has lots of code redundancy.
   > > * It is inefficient in the setup and tearing down. The large percentage 
of time execution is exhausted by starting cluster and stopping the services.
   > 
   > I think there is a gap between the PR and the above description of 
[YARN-10553](https://issues.apache.org/jira/browse/YARN-10553). Just splitting 
the TestDistributedShell does not reduce code nor minicluster ramp-up time. It 
would be nice to update the JIRA description reflecting what is addressed.
   
   I will update the Jira description.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-07 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-756132738


   > Thank for working on this, @amahussein. LGTM overall pending some nits.
   > 
   > While it is too late here, it is hard to follow which part of the 
TestDistributedShell is modified if splitting the class and refactoring are 
mixed up in a single commit. Doing one thing in one PR makes reviewing and 
cherry-picking easier.
   
   That's good point @iwasakims . Sorry that I made it hard.
   After my first commit, I received feedback from Inigio to transform 
busy-waiting loops into lambda. It is my bad I haven't managed to separate 
between refactoring and splitting.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-06 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-755797255


   > > @goiri the code changes fix {{testDSShellWithEnforceExecutionType}}.
   > > The problem with the test was that it launches two containers executing 
cmd `date`. Apparently the two containers would exit fast. The unit test will 
stay blocked waiting for the containers to be exactly "2".
   > > This does not take into consideration that the containers count is 3 
including the AMContainer.
   > > The fix was to get rid of the equality in the check, and change the 
application command to `ls`
   > 
   > That makes sense, does it make sense to make the assert more general than 
an equality instead of getting rid of the equality?
   
   In `DistributedShellBaseTest.waitForContainersLaunch()`, I used inequality 
`if (containers.size() < nContainers) { return false; }`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-06 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-755767916


   CC: @iwasakims . This includes a fix to `testDSShellWithEnforceExecutionType`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein commented on pull request #2581: YARN-10553. Refactor TestDistributedShell

2021-01-06 Thread GitBox


amahussein commented on pull request #2581:
URL: https://github.com/apache/hadoop/pull/2581#issuecomment-755765315


   @goiri the code changes fix {{testDSShellWithOpportunisticContainers}}.
   The problem with the test was that it launches two containers executing cmd 
`date`. Apparently the two containers would exit fast. The unit test will stay 
blocked waiting for the containers to be exactly "2".
   This does not take into consideration that the containers count is 3 
including the AMContainer.
   
   The fix was to get rid of the equality in the check, and change the 
application command to `ls`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org