[jira] [Commented] (HADOOP-13927) ADLS TestAdlContractRootDirLive.testRmNonEmptyRootDirNonRecursive failed

2017-01-03 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795923#comment-15795923
 ] 

Tony Wu commented on HADOOP-13927:
--

Hi [~jzhuge], this error started appearing after HADOOP-13900. After reverting 
the {{azure-data-lake-store-sdk}} version to {{2.0.4-SNAPSHOT}} the error went 
away.

> ADLS TestAdlContractRootDirLive.testRmNonEmptyRootDirNonRecursive failed
> 
>
> Key: HADOOP-13927
> URL: https://issues.apache.org/jira/browse/HADOOP-13927
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/adl, test
>Affects Versions: 3.0.0-alpha2
>Reporter: John Zhuge
>Priority: Critical
>
> {noformat}
> Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 18.095 sec 
> <<< FAILURE! - in org.apache.hadoop.fs.adl.live.TestAdlContractRootDirLive
> testRmNonEmptyRootDirNonRecursive(org.apache.hadoop.fs.adl.live.TestAdlContractRootDirLive)
>   Time elapsed: 1.085 sec  <<< FAILURE!
> java.lang.AssertionError: non recursive delete should have raised an 
> exception, but completed with exit code false
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmNonEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:132)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13897) TestAdlFileContextMainOperationsLive#testGetFileContext1 fails consistently

2016-12-13 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15745554#comment-15745554
 ] 

Tony Wu commented on HADOOP-13897:
--

Hi [~steve_l], thanks a lot for your response.

{{TestAdlFileContextMainOperationsLive}} does setup the FC to use ADL as 
default FS:
{code}
public class TestAdlFileContextMainOperationsLive
extends FileContextMainOperationsBaseTest {
...
  @Override
  public void setUp() throws Exception {
Configuration conf = AdlStorageConfiguration.getConfiguration();
String fileSystem = conf.get(KEY_FILE_SYSTEM);
if (fileSystem == null || fileSystem.trim().length() == 0) {
  throw new Exception("Default file system not configured.");
}
URI uri = new URI(fileSystem);
FileSystem fs = AdlStorageConfiguration.createStorageConnector();
fc = FileContext.getFileContext(
new DelegateToFileSystem(uri, fs, conf, fs.getScheme(), false) {
}, conf);
super.setUp();
  }
{code}

However the test that's failing is creating a second FC, with default config:
{code}
  @Test
  /*
   * Test method
   *  org.apache.hadoop.fs.FileContext.getFileContext(AbstractFileSystem)
   */
  public void testGetFileContext1() throws IOException {
final Path rootPath = getTestRootPath(fc, "test");
AbstractFileSystem asf = fc.getDefaultFileSystem();
// create FileContext using the protected #getFileContext(1) method:
<<
FileContext fc2 = FileContext.getFileContext(asf); << 2nd 
FC created
>>
// Now just check that this context can do something reasonable:
final Path path = new Path(rootPath, "zoo");
FSDataOutputStream out = fc2.create(path, EnumSet.of(CREATE),
Options.CreateOpts.createParent());
out.close();
Path pathResolved = fc2.resolvePath(path);
assertEquals(pathResolved.toUri().getPath(), path.toUri().getPath());
  }
{code}

{{FileContext.getFileContext()}} uses the default configuration:
{code}
  /**
   * Create a FileContext for specified file system using the default config.
   * 
   * @param defaultFS
   * @return a FileContext with the specified AbstractFileSystem
   * as the default FS.
   */
  protected static FileContext getFileContext(
final AbstractFileSystem defaultFS) {
return getFileContext(defaultFS, new Configuration());
  }
{code}

It looks like {{TestAdlFileContextMainOperationsLive#testGetFileContext1}} 
should be using {{AdlStorageConfiguration.getConfiguration()}} to create 
{{fc2}}. Or maybe {{testGetFileContext1}} should be omitted as the protected 
API {{FileContext#getFileContext(final AbstractFileSystem defaultFS)}} does not 
appear to be used anywhere else but this test case.

The rest of {{TestAdlFileContextMainOperationsLive}} and all other 
{{hadoop-azure-datalake}} live tests passes successfully during my test.

> TestAdlFileContextMainOperationsLive#testGetFileContext1 fails consistently
> ---
>
> Key: HADOOP-13897
> URL: https://issues.apache.org/jira/browse/HADOOP-13897
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.0.0-alpha2
>Reporter: Tony Wu
>
> {{TestAdlFileContextMainOperationsLive#testGetFileContext1}} (this is a live 
> test against Azure Data Lake Store) fails consistently with the following 
> error:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.55 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
> testGetFileContext1(org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive)
>   Time elapsed: 11.229 sec  <<< ERROR!
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
>   at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
>   at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:328)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
>   at 
> org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:328)
>   at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:320)
>   at 

[jira] [Commented] (HADOOP-13897) TestAdlFileContextMainOperationsLive#testGetFileContext1 fails consistently

2016-12-12 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15743637#comment-15743637
 ] 

Tony Wu commented on HADOOP-13897:
--

It appears the cause is that the following test 
({{FileContextMainOperationsBaseTest#testGetFileContext1}}) is using the 
default configuration rather than the ADLS test specific config 
{{hadoop-azure-datalake/src/test/resources/contract-test-options.xml}}.
{code}
  @Test
  /*
   * Test method
   *  org.apache.hadoop.fs.FileContext.getFileContext(AbstractFileSystem)
   */
  public void testGetFileContext1() throws IOException {
final Path rootPath = getTestRootPath(fc, "test");
AbstractFileSystem asf = fc.getDefaultFileSystem();
// create FileContext using the protected #getFileContext(1) method:
FileContext fc2 = FileContext.getFileContext(asf); // this uses the 
default config
// Now just check that this context can do something reasonable:
final Path path = new Path(rootPath, "zoo");
FSDataOutputStream out = fc2.create(path, EnumSet.of(CREATE),
Options.CreateOpts.createParent());
out.close();
Path pathResolved = fc2.resolvePath(path);
assertEquals(pathResolved.toUri().getPath(), path.toUri().getPath());
  }
{code}

The default config does not have {{dfs.adls.oauth2.access.token.provider.type}} 
defined.

> TestAdlFileContextMainOperationsLive#testGetFileContext1 fails consistently
> ---
>
> Key: HADOOP-13897
> URL: https://issues.apache.org/jira/browse/HADOOP-13897
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.0.0-alpha2
>Reporter: Tony Wu
>
> {{TestAdlFileContextMainOperationsLive#testGetFileContext1}} (this is a live 
> test against Azure Data Lake Store) fails consistently with the following 
> error:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.55 sec <<< 
> FAILURE! - in 
> org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
> testGetFileContext1(org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive)
>   Time elapsed: 11.229 sec  <<< ERROR!
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
>   at 
> org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
>   at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
>   at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:328)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
>   at 
> org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:328)
>   at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:320)
>   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
>   at org.apache.hadoop.fs.FileContext.create(FileContext.java:685)
>   at 
> org.apache.hadoop.fs.FileContextMainOperationsBaseTest.testGetFileContext1(FileContextMainOperationsBaseTest.java:1350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
>   at 

[jira] [Created] (HADOOP-13897) TestAdlFileContextMainOperationsLive#testGetFileContext1 fails consistently

2016-12-12 Thread Tony Wu (JIRA)
Tony Wu created HADOOP-13897:


 Summary: TestAdlFileContextMainOperationsLive#testGetFileContext1 
fails consistently
 Key: HADOOP-13897
 URL: https://issues.apache.org/jira/browse/HADOOP-13897
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.0.0-alpha2
Reporter: Tony Wu


{{TestAdlFileContextMainOperationsLive#testGetFileContext1}} (this is a live 
test against Azure Data Lake Store) fails consistently with the following error:
{noformat}
---
 T E S T S
---
Running org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 11.55 sec <<< 
FAILURE! - in org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive
testGetFileContext1(org.apache.hadoop.fs.adl.live.TestAdlFileContextMainOperationsLive)
  Time elapsed: 11.229 sec  <<< ERROR!
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
at 
org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
at 
org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:331)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:328)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1857)
at 
org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:328)
at org.apache.hadoop.fs.FileContext.getFSofPath(FileContext.java:320)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:85)
at org.apache.hadoop.fs.FileContext.create(FileContext.java:685)
at 
org.apache.hadoop.fs.FileContextMainOperationsBaseTest.testGetFileContext1(FileContextMainOperationsBaseTest.java:1350)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 

[jira] [Commented] (HADOOP-12875) [Azure Data Lake] Support for contract test and unit test cases

2016-03-28 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214317#comment-15214317
 ] 

Tony Wu commented on HADOOP-12875:
--

Hi [~vishwajeet.dusane],

Thanks a lot for filing a separate JIRA, isolating out the ADL unit tests and 
providing a FS contract test! This is very helpful. 

I did a quick scan of the patch and have the following comments regarding the 
newly added FS contract tests:

Instead of having the following change in various contract test implementations:
{code:java}
+  @Override
+  protected boolean isSupported(String feature) throws IOException {
+return true;
+  }
{code}
It's better to define a {{adls.xml}} file where you specify the file system 
behavior. You can refer to {{wasb.xml}} as an example. This page 
[here|https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/filesystem/testing.html]
 also describes the details of FS contract tests and best practices of adding 
new ones.

Instead of adding {{@Ignore}} to test cases unsupported by ADL, I believe you 
can use {{ContractTestUtils#unsupported("...")}} or 
{{ContractTestUtils#unsupported("skip")}} instead.


> [Azure Data Lake] Support for contract test and unit test cases
> ---
>
> Key: HADOOP-12875
> URL: https://issues.apache.org/jira/browse/HADOOP-12875
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs, fs/azure, tools
>Reporter: Vishwajeet Dusane
>Assignee: Vishwajeet Dusane
> Attachments: Hadoop-12875-001.patch
>
>
> This JIRA describes contract test and unit test cases support for azure data 
> lake file system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12666) Support Microsoft Azure Data Lake - as a file system in Hadoop

2016-03-24 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15210903#comment-15210903
 ] 

Tony Wu commented on HADOOP-12666:
--

Hi [~vishwajeet.dusane], 

Thank you for posting the semantics document, meeting minutes and a new patch. 
It was really helpful.

*General comments:*
# Regarding the semantics document. It will be great if you can also include 
more information on how ADL backend can "lock" a file for write. From the doc 
and our meeting discussion, there seems to be 2 ways (please confirm):
*# File lease (which you included in the doc). Used by {{createNonRecursive()}}
*# Maintain connection to the backend (you mentioned during meeting). Used by 
{{append()}}
*** In this case do we assume ADL backend tracks which file is opened for write 
by keeping track of HTTP connections to the file?
# In {{hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md}}. You 
mentioned:
{quote}User and group information returned as ListStatus and GetFileStatus is 
in form of GUID associated in Azure Active Directory.{quote}
There are applications which verifies the file ownership and it will fail 
because of this. Also I believe this means commands like {{hdfs dfs -ls }} will return GUID as user & group. It may not be that readable. Can you 
comment on how users should handle this?
** I see there is a {{adl.debug.override.localuserasfileowner}} config which 
will override the user with local client user and group with "hdfs". But this 
workaround is probably not for actual usage.

*Specific comments:*

1. Can you explain why {{flushAsync()}} is used in {{BatchAppendOutputStream}}? 
It seems like {{flushAsync()}} is only used in a particular case where there is 
some data left in the buffer from previous writes and that combined with the 
current write will cross the buffer boundary. I'm not sure why this particular 
flush has to be async.

Consider the following case:
# {{BatchAppendOutputStream#flushAsync()}} returns. {{flushAsync()}} sends the 
sync job to thread and sets offset to 0:
{code:java}
private void flushAsync() throws IOException {
  if (offset > 0) {
waitForOutstandingFlush();

// Submit the new flush task to the executor
flushTask = EXECUTOR.submit(new CommitTask(data, offset, eof));

// Get a new internal buffer for the user to write
data = getBuffer();
offset = 0;
  }
}
{code}
# BatchAppendOutputStream#write() returns.
# Client closes outputStream.
# BatchAppendOutputStream#close() checks to see if there's anything to flush by 
checking offset. And in this case there is nothing to flush because offset is 
set to 0 earlier.
{code:java}
  boolean flushedSomething = false;
  if (hadError) {
// No point proceeding further since the error has occurred and
// stream would be required to upload again.
return;
  } else {
flushedSomething = offset > 0;
flush();
  }
{code}
# {{BatchAppendOutputStream#close()}} does not wait for the async flush job to 
complete. After this point if {{flushAsync()}} hit any error, this error will 
be lost and client will not be aware of it.
# If client then starts to write (append) to the same file with a new stream, 
this new write also will not wait for the previous async job to complete 
because {{flushTask}} is internal to {{BatchAppendOutputStream}}. It might be 
possible for the 2 writes to reach the backend in reverse order.

IMHO if this async flush is necessary then {{EXECUTOR}} should be created 
inside {{BatchAppendOutputStream}} and shutdown when the stream is closed. 
Currently {{EXECUTOR}} lives in {{PrivateAzureDataLakeFileSystem}}.

2. {{EXECUTOR}} in {{PrivateAzureDataLakeFileSystem}} is not shutdown properly.

3. Stream is closed check is missing in {{BatchAppendOutputStream}}. This check 
is present for {{BatchByteArrayInputStream}}.


> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, tools
>Reporter: Vishwajeet Dusane
>Assignee: Vishwajeet Dusane
> Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf, 
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch, 
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch, 
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-1.patch
>
>   Original Estimate: 336h
>  Time Spent: 336h
>  Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft 
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing 
> Hadoop applications 

[jira] [Commented] (HADOOP-12876) [Azure Data Lake] Support for process level FileStatus cache to optimize GetFileStatus frequent opeations

2016-03-21 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204524#comment-15204524
 ] 

Tony Wu commented on HADOOP-12876:
--

Hi [~vishwajeet.dusane], Thanks a lot for creating a separate JIRA to discuss 
the file status cache. I noticed you have removed the relevant code (i.e. 
{{FileStatusCacheManager}}) from the latest patch in HADOOP-12666. Do you mind 
reposting the cache implementation here?

I think you can post a patch for this JIRA based off the latest patch for 
HADOOP-12666.


> [Azure Data Lake] Support for process level FileStatus cache to optimize 
> GetFileStatus frequent opeations
> -
>
> Key: HADOOP-12876
> URL: https://issues.apache.org/jira/browse/HADOOP-12876
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, tools
>Reporter: Vishwajeet Dusane
>Assignee: Vishwajeet Dusane
>
> Add support to cache GetFileStatus and ListStatus response locally for 
> limited period of time. Local cache for limited period of time would optimize 
> number of calls for GetFileStatus operation.
> One of the example  where local limited period cache would be useful - 
> terasort ListStatus on input directory follows with GetFileStatus operation 
> on each file within directory. For 2048 input files in a directory would save 
> 2048 GetFileStatus calls during start up (Using the ListStatus response to 
> cache FileStatus instances).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-11-09 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.006.patch

In v6 patch:
* Fix a missed {{printStackTrace()}} in previous patch and convert it to 
{{LOG.error()}}.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch, HADOOP-12482.005.patch, 
> HADOOP-12482.006.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-11-09 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996783#comment-14996783
 ] 

Tony Wu commented on HADOOP-12482:
--

Hi [~ozawa] & [~eddyxu],

Please kindly take a look at the latest patch (v4) and let me know if you have 
any comments.

Thanks,
Tony

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-11-09 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997959#comment-14997959
 ] 

Tony Wu commented on HADOOP-12482:
--

Thanks a lot to [~ozawa] for your thorough review. Please kindly take a look at 
the new patch.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch, HADOOP-12482.005.patch, 
> HADOOP-12482.006.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-11-09 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.005.patch

In v5 patch:
* Addressed [~eddyxu]'s review comments: rename {{TsSource}} and use 
{{LOG.error}} for printing exception.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch, HADOOP-12482.005.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-11-05 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992260#comment-14992260
 ] 

Tony Wu commented on HADOOP-12482:
--

The failed test hadoop.ipc.TestDecayRpcScheduler is not relevant to the change.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-11-02 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986170#comment-14986170
 ] 

Tony Wu commented on HADOOP-12482:
--

Thanks [~eddyxu] for reviewing my patch. Please take a look at the update patch 
and kindly let me know if you have any further comments.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-11-02 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.004.patch

In v4 patch:
* Addressed [~ozawa] and [~eddyxu]'s review comments.
* Reworked the test to make use of {{ScheduledExecutorService 
#scheduleAtFixedRate}}.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch, HADOOP-12482.004.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-10-30 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.003.patch

In v3 patch:
* Address review comments by fixing a typo.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch, 
> HADOOP-12482.003.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-10-30 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982967#comment-14982967
 ] 

Tony Wu commented on HADOOP-12482:
--

Oops. I will correct it and post a new patch. Thanks a lot for looking at the 
patch!


> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-10-29 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.002.patch

In v2 patch:
* Rebased to latest trunk.

Manually verified the reported failed test cases again (on Linux and with 
native option) and they pass without error:
{code}
$ mvn 
-Dtest=TestMetricsSourceAdapter,TestDecayRpcScheduler,TestCopyPreserveFlag,TestReloadingX509TrustManager,TestGangliaMetrics
 test -Pnative
...
---
 T E S T S
---
Running org.apache.hadoop.fs.shell.TestCopyPreserveFlag
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.77 sec - in 
org.apache.hadoop.fs.shell.TestCopyPreserveFlag
Running org.apache.hadoop.metrics2.impl.TestGangliaMetrics
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.504 sec - in 
org.apache.hadoop.metrics2.impl.TestGangliaMetrics
Running org.apache.hadoop.metrics2.impl.TestMetricsSourceAdapter
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.447 sec - in 
org.apache.hadoop.metrics2.impl.TestMetricsSourceAdapter
Running org.apache.hadoop.ipc.TestDecayRpcScheduler
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.98 sec - in 
org.apache.hadoop.ipc.TestDecayRpcScheduler
Running org.apache.hadoop.security.ssl.TestReloadingX509TrustManager
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.538 sec - in 
org.apache.hadoop.security.ssl.TestReloadingX509TrustManager

Results :

Tests run: 27, Failures: 0, Errors: 0, Skipped: 0
{code}

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch, HADOOP-12482.002.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-10-27 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14976619#comment-14976619
 ] 

Tony Wu commented on HADOOP-12482:
--

Manually ran the failed tests on Linux using JDK 1.7, all tests pass without 
error.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12482) Race condition in JMX cache update

2015-10-26 Thread Tony Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974504#comment-14974504
 ] 

Tony Wu commented on HADOOP-12482:
--

Manually ran the failed unit tests on OS X using JDK 1.7 and they all pass 
without error.

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-10-23 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Status: Patch Available  (was: Open)

> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch
>
>
> updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
> race condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-10-15 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Attachment: HADOOP-12482.001.patch

In this patch:
* Addressed the problem by adding a new variable to track lastRecs has been 
cleared by updateJmxCache(). 
* Change updateJmxCache() to use the new variable to track when to refresh the 
info cache.
* Add a test to simulate multiple thread accessing/updating the cache and make 
sure:
** The new test does capture the original problem.
** The hew test verifies the problem is fixed.

Also run a few additional tests and make sure they pass: 
* org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
* 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics


> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Tony Wu
>Assignee: Tony Wu
> Attachments: HADOOP-12482.001.patch
>
>
> updateJmxCache() was in HADOOP-11301. However the patch introduced a race 
> condition. In updateJmxCache() function in MetricsSourceAdapter.java:
> {code:java}
>   private void updateJmxCache() {
> boolean getAllMetrics = false;
> synchronized (this) {
>   if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
> // temporarilly advance the expiry while updating the cache
> jmxCacheTS = Time.now() + jmxCacheTTL;
> if (lastRecs == null) {
>   getAllMetrics = true;
> }
>   } else {
> return;
>   }
>   if (getAllMetrics) {
> MetricsCollectorImpl builder = new MetricsCollectorImpl();
> getMetrics(builder, true);
>   }
>   updateAttrCache();
>   if (getAllMetrics) {
> updateInfoCache();
>   }
>   jmxCacheTS = Time.now();
>   lastRecs = null; // in case regular interval update is not running
> }
>   }
> {code}
> Notice that getAllMetrics is set to true when:
> # jmxCacheTTL has passed
> # lastRecs == null
> lastRecs is set to null in the same function, but gets reassigned by 
> getMetrics().
> However getMetrics() can be called from a different thread:
> # MetricsSystemImpl.onTimerEvent()
> # MetricsSystemImpl.publishMetricsNow()
> Consider the following sequence:
> # updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
> info. 
> ** lastRecs is set to null.
> # metrics sources is updated with new value/field.
> # getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
> different thread getting the latest metrics. 
> ** lastRecs is updated (!= null).
> # jmxCacheTTL passed.
> # updateJmxCache() is called again via getMBeanInfo().
> ** However because lastRecs is already updated (!= null), getAllMetrics will 
> not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
> returns the old cached info.
> We ran into this issue on a cluster where a new metric did not get published 
> until much later.
> The case can be made worse by a periodic call to getMetrics() (driven by an 
> external program or script). In such case getMBeanInfo() may never be able to 
> retrieve the new record.
> The desired behavior should be that updateJmxCache() will guarantee to call 
> updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
> updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12482) Race condition in JMX cache update

2015-10-15 Thread Tony Wu (JIRA)
Tony Wu created HADOOP-12482:


 Summary: Race condition in JMX cache update
 Key: HADOOP-12482
 URL: https://issues.apache.org/jira/browse/HADOOP-12482
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Tony Wu
Assignee: Tony Wu


updateJmxCache() was in HADOOP-11301. However the patch introduced a race 
condition. In updateJmxCache() function in MetricsSourceAdapter.java:

{code:java}
  private void updateJmxCache() {
boolean getAllMetrics = false;
synchronized (this) {
  if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
// temporarilly advance the expiry while updating the cache
jmxCacheTS = Time.now() + jmxCacheTTL;
if (lastRecs == null) {
  getAllMetrics = true;
}
  } else {
return;
  }

  if (getAllMetrics) {
MetricsCollectorImpl builder = new MetricsCollectorImpl();
getMetrics(builder, true);
  }

  updateAttrCache();
  if (getAllMetrics) {
updateInfoCache();
  }
  jmxCacheTS = Time.now();
  lastRecs = null; // in case regular interval update is not running
}
  }
{code}

Notice that getAllMetrics is set to true when:
# jmxCacheTTL has passed
# lastRecs == null

lastRecs is set to null in the same function, but gets reassigned by 
getMetrics().

However getMetrics() can be called from a different thread:
# MetricsSystemImpl.onTimerEvent()
# MetricsSystemImpl.publishMetricsNow()

Consider the following sequence:
# updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
info. 
** lastRecs is set to null.
# metrics sources is updated with new value/field.
# getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
different thread getting the latest metrics. 
** lastRecs is updated (!= null).
# jmxCacheTTL passed.
# updateJmxCache() is called again via getMBeanInfo().
** However because lastRecs is already updated (!= null), getAllMetrics will 
not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
returns the old cached info.

We ran into this issue on a cluster where a new metric did not get published 
until much later.

The case can be made worse by a periodic call to getMetrics() (driven by an 
external program or script). In such case getMBeanInfo() may never be able to 
retrieve the new record.

The desired behavior should be that updateJmxCache() will guarantee to call 
updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
updateJmxCache() itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12482) Race condition in JMX cache update

2015-10-15 Thread Tony Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Wu updated HADOOP-12482:
-
Description: 
updateJmxCache() was updated in HADOOP-11301. However the patch introduced a 
race condition. In updateJmxCache() function in MetricsSourceAdapter.java:

{code:java}
  private void updateJmxCache() {
boolean getAllMetrics = false;
synchronized (this) {
  if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
// temporarilly advance the expiry while updating the cache
jmxCacheTS = Time.now() + jmxCacheTTL;
if (lastRecs == null) {
  getAllMetrics = true;
}
  } else {
return;
  }

  if (getAllMetrics) {
MetricsCollectorImpl builder = new MetricsCollectorImpl();
getMetrics(builder, true);
  }

  updateAttrCache();
  if (getAllMetrics) {
updateInfoCache();
  }
  jmxCacheTS = Time.now();
  lastRecs = null; // in case regular interval update is not running
}
  }
{code}

Notice that getAllMetrics is set to true when:
# jmxCacheTTL has passed
# lastRecs == null

lastRecs is set to null in the same function, but gets reassigned by 
getMetrics().

However getMetrics() can be called from a different thread:
# MetricsSystemImpl.onTimerEvent()
# MetricsSystemImpl.publishMetricsNow()

Consider the following sequence:
# updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
info. 
** lastRecs is set to null.
# metrics sources is updated with new value/field.
# getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
different thread getting the latest metrics. 
** lastRecs is updated (!= null).
# jmxCacheTTL passed.
# updateJmxCache() is called again via getMBeanInfo().
** However because lastRecs is already updated (!= null), getAllMetrics will 
not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
returns the old cached info.

We ran into this issue on a cluster where a new metric did not get published 
until much later.

The case can be made worse by a periodic call to getMetrics() (driven by an 
external program or script). In such case getMBeanInfo() may never be able to 
retrieve the new record.

The desired behavior should be that updateJmxCache() will guarantee to call 
updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
updateJmxCache() itself.

  was:
updateJmxCache() was in HADOOP-11301. However the patch introduced a race 
condition. In updateJmxCache() function in MetricsSourceAdapter.java:

{code:java}
  private void updateJmxCache() {
boolean getAllMetrics = false;
synchronized (this) {
  if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
// temporarilly advance the expiry while updating the cache
jmxCacheTS = Time.now() + jmxCacheTTL;
if (lastRecs == null) {
  getAllMetrics = true;
}
  } else {
return;
  }

  if (getAllMetrics) {
MetricsCollectorImpl builder = new MetricsCollectorImpl();
getMetrics(builder, true);
  }

  updateAttrCache();
  if (getAllMetrics) {
updateInfoCache();
  }
  jmxCacheTS = Time.now();
  lastRecs = null; // in case regular interval update is not running
}
  }
{code}

Notice that getAllMetrics is set to true when:
# jmxCacheTTL has passed
# lastRecs == null

lastRecs is set to null in the same function, but gets reassigned by 
getMetrics().

However getMetrics() can be called from a different thread:
# MetricsSystemImpl.onTimerEvent()
# MetricsSystemImpl.publishMetricsNow()

Consider the following sequence:
# updateJmxCache() is called by getMBeanInfo() from a thread getting cached 
info. 
** lastRecs is set to null.
# metrics sources is updated with new value/field.
# getMetrics() is called by publishMetricsNow() or onTimerEvent() from a 
different thread getting the latest metrics. 
** lastRecs is updated (!= null).
# jmxCacheTTL passed.
# updateJmxCache() is called again via getMBeanInfo().
** However because lastRecs is already updated (!= null), getAllMetrics will 
not be set to true. So updateInfoCache() is not called and getMBeanInfo() 
returns the old cached info.

We ran into this issue on a cluster where a new metric did not get published 
until much later.

The case can be made worse by a periodic call to getMetrics() (driven by an 
external program or script). In such case getMBeanInfo() may never be able to 
retrieve the new record.

The desired behavior should be that updateJmxCache() will guarantee to call 
updateInfoCache() once after jmxCacheTTL, if lastRecs has been set to null by 
updateJmxCache() itself.


> Race condition in JMX cache update
> --
>
> Key: HADOOP-12482
> URL: https://issues.apache.org/jira/browse/HADOOP-12482
> Project: Hadoop Common
>  Issue