[jira] [Reopened] (HDFS-14084) Need for more stats in DFSClient

2019-01-09 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened HDFS-14084:
---

I reverted this from trunk, branch-3.2, branch-3.1, branch-3.1.2, and 
branch-3.0.  Heads up to [~leftnoteasy] as this will impact the 3.1.2 release 
process and require a new release candidate to be built if one was already 
created.


> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Minor
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch
>
>
> The usage of HDFS has changed: it is no longer just a MapReduce filesystem 
> but is increasingly a general-purpose filesystem. In most cases the issues 
> are with the Namenode, so we already have metrics that show the workload or 
> stress on the Namenode.
> However, we also need more statistics collected for the different 
> operations/RPCs in DFSClient, to see which RPC operations take longer and 
> how frequently each operation is issued. These statistics can be exposed to 
> users of the DFS client, who can periodically log them or apply some form of 
> flow control when responses are slow. This will also help isolate HDFS 
> issues in a mixed environment where, say, Spark, HBase, and Impala run 
> together on one node: we can compare the throughput of different operations 
> across clients and tell apart problems caused by a noisy neighbor, network 
> congestion, or a shared JVM.
> We have dealt with several problems in the field for which there was no 
> conclusive evidence of the cause. If we had metrics or stats in DFSClient we 
> would be better equipped to solve such complex problems.
> List of jiras for reference:
> -
>  HADOOP-15538 HADOOP-15530 (client-side deadlock)
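
A minimal sketch of the kind of per-operation client-side statistics described 
above, assuming a standalone tracker class rather than the actual 
DFSClient/ClientContext internals:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative sketch only, not the DFSClient implementation: track per-RPC
// call counts and cumulative latency so callers can log them periodically or
// apply flow control when an operation slows down.
public class ClientOpStats {
  private static final class OpStat {
    final LongAdder count = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
  }

  private final Map<String, OpStat> stats = new ConcurrentHashMap<>();

  /** Record one completed RPC, e.g. record("getBlockLocations", elapsedNanos). */
  public void record(String op, long elapsedNanos) {
    OpStat s = stats.computeIfAbsent(op, k -> new OpStat());
    s.count.increment();
    s.totalNanos.add(elapsedNanos);
  }

  /** Average latency in milliseconds for one operation, 0 if never recorded. */
  public double avgLatencyMs(String op) {
    OpStat s = stats.get(op);
    long n = (s == null) ? 0 : s.count.sum();
    return n == 0 ? 0.0 : s.totalNanos.sum() / (n * 1_000_000.0);
  }
}
{code}

Wiring such a tracker around each client RPC would give the per-operation 
frequency and latency numbers the description asks for.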






[jira] [Updated] (HDFS-14084) Need for more stats in DFSClient

2019-01-09 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-14084:
--
Target Version/s: 3.0.4, 3.3.0, 3.2.1, 3.1.3
   Fix Version/s: (was: 3.2.1)
  (was: 3.3.0)
  (was: 3.1.2)
  (was: 3.0.4)

> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Minor
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch
>
>
> The usage of HDFS has changed: it is no longer just a MapReduce filesystem 
> but is increasingly a general-purpose filesystem. In most cases the issues 
> are with the Namenode, so we already have metrics that show the workload or 
> stress on the Namenode.
> However, we also need more statistics collected for the different 
> operations/RPCs in DFSClient, to see which RPC operations take longer and 
> how frequently each operation is issued. These statistics can be exposed to 
> users of the DFS client, who can periodically log them or apply some form of 
> flow control when responses are slow. This will also help isolate HDFS 
> issues in a mixed environment where, say, Spark, HBase, and Impala run 
> together on one node: we can compare the throughput of different operations 
> across clients and tell apart problems caused by a noisy neighbor, network 
> congestion, or a shared JVM.
> We have dealt with several problems in the field for which there was no 
> conclusive evidence of the cause. If we had metrics or stats in DFSClient we 
> would be better equipped to solve such complex problems.
> List of jiras for reference:
> -
>  HADOOP-15538 HADOOP-15530 (client-side deadlock)






[jira] [Commented] (HDFS-14084) Need for more stats in DFSClient

2019-01-09 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738768#comment-16738768
 ] 

Jason Lowe commented on HDFS-14084:
---

This is breaking more than just tests.  MapReduce job tasks are failing with 
the same error reported in YARN-9183.  I verified that jobs work just before 
this commit went in and fail with it applied.  I propose this be reverted 
until it can be fixed, to restore basic functionality to the 3.x builds.

> Need for more stats in DFSClient
> 
>
> Key: HDFS-14084
> URL: https://issues.apache.org/jira/browse/HDFS-14084
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0
>Reporter: Pranay Singh
>Assignee: Pranay Singh
>Priority: Minor
> Fix For: 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, 
> HDFS-14084.003.patch, HDFS-14084.004.patch, HDFS-14084.005.patch, 
> HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch, 
> HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch
>
>
> The usage of HDFS has changed: it is no longer just a MapReduce filesystem 
> but is increasingly a general-purpose filesystem. In most cases the issues 
> are with the Namenode, so we already have metrics that show the workload or 
> stress on the Namenode.
> However, we also need more statistics collected for the different 
> operations/RPCs in DFSClient, to see which RPC operations take longer and 
> how frequently each operation is issued. These statistics can be exposed to 
> users of the DFS client, who can periodically log them or apply some form of 
> flow control when responses are slow. This will also help isolate HDFS 
> issues in a mixed environment where, say, Spark, HBase, and Impala run 
> together on one node: we can compare the throughput of different operations 
> across clients and tell apart problems caused by a noisy neighbor, network 
> congestion, or a shared JVM.
> We have dealt with several problems in the field for which there was no 
> conclusive evidence of the cause. If we had metrics or stats in DFSClient we 
> would be better equipped to solve such complex problems.
> List of jiras for reference:
> -
>  HADOOP-15538 HADOOP-15530 (client-side deadlock)






[jira] [Created] (HDFS-13975) TestBalancer#testMaxIterationTime fails sporadically

2018-10-08 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-13975:
-

 Summary: TestBalancer#testMaxIterationTime fails sporadically
 Key: HDFS-13975
 URL: https://issues.apache.org/jira/browse/HDFS-13975
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Jason Lowe


A number of precommit builds have seen this test fail like this:
{noformat}
java.lang.AssertionError: Unexpected iteration runtime: 4021ms > 3.5s
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testMaxIterationTime(TestBalancer.java:1649)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}







[jira] [Updated] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-17 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-13822:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks, [~aw] and [~pradeepambati]!  I committed this to trunk.

> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Assignee: Allen Wittenauer
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch, 
> HDFS-13822.02.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried 
> to force a parallel build by specifying -Dnative_make_args=-j4, the build 
> failed due to dependency issues.






[jira] [Assigned] (HDFS-13822) speedup libhdfs++ build (enable parallel build)

2018-08-16 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned HDFS-13822:
-

Assignee: Allen Wittenauer

Thanks for the patches, [~pradeepambati] and [~aw]!  This looks like a massive 
improvement.  I personally would prefer not to have the portability fixes 
mashed in with the build performance changes since it adds a chunk to the patch 
unrelated to the JIRA, but overall it looks like a great change.

I verified that the same libhdfs++ tests are currently broken even without this 
patch, so the tests do not appear to be any worse off after the patch.

+1 lgtm.  I'll commit this tomorrow if nobody objects.  I'll credit both 
contributors in the commit message since Allen's patch includes Pradeep's 
original patch.


> speedup libhdfs++ build (enable parallel build)
> ---
>
> Key: HDFS-13822
> URL: https://issues.apache.org/jira/browse/HDFS-13822
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Pradeep Ambati
>Assignee: Allen Wittenauer
>Priority: Minor
> Attachments: HDFS-13382.000.patch, HDFS-13822.01.patch, 
> HDFS-13822.02.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk. The problem is that libhdfs++ isn't built in parallel. When I tried 
> to force a parallel build by specifying -Dnative_make_args=-j4, the build 
> failed due to dependency issues.






[jira] [Commented] (HDFS-13472) Compilation error in trunk in hadoop-aws

2018-04-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442734#comment-16442734
 ] 

Jason Lowe commented on HDFS-13472:
---

I am unable to reproduce the compilation error in trunk.  Given StagingTestBase 
has not been modified since November, it looks like many others have been 
unable to reproduce the error as well for some time.  How are you building 
Hadoop to reproduce this error (i.e.: what does the command-line look like)?

bq.  getArgumentAt(int, Class) method is available only from 
version 2.0.0-beta

getArgumentAt is available in 1.10.19.  
https://static.javadoc.io/org.mockito/mockito-core/1.10.19/org/mockito/invocation/InvocationOnMock.html

The reason this works for me is that mockito-core 1.10.19 is pulled in by the 
DynamoDBLocal dependency and appears on the classpath before the mockito-all 
1.8.5 dependency (as reported by mvn dependency:build-classpath).

I agree that the version of mockito-all being requested by Hadoop is wrong.  
It's trying to call a method that isn't available in 1.8.5.  I think we should 
upgrade the mockito dependency to at least 1.10.19.
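
As a hedged, purely illustrative sketch (not the StagingTestBase code), an 
Answer can be kept compatible with mockito-all 1.8.5, where 
InvocationOnMock#getArgumentAt(int, Class) does not exist but getArguments() 
does:

{code:java}
import static org.mockito.Matchers.anyString;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;

// Hypothetical interface used only for illustration.
interface Uploader {
  String upload(String key);
}

public class MockitoCompatExample {
  public static void main(String[] args) {
    Uploader uploader = mock(Uploader.class);
    when(uploader.upload(anyString())).thenAnswer(new Answer<String>() {
      @Override
      public String answer(InvocationOnMock invocation) {
        // 1.8.5-compatible equivalent of invocation.getArgumentAt(0, String.class)
        String key = (String) invocation.getArguments()[0];
        return "uploaded:" + key;
      }
    });
    System.out.println(uploader.upload("part-0000"));  // prints uploaded:part-0000
  }
}
{code}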


> Compilation error in trunk in hadoop-aws 
> -
>
> Key: HDFS-13472
> URL: https://issues.apache.org/jira/browse/HDFS-13472
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Mohammad Arshad
>Priority: Major
>
> *Problem:* Hadoop trunk compilation is failing.
>  *Root Cause:*
>  The compilation error comes from 
> {{org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase}}: "The method 
> getArgumentAt(int, Class) is undefined for the type InvocationOnMock". 
> StagingTestBase uses the getArgumentAt(int, Class) method, which is not 
> available in mockito-all 1.8.5; getArgumentAt(int, Class) is available only 
> from version 2.0.0-beta.
> *Expectations:*
>  Either the mockito-all version should be upgraded, or the test case should 
> be written using only functions available in 1.8.5.






[jira] [Commented] (HDFS-13371) NPE for FsServerDefaults.getKeyProviderUri() for clientProtocol communication between 2.7 and 3.2

2018-03-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419489#comment-16419489
 ] 

Jason Lowe commented on HDFS-13371:
---

{quote}Is there a good way to handle this other than having test coverage for 
this (which I'm not even sure how) and catching when new optional fields are 
added?
{quote}
Reflection could be leveraged to generate protobuf records for testing; see 
TestPBImplRecords for an example. In theory, any field marked optional in the 
protobuf record could be missing in the test record, although I suspect that 
in practice many of the fields aren't really optional. The hard part would be 
knowing which ones are truly optional despite what the protobuf field metadata 
says, so adding a test for a newly added optional field might still be a 
manual step. However, if a test framework were already in place and adding the 
new field to a test were very simple (e.g., an annotation on a method/field or 
a single line in a test), it would be more likely to get done.
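
A hedged sketch of that idea, assuming a generic converter under test (the 
class and the Function parameter are illustrative, not existing HDFS test 
code):

{code:java}
import java.util.function.Function;

import com.google.protobuf.Descriptors.FieldDescriptor;
import com.google.protobuf.Message;

// Illustrative only: drive a PB-to-record converter with each optional field
// cleared from a fully populated message and make sure it does not NPE.
public class OptionalFieldCheck {
  static <T> void checkOptionalFields(Message fullyPopulated,
      Function<Message, T> converter) {
    for (FieldDescriptor field
        : fullyPopulated.getDescriptorForType().getFields()) {
      // proto2 "optional" = not required and not repeated
      if (!field.isRequired() && !field.isRepeated()) {
        Message withoutField =
            fullyPopulated.toBuilder().clearField(field).build();
        // The converter should tolerate the missing field (return null or a
        // default) rather than throw NullPointerException.
        converter.apply(withoutField);
      }
    }
  }
}
{code}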

> NPE for FsServerDefaults.getKeyProviderUri() for clientProtocol communication 
> between 2.7 and 3.2
> -
>
> Key: HDFS-13371
> URL: https://issues.apache.org/jira/browse/HDFS-13371
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Sherwood Zheng
>Assignee: Sherwood Zheng
>Priority: Minor
> Attachments: HADOOP-15336.000.patch, HADOOP-15336.001.patch
>
>
> KeyProviderUri is not available in 2.7 so when 2.7 clients contact with 3.2 
> services, it cannot find the key provider URI and triggers a 
> NullPointerException.






[jira] [Commented] (HDFS-13362) add a flag to skip the libhdfs++ build

2018-03-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417981#comment-16417981
 ] 

Jason Lowe commented on HDFS-13362:
---

The maven plugin for cmake automatically detects the number of processors in 
the machine and uses that for the build parallelism.  From CompileMojo#runMake:
{code}
  public void runMake() throws MojoExecutionException {
    List<String> cmd = new LinkedList<String>();
    cmd.add("make");
    cmd.add("-j");
    cmd.add(String.valueOf(availableProcessors));
    cmd.add("VERBOSE=1");
{code}

I'm not an expert on the maven cmake plugin, but I do know the hadoop-common 
native builds and hadoop-yarn-server-nodemanager native builds use it.  See the 
cmake-compile goal definitions in their respective pom files for examples of 
how to build with cmake and run cetest for unit tests.  Fixing the dependencies 
for parallel builds will be a prerequisite since the maven cmake plugin always 
builds in parallel.

As for avoiding the build via a maven-level flag rather than a cmake flag, we 
should be able to leverage the {{activation}} portion of the profile 
configuration in the pom to disable the native build without invoking cmake at 
all.  HADOOP-13999 did something very similar for the skipShade flag to avoid 
the expensive shaded hadoop-client build.

> add a flag to skip the libhdfs++ build
> --
>
> Key: HDFS-13362
> URL: https://issues.apache.org/jira/browse/HDFS-13362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: James Clampffer
>Priority: Minor
> Attachments: HDFS-13362.000.patch
>
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk.  This covers adding a flag that would let people build libhdfs 
> without all of libhdfs++ if they don't need it; it should be built by default 
> to maintain compatibility with as many environments as possible.
> Some thoughts:
> -The increase in compile time only impacts clean builds.  Incremental 
> rebuilds aren't significantly more expensive than they used to be if the code 
> hasn't changed.
> -Compile times for libhdfs++ can most likely be reduced but that's a longer 
> term project.  boost::asio and tr1::optional are header-only libraries that 
> are heavily templated so every compilation unit that includes them has to do 
> a lot of parsing.
> Is it common to do completely clean builds frequently for interactive users?  
> Are there opinions on what would be an acceptable compilation time?






[jira] [Commented] (HDFS-13362) add a flag to skip the libhdfs++ build

2018-03-28 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417665#comment-16417665
 ] 

Jason Lowe commented on HDFS-13362:
---

Part of the problem with the excessive build time is that the build isn't being 
performed in parallel.  I noticed we're not using the cmake plugin but rather 
invoking cmake and make directly via the ant plugin.  Is there a good reason 
not to use the cmake plugin like all the other native builds in the project 
do?  Doing so would automatically leverage parallel builds.

I tried forcing a parallel build manually by specifying -Dnative_make_args=-j4 
but it failed with a missing ClientNamenodeProtocol.pb.h.  Looks like the 
dependencies aren't fully specified in the makefile, which may explain why we 
can't use the cmake plugin.  I think fixing automatic parallel builds would 
significantly improve native build time on most setups.


> add a flag to skip the libhdfs++ build
> --
>
> Key: HDFS-13362
> URL: https://issues.apache.org/jira/browse/HDFS-13362
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: James Clampffer
>Priority: Minor
>
> libhdfs++ has significantly increased clean build times for the native client 
> on trunk.  This covers adding a flag that would let people build libhdfs 
> without all of libhdfs++ if they don't need it; it should be built by default 
> to maintain compatibility with as many environments as possible.
> Some thoughts:
> -The increase in compile time only impacts clean builds.  Incremental 
> rebuilds aren't significantly more expensive than they used to be if the code 
> hasn't changed.
> -Compile times for libhdfs++ can most likely be reduced but that's a longer 
> term project.  boost::asio and tr1::optional are header-only libraries that 
> are heavily templated so every compilation unit that includes them has to do 
> a lot of parsing.
> Is it common to do completely clean builds frequently for interactive users?  
> Are there opinions on what would be an acceptable compilation time?






[jira] [Updated] (HDFS-13164) File not closed if streamer fail with DSQuotaExceededException

2018-02-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-13164:
--
Fix Version/s: (was: 2.8.4)

I reverted this from branch-2.8 because it breaks the build:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:testCompile 
(default-testCompile) on project hadoop-hdfs: Compilation failure: Compilation 
failure:
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[70,23]
 package org.slf4j.event does not exist
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1479,55]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1480,52]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1502,55]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1503,52]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1504,49]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1530,52]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1543,55]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java:[1544,52]
 cannot find symbol
[ERROR] symbol:   variable Level
[ERROR] location: class org.apache.hadoop.hdfs.TestQuota
{noformat}


> File not closed if streamer fail with DSQuotaExceededException
> --
>
> Key: HDFS-13164
> URL: https://issues.apache.org/jira/browse/HDFS-13164
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: HDFS-13164.01.patch, HDFS-13164.02.patch
>
>
>  This is found during yarn log aggregation but theoretically could happen to 
> any client.
> If the dir's space quota is exceeded, the following would happen when a file 
> is created:
>  - client {{startFile}} rpc to NN, gets a {{DFSOutputStream}}.
>  - writing to the stream would trigger the streamer to {{getAdditionalBlock}} 
> rpc to NN, which would get the DSQuotaExceededException
>  - client closes the stream
>   
>  The fact that this would leave a 0-sized (or whatever size left in the 
> quota) file in HDFS is beyond the scope of this jira. However, the file would 
> be left in openforwrite status (shown in {{fsck -openforwrite)}} at least, 
> and could potentially leak leaseRenewer too.
> This is because in the close implementation,
>  # {{isClosed}} is first checked, and the close call will be a no-op if 
> {{isClosed == true}}.
>  # {{flushInternal}} checks {{isClosed}}, and throws the exception right away 
> if true
> {{isClosed}} does this: {{return closed || getStreamer().streamerClosed;}}
> When the disk quota is reached, {{getAdditionalBlock}} will throw when the 
> streamer calls it. Because the streamer runs in a separate thread, at the 
> time the client calls close on the stream, the streamer may or may not have 
> hit the quota exception yet. If it has, then due to #1 the close call on the 
> stream will be a no-op. If it hasn't, then due to #2 the {{completeFile}} 
> logic will be skipped.
> {code:java}
> protected synchronized void closeImpl() throws IOException {
> if (isClosed()) {
>   IOException e = lastException.getAndSet(null);
>   if (e == null)
> return;
>   else
> throw e;
> }
>   try {
> flushBuffer(); // flush from all upper layers
> ...
> flushInternal(); // flush all data to Datanodes
> // get

[jira] [Commented] (HDFS-13039) StripedBlockReader#createBlockReader leaks socket on IOException

2018-01-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334531#comment-16334531
 ] 

Jason Lowe commented on HDFS-13039:
---

Note that branch-3 should not be a target of commits per the discussion in 
common-dev.  It was created by mistake and has since been deleted.

 

> StripedBlockReader#createBlockReader leaks socket on IOException
> 
>
> Key: HDFS-13039
> URL: https://issues.apache.org/jira/browse/HDFS-13039
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.0.0
>Reporter: Lei (Eddy) Xu
>Assignee: Lei (Eddy) Xu
>Priority: Critical
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-13039.00.patch, HDFS-13039.01.patch
>
>
> When running EC on one cluster, DataNode has millions of {{CLOSE_WAIT}} 
> connections
> {code:java}
> $ grep CLOSE_WAIT lsof.out | wc -l
> 10358700
> // All CLOSE_WAITs belong to the same DataNode process (pid=88527)
> $ grep CLOSE_WAIT lsof.out | awk '{print $2}' | sort | uniq
> 88527
> {code}
> And DN can not open any file / socket, as shown in the log:
> {noformat}
> 2018-01-19 06:47:09,424 WARN io.netty.channel.DefaultChannelPipeline: An 
> exceptionCaught() event was fired, and it reached at the tail of the 
> pipeline. It usually means the last handler in the pipeline did not handle 
> the exception.
> java.io.IOException: Too many open files
> at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
> at 
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
> at 
> io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135)
> at 
> io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:75)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:563)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:504)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:418)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:390)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:742)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:145)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
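
A hedged, generic sketch of the fix such a leak needs, using plain 
java.net.Socket rather than the DataNode's actual Peer/BlockReader classes: 
close the freshly opened socket on the failure path so an IOException cannot 
strand it in CLOSE_WAIT.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: if the read-side setup fails after the socket is opened,
// release the descriptor before rethrowing instead of leaking it.
public class SocketCleanupExample {
  static Socket connectOrClose(InetSocketAddress addr) throws IOException {
    Socket socket = new Socket();
    try {
      socket.connect(addr, 3000);
      // ... hand the connected socket to a block reader here ...
      return socket;
    } catch (IOException e) {
      try {
        socket.close();  // best-effort cleanup on the failure path
      } catch (IOException ignored) {
        // nothing more to do; the original failure is what matters
      }
      throw e;
    }
  }
}
{code}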






[jira] [Commented] (HDFS-12919) RBF: Support erasure coding methods in RouterRpcServer

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329669#comment-16329669
 ] 

Jason Lowe commented on HDFS-12919:
---

branch-3 should not exist at all (yet).  It was accidentally created by a 
committer recently.  branch-3 will eventually track 3.x releases similar to how 
branch-2 tracks 2.x releases.  But for now trunk is already tracking 3.x 
releases, so we do not need a branch-3.  branch-3 should be created when trunk 
moves to 4.0.0-SNAPSHOT, but in the meantime I'm in the process of asking for 
the removal of branch-3.

 

> RBF: Support erasure coding methods in RouterRpcServer
> --
>
> Key: HDFS-12919
> URL: https://issues.apache.org/jira/browse/HDFS-12919
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Critical
>  Labels: RBF
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12919-branch-3.001.patch, 
> HDFS-12919-branch-3.002.patch, HDFS-12919-branch-3.003.patch, 
> HDFS-12919.000.patch, HDFS-12919.001.patch, HDFS-12919.002.patch, 
> HDFS-12919.003.patch, HDFS-12919.004.patch, HDFS-12919.005.patch, 
> HDFS-12919.006.patch, HDFS-12919.007.patch, HDFS-12919.008.patch, 
> HDFS-12919.009.patch, HDFS-12919.010.patch, HDFS-12919.011.patch, 
> HDFS-12919.012.patch, HDFS-12919.013.patch, HDFS-12919.013.patch, 
> HDFS-12919.014.patch, HDFS-12919.015.patch, HDFS-12919.016.patch, 
> HDFS-12919.017.patch, HDFS-12919.018.patch, HDFS-12919.019.patch, 
> HDFS-12919.020.patch, HDFS-12919.021.patch, HDFS-12919.022.patch, 
> HDFS-12919.023.patch
>
>
> MAPREDUCE-6954 started to tune the erasure coding settings for staging files. 
> However, the {{Router}} does not support this operation and throws:
> {code}
> 17/12/12 14:36:07 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1513116010218_0002
> org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException):
>  Operation "setErasureCodingPolicy" is not supported
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.checkOperation(RouterRpcServer.java:368)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setErasureCodingPolicy(RouterRpcServer.java:1805)
> {code}






[jira] [Commented] (HDFS-9049) Make Datanode Netty reverse proxy port to be configurable

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329229#comment-16329229
 ] 

Jason Lowe commented on HDFS-9049:
--

This was committed accidentally to branch-3 instead of branch-3.0, so I picked 
this change over to branch-3.0 for its inclusion in the 3.0.1 release.

> Make Datanode Netty reverse proxy port to be configurable
> -
>
> Key: HDFS-9049
> URL: https://issues.apache.org/jira/browse/HDFS-9049
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Vinayakumar B
>Assignee: Vinayakumar B
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: HDFS-9049-01.patch, HDFS-9049-02.patch, 
> HDFS-9049-03.patch, HDFS-9049-04.patch
>
>
> In DatanodeHttpServer.java, Netty is used as a reverse proxy, but it starts 
> on a random port bound to localhost. This port can be made configurable for 
> better deployments.
> {code}
>  HttpServer2.Builder builder = new HttpServer2.Builder()
> .setName("datanode")
> .setConf(confForInfoServer)
> .setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
> .hostName(getHostnameForSpnegoPrincipal(confForInfoServer))
> .addEndpoint(URI.create("http://localhost:0"))
> .setFindPort(true);
> {code}
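
A hedged sketch of what "configurable" could look like, using only the 
Configuration API; the key name below is illustrative and not necessarily the 
one the patch introduces:

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;

// Illustrative only: resolve the reverse-proxy bind port from a config key,
// falling back to 0 (the current ephemeral-port behavior).
public class ProxyPortExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    int port = conf.getInt("dfs.datanode.netty.proxy.port", 0);  // hypothetical key
    URI endpoint = URI.create("http://localhost:" + port);
    System.out.println(endpoint);
  }
}
{code}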






[jira] [Commented] (HDFS-13004) TestLeaseRecoveryStriped#testLeaseRecovery is failing when safeLength is 0MB or larger than the test file

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329208#comment-16329208
 ] 

Jason Lowe commented on HDFS-13004:
---

This was committed accidentally to branch-3 instead of branch-3.0, so I picked 
this change over to branch-3.0 for its inclusion in the 3.0.1 release.

> TestLeaseRecoveryStriped#testLeaseRecovery is failing when safeLength is 0MB 
> or larger than the test file
> -
>
> Key: HDFS-13004
> URL: https://issues.apache.org/jira/browse/HDFS-13004
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.0.0
>Reporter: Zsolt Venczel
>Assignee: Zsolt Venczel
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-13004.01.patch, HDFS-13004.02.patch, 
> HDFS-13004.03.patch
>
>
> {code}
> Error:
> failed testCase at i=1, 
> blockLengths=org.apache.hadoop.hdfs.TestLeaseRecoveryStriped$BlockLengths@5a4c638d[blockLengths=
> {4194304,4194304,4194304,1048576,4194304,4194304,2097152,1048576,4194304},safeLength=25165824]
> java.lang.AssertionError: File length should be the same expected:<25165824> 
> but was:<18874368>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at 
> org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLength(StripedFileTestUtil.java:79)
> at 
> org.apache.hadoop.hdfs.StripedFileTestUtil.checkData(StripedFileTestUtil.java:362)
> at 
> org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.runTest(TestLeaseRecoveryStriped.java:198)
> at 
> org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.testLeaseRecovery(TestLeaseRecoveryStriped.java:182)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> Stack:
> java.lang.AssertionError: 
> failed testCase at i=1, 
> blockLengths=org.apache.hadoop.hdfs.TestLeaseRecoveryStriped$BlockLengths@5a4c638d[blockLengths={4194304,4194304,4194304,1048576,4194304,4194304,2097152,1048576,4194304}
> ,safeLength=25165824]
> java.lang.AssertionError: File length should be the same expected:<25165824> 
> but was:<18874368>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at 
> org.apache.hadoop.hdfs.StripedFileTestUtil.verifyLength(StripedFileTestUtil.java:79)
> at 
> org.apache.hadoop.hdfs.StripedFileTestUtil.checkData(StripedFileTestUtil.java:362)
> at 
> org.apache.hadoop.hdfs.TestLeaseRecoveryStriped.runT

[jira] [Commented] (HDFS-12919) RBF: Support erasure coding methods in RouterRpcServer

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329201#comment-16329201
 ] 

Jason Lowe commented on HDFS-12919:
---

This was committed accidentally to branch-3 instead of branch-3.0, so I picked 
this change over to branch-3.0 for its inclusion in the 3.0.1 release.

> RBF: Support erasure coding methods in RouterRpcServer
> --
>
> Key: HDFS-12919
> URL: https://issues.apache.org/jira/browse/HDFS-12919
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Critical
>  Labels: RBF
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12919-branch-3.001.patch, 
> HDFS-12919-branch-3.002.patch, HDFS-12919-branch-3.003.patch, 
> HDFS-12919.000.patch, HDFS-12919.001.patch, HDFS-12919.002.patch, 
> HDFS-12919.003.patch, HDFS-12919.004.patch, HDFS-12919.005.patch, 
> HDFS-12919.006.patch, HDFS-12919.007.patch, HDFS-12919.008.patch, 
> HDFS-12919.009.patch, HDFS-12919.010.patch, HDFS-12919.011.patch, 
> HDFS-12919.012.patch, HDFS-12919.013.patch, HDFS-12919.013.patch, 
> HDFS-12919.014.patch, HDFS-12919.015.patch, HDFS-12919.016.patch, 
> HDFS-12919.017.patch, HDFS-12919.018.patch, HDFS-12919.019.patch, 
> HDFS-12919.020.patch, HDFS-12919.021.patch, HDFS-12919.022.patch, 
> HDFS-12919.023.patch
>
>
> MAPREDUCE-6954 started to tune the erasure coding settings for staging files. 
> However, the {{Router}} does not support this operation and throws:
> {code}
> 17/12/12 14:36:07 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /tmp/hadoop-yarn/staging/hadoop/.staging/job_1513116010218_0002
> org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException):
>  Operation "setErasureCodingPolicy" is not supported
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.checkOperation(RouterRpcServer.java:368)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setErasureCodingPolicy(RouterRpcServer.java:1805)
> {code}






[jira] [Commented] (HDFS-11848) Enhance dfsadmin listOpenFiles command to list files under a given path

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329195#comment-16329195
 ] 

Jason Lowe commented on HDFS-11848:
---

This was committed accidentally to branch-3 instead of branch-3.0, so I picked 
this change over to branch-3.0 for its inclusion in the 3.0.1 release.

> Enhance dfsadmin listOpenFiles command to list files under a given path
> ---
>
> Key: HDFS-11848
> URL: https://issues.apache.org/jira/browse/HDFS-11848
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Yiqun Lin
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-11848.001.patch, HDFS-11848.002.patch, 
> HDFS-11848.003.patch, HDFS-11848.004.patch
>
>
> HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to 
> list all the open files in the system.
> One more thing that would be nice here is to filter the output by a given 
> path or DataNode. Use case: an admin might already know a stale file by path 
> (perhaps from fsck's -openforwrite) and wants to figure out who the lease 
> holder is. The proposal here is to add suboptions to {{listOpenFiles}} to 
> list files filtered by path.
> {{LeaseManager#getINodeWithLeases(INodeDirectory)}} can be used to get the 
> open file list for any given ancestor directory.






[jira] [Commented] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning

2018-01-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328942#comment-16328942
 ] 

Jason Lowe commented on HDFS-11847:
---

I recently noticed the new {{branch-3}} branch and tracked it back to here.  
branch-3.0 is for tracking 3.0.x releases, currently 3.0.1-SNAPSHOT.  Was the 
creation of {{branch-3}} intentional, and if so, how is it different than 
branch-3.0?

> Enhance dfsadmin listOpenFiles command to list files blocking datanode 
> decommissioning
> --
>
> Key: HDFS-11847
> URL: https://issues.apache.org/jira/browse/HDFS-11847
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch, 
> HDFS-11847.03.patch, HDFS-11847.04.patch, HDFS-11847.05.patch
>
>
> HDFS-10480 adds a {{listOpenFiles}} option to the {{dfsadmin}} command to 
> list all the open files in the system.
> Additionally, it would be very useful to list only the open files that are 
> blocking DataNode decommissioning. With thousand-plus-node clusters, where 
> machines may be added and removed regularly for maintenance, any option to 
> monitor and debug decommissioning status is very helpful. The proposal here 
> is to add suboptions to {{listOpenFiles}} for the above case.






[jira] [Updated] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-12881:
--
   Resolution: Fixed
Fix Version/s: 2.7.6
   2.8.4
   2.9.1
   2.10.0
   Status: Resolved  (was: Patch Available)

Thanks, Ajay!  I committed this to branch-2, branch-2.9, branch-2.8, and 
branch-2.7 as well.


> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6
>
> Attachments: HDFS-12881-branch-2.10.0.001.patch, 
> HDFS-12881.001.patch, HDFS-12881.002.patch, HDFS-12881.003.patch, 
> HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.
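
A hedged sketch of the try-with-resources form described above (the path and 
payload are placeholders): any IOException from close() now propagates instead 
of being swallowed.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative only: the stream is closed automatically, and a failure in
// close() surfaces to the caller as an IOException.
public class CloseExample {
  public static void write() throws IOException {
    try (OutputStream out = Files.newOutputStream(Paths.get("/tmp/example.out"))) {
      out.write("payload".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}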






[jira] [Commented] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293222#comment-16293222
 ] 

Jason Lowe commented on HDFS-12881:
---

Thanks for the branch-2 patch!  +1 lgtm.  I agree the unit test failures 
appear to be unrelated, and I verified those tests pass locally with the patch 
applied.

Committing this.



> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12881-branch-2.10.0.001.patch, 
> HDFS-12881.001.patch, HDFS-12881.002.patch, HDFS-12881.003.patch, 
> HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.






[jira] [Updated] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-12881:
--
Status: Patch Available  (was: Reopened)

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12881-branch-2.10.0.001.patch, 
> HDFS-12881.001.patch, HDFS-12881.002.patch, HDFS-12881.003.patch, 
> HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.






[jira] [Reopened] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened HDFS-12881:
---

Reopening to get a precommit run on the branch-2 patch.

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12881-branch-2.10.0.001.patch, 
> HDFS-12881.001.patch, HDFS-12881.002.patch, HDFS-12881.003.patch, 
> HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.






[jira] [Commented] (HDFS-12924) Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing write errors)

2017-12-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291314#comment-16291314
 ] 

Jason Lowe commented on HDFS-12924:
---

I've already resolved it as a duplicate, so we can move the discussion back to 
HDFS-12881.

> Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing 
> write errors)
> -
>
> Key: HDFS-12924
> URL: https://issues.apache.org/jira/browse/HDFS-12924
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
> Fix For: 2.10.0
>
> Attachments: HDFS-12924-HDFS-2.10.0.001.patch
>
>
> Port HDFS-12881 to branch-2






[jira] [Resolved] (HDFS-12924) Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing write errors)

2017-12-14 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved HDFS-12924.
---
Resolution: Duplicate

> Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing 
> write errors)
> -
>
> Key: HDFS-12924
> URL: https://issues.apache.org/jira/browse/HDFS-12924
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
> Fix For: 2.10.0
>
>
> Port HDFS-12881 to branch-2






[jira] [Commented] (HDFS-12924) Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing write errors)

2017-12-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291297#comment-16291297
 ] 

Jason Lowe commented on HDFS-12924:
---

We should just do this on HDFS-12881 rather than create separate tickets.  We 
can reopen it and place it in Patch Available; that way it's all in one place 
instead of spread across separate tickets.

> Port HDFS-12881 to branch-2 (Output streams closed with IOUtils suppressing 
> write errors)
> -
>
> Key: HDFS-12924
> URL: https://issues.apache.org/jira/browse/HDFS-12924
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
> Fix For: 2.10.0
>
>
> Port HDFS-12881 to branch-2






[jira] [Updated] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-12881:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks, Ajay!  I committed this to trunk and branch-3.0.

The problems exist in 2.x versions as well.  Would you be willing to provide a 
patch for branch-2?


> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Fix For: 3.1.0, 3.0.1
>
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch, HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.






[jira] [Commented] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291035#comment-16291035
 ] 

Jason Lowe commented on HDFS-12881:
---

Thanks for updating the patch!  The unit tests are unrelated and have been 
failing in recent nightly builds and other precommit runs.

+1 for the latest patch.  Committing this.

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch, HDFS-12881.004.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with new version (3.0) of hadoop

2017-12-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291005#comment-16291005
 ] 

Jason Lowe commented on HDFS-12920:
---

This only occurs if the job submitter is using 3.x jars while the submitted job 
is using 2.x jars.  If the submitter uses the same version of jars as the job 
itself, the problem does not happen, since the values copied from 
hdfs-default.xml into job.xml during job submission are compatible with the 
code that parses them.

So another workaround is to keep at least two tarballs on HDFS, one built from 
3.x and one from 2.x: the 3.x site configs point at the 3.x tarball and the 
2.x site configs point at the 2.x tarball.  When the job-submitting client 
upgrades to 3.x jars, it can switch to the 3.x configs at the same time and 
start running jobs with 3.x as well.
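
For context, the incompatibility comes down to how the value is parsed; a 
standalone illustration (plain Java, not the Hadoop Configuration code) of why 
a unit-suffixed value like {{30s}} works with a duration-aware parser but 
fails a plain {{Long.parseLong}}:
{code}
import java.util.concurrent.TimeUnit;

public class TimeUnitSuffixExample {
  // Roughly what a duration-aware parser does: strip a known unit suffix and
  // convert.  This is an illustration, not the Hadoop implementation.
  static long parseMillis(String value) {
    if (value.endsWith("s")) {
      return TimeUnit.SECONDS.toMillis(
          Long.parseLong(value.substring(0, value.length() - 1)));
    }
    return Long.parseLong(value);  // plain number, assume milliseconds
  }

  public static void main(String[] args) {
    System.out.println(parseMillis("30s"));  // 30000: unit-aware parse succeeds
    try {
      Long.parseLong("30s");                 // what old client code effectively does
    } catch (NumberFormatException e) {
      System.out.println("old-style parse fails: " + e.getMessage());
    }
  }
}
{code}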


> HDFS default value change (with adding time unit) breaks old version MR 
> tarball work with new version (3.0) of hadoop
> -
>
> Key: HDFS-12920
> URL: https://issues.apache.org/jira/browse/HDFS-12920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Junping Du
>Priority: Blocker
>
> After HADOOP-15059 get resolved. I tried to deploy 2.9.0 tar ball with 3.0.0 
> RC1, and run the job with following errors:
> {noformat}
> 2017-12-12 13:29:06,824 INFO [main] 
> org.apache.hadoop.service.AbstractService: Service 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650)
> {noformat}
> This is because HDFS-10845, we are adding time unit to hdfs-default.xml but 
> it cannot be recognized by old version MR jars. 
> This break our rolling upgrade story, so should mark as blocker.
> A quick workaround is to add values in hdfs-site.xml with removing all time 
> unit. But the right way may be to revert HDFS-10845 (and get rid of noisy 
> warnings).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288320#comment-16288320
 ] 

Jason Lowe edited comment on HDFS-12881 at 12/12/17 9:58 PM:
-

Thanks for updating the patch!

The patch looks much better, but it is modifying more places than intended.  
The changes in hadoop-common should go under HADOOP-15085, and the changes in 
YARN are already covered by YARN-7595.  Also, one minor nit: it's cleaner to 
call {{IOUtils.closeStream\(x)}} rather than {{IOUtils.cleanupWithLogger(null, 
x)}} when there's only one stream to close.  It would be nice if there were an 
{{IOUtils.closeStreams(...)}} method, but that's outside the scope of this JIRA.



was (Author: jlowe):
Thanks for updating the patch!

The patch looks much better, but it is modifying more places than intended.  
The changes in hadoop-common should be under HADOOP-15085 and the changes in 
YARN are already covered in YARN-7595.  Also one minor nit, it's cleaner to 
call IOUtils.closeStream(x) rather than IOUtils.cleanupWithLogger(null, x) when 
there's only one stream to close.  Would be nice if there was an 
IOUtils.closeStreams(...) method, but that's not part of this JIRA.


> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288320#comment-16288320
 ] 

Jason Lowe commented on HDFS-12881:
---

Thanks for updating the patch!

The patch looks much better, but it is modifying more places than intended.  
The changes in hadoop-common should go under HADOOP-15085, and the changes in 
YARN are already covered by YARN-7595.  Also, one minor nit: it's cleaner to 
call IOUtils.closeStream(x) rather than IOUtils.cleanupWithLogger(null, x) when 
there's only one stream to close.  It would be nice if there were an 
IOUtils.closeStreams(...) method, but that's outside the scope of this JIRA.


> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Attachments: HDFS-12881.001.patch, HDFS-12881.002.patch, 
> HDFS-12881.003.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284047#comment-16284047
 ] 

Jason Lowe commented on HDFS-12881:
---

Thanks for the patch!

The patch updates the handling of input streams, but this bug only applies to 
output streams.  For an input stream, once the code has read the data it needs, 
we're not interested in errors that happen on close: we already have the bytes, 
so a close failure at that point shouldn't fail the operation.  For output 
streams, however, close() must complete successfully, otherwise previously 
written data could be lost (e.g. because it was still sitting in a buffer).
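
A small standalone sketch (plain java.io, illustrative only) of that 
distinction: the input side may be closed quietly, but the output stream's 
close() has to be allowed to propagate, because buffered bytes may only reach 
the destination during flush/close:
{code}
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyExample {
  static void copy(String src, String dst) throws IOException {
    InputStream in = new FileInputStream(src);
    try (OutputStream out =
        new BufferedOutputStream(new FileOutputStream(dst))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      // out is closed by try-with-resources; a failed close() (e.g. a failed
      // flush of buffered bytes) propagates and the copy is reported as failed.
    } finally {
      try {
        in.close();             // input side: we already have the bytes we
      } catch (IOException e) { // need, so a close failure is only worth logging
        System.err.println("ignoring input close failure: " + e);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    copy(args[0], args[1]);
  }
}
{code}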

> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Ajay Kumar
> Attachments: HDFS-12881.001.patch
>
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-01 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274548#comment-16274548
 ] 

Jason Lowe commented on HDFS-12881:
---

Some places with this pattern:
* FsDatasetImpl#computeChecksum
* FSImageTestUtil#getImageFileMD5IgnoringTxId
* FSImageTestUtil#corruptVersionFile
* TestOfflineImageViewer#copyPartOfFile


> Output streams closed with IOUtils suppressing write errors
> ---
>
> Key: HDFS-12881
> URL: https://issues.apache.org/jira/browse/HDFS-12881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jason Lowe
>
> There are a few places in HDFS code that are closing an output stream with 
> IOUtils.cleanupWithLogger like this:
> {code}
>   try {
> ...write to outStream...
>   } finally {
> IOUtils.cleanupWithLogger(LOG, outStream);
>   }
> {code}
> This suppresses any IOException that occurs during the close() method which 
> could lead to partial/corrupted output without throwing a corresponding 
> exception.  The code should either use try-with-resources or explicitly close 
> the stream within the try block so the exception thrown during close() is 
> properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-12881) Output streams closed with IOUtils suppressing write errors

2017-12-01 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-12881:
-

 Summary: Output streams closed with IOUtils suppressing write 
errors
 Key: HDFS-12881
 URL: https://issues.apache.org/jira/browse/HDFS-12881
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jason Lowe


There are a few places in HDFS code that are closing an output stream with 
IOUtils.cleanupWithLogger like this:
{code}
  try {
...write to outStream...
  } finally {
IOUtils.cleanupWithLogger(LOG, outStream);
  }
{code}
This suppresses any IOException that occurs during the close() method which 
could lead to partial/corrupted output without throwing a corresponding 
exception.  The code should either use try-with-resources or explicitly close 
the stream within the try block so the exception thrown during close() is 
properly propagated as exceptions during write operations are.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12817) Support multiple storages in DataNodeCluster / SimulatedFSDataset

2017-11-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253958#comment-16253958
 ] 

Jason Lowe commented on HDFS-12817:
---

Probably a hiccup from JIRA being occasionally slow or something.  No worries!

> Support multiple storages in DataNodeCluster / SimulatedFSDataset
> -
>
> Key: HDFS-12817
> URL: https://issues.apache.org/jira/browse/HDFS-12817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, test
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Minor
>
> Currently {{SimulatedFSDataset}} (and thus, {{DataNodeCluster}} with 
> {{-simulated}}) only supports a single storage per {{DataNode}}. Given that 
> the number of storages can have important implications on the performance of 
> block report processing, it would be useful for these classes to support a 
> multiple storage configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-12817) Support multiple storages in DataNodeCluster / SimulatedFSDataset

2017-11-15 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved HDFS-12817.
---
Resolution: Duplicate

> Support multiple storages in DataNodeCluster / SimulatedFSDataset
> -
>
> Key: HDFS-12817
> URL: https://issues.apache.org/jira/browse/HDFS-12817
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, test
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Minor
>
> Currently {{SimulatedFSDataset}} (and thus, {{DataNodeCluster}} with 
> {{-simulated}}) only supports a single storage per {{DataNode}}. Given that 
> the number of storages can have important implications on the performance of 
> block report processing, it would be useful for these classes to support a 
> multiple storage configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212654#comment-16212654
 ] 

Jason Lowe commented on HDFS-12688:
---

Then there is very likely some other job or asynchronous process re-creating 
the directory.  Please examine the HDFS audit logs: they should show why the 
directory is re-created after the delete and which node is doing it, which 
will likely pinpoint exactly how this is occurring.



> HDFS File Not Removed Despite Successful "Moved to .Trash" Message
> --
>
> Key: HDFS-12688
> URL: https://issues.apache.org/jira/browse/HDFS-12688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0
>Reporter: Shriya Gupta
>Priority: Critical
>
> Wrote a simple script to delete and create a file and ran it multiple times. 
> However, some executions of the script randomly threw a FileAlreadyExists 
> error while the others succeeded despite successful hdfs dfs -rm command. The 
> script is as below, I have reproduced it in two different environments -- 
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting hdfs remove **" 
> hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
>  echo "hdfs compeleted!"
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting mapReduce***"
> mapred job -libjars 
> /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
> -submit /data/home/shriya/shell_test/wordcountJob.xml
> The message confirming successful move -- 
> 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
> hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728
> The contents of subsequent -ls after -rm also showed that the file still 
> existed)
> The error I got when my MapReduce job tried to create the file -- 
> 17/10/19 14:50:00 WARN security.UserGroupInformation: 
> PriviledgedActionException as: (auth:KERBEROS) 
> cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> Exception in thread "main" 
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12688) HDFS File Not Removed Despite Successful "Moved to .Trash" Message

2017-10-19 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211729#comment-16211729
 ] 

Jason Lowe commented on HDFS-12688:
---

Have you checked the HDFS audit logs?  They should give you clues about who is 
re-creating the directory.  I suspect the job executes asynchronously, so when 
you run the script multiple times you are actually running multiple copies of 
the job at once.  If a previous run is still going, it will re-create the 
output directory when its tasks need to write output.


> HDFS File Not Removed Despite Successful "Moved to .Trash" Message
> --
>
> Key: HDFS-12688
> URL: https://issues.apache.org/jira/browse/HDFS-12688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 2.6.0
>Reporter: Shriya Gupta
>Priority: Critical
>
> Wrote a simple script to delete and create a file and ran it multiple times. 
> However, some executions of the script randomly threw a FileAlreadyExists 
> error while the others succeeded despite successful hdfs dfs -rm command. The 
> script is as below, I have reproduced it in two different environments -- 
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting hdfs remove **" 
> hdfs dfs -rm -r -f /user/shriya/shell_test/wordcountOutput
>  echo "hdfs compeleted!"
> hdfs dfs -ls  /user/shriya/shell_test/
> echo "starting mapReduce***"
> mapred job -libjars 
> /data/home/shriya/shell_test/hadoop-mapreduce-client-jobclient-2.7.1.jar 
> -submit /data/home/shriya/shell_test/wordcountJob.xml
> The message confirming successful move -- 
> 17/10/19 14:49:12 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nameservice1/user/shriya/shell_test/wordcountOutput' to trash at: 
> hdfs://nameservice1/user/shriya/.Trash/Current/user/shriya/shell_test/wordcountOutput1508438952728
> The contents of subsequent -ls after -rm also showed that the file still 
> existed)
> The error I got when my MapReduce job tried to create the file -- 
> 17/10/19 14:50:00 WARN security.UserGroupInformation: 
> PriviledgedActionException as: (auth:KERBEROS) 
> cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> Exception in thread "main" 
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://nameservice1/user/shriya/shell_test/wordcountOutput already exists
> at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:272)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:315)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12217) HDFS snapshots doesn't capture all open files when one of the open files is deleted

2017-08-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-12217:
--
Fix Version/s: (was: 2.9.0)

> HDFS snapshots doesn't capture all open files when one of the open files is 
> deleted
> ---
>
> Key: HDFS-12217
> URL: https://issues.apache.org/jira/browse/HDFS-12217
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-12217.01.patch, HDFS-12217.02.patch, 
> HDFS-12217.03.patch, HDFS-12217.04.patch, HDFS-12217.05.patch
>
>
> With the fix for HDFS-11402, HDFS Snapshots can additionally capture all the 
> open files. Just like all other files, these open files in the snapshots will 
> remain immutable. But, sometimes it is found that snapshots fail to capture 
> all the open files in the system.
> Under the following conditions, LeaseManager will fail to find INode 
> corresponding to an active lease 
> * a file is opened for writing (LeaseManager allots a lease), and
> * the same file is deleted while it is still open for writing and having 
> active lease, and
> * the same file is not referenced in any other Snapshots/Trash
> {{INode[] LeaseManager#getINodesWithLease()}} can thus return null for a few 
> leases, thereby causing the caller to trip over and not return all the open 
> files needed by the snapshot manager.
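
As a generic illustration only (hypothetical names, plain Java; this is not 
the actual HDFS-12217 patch): a lookup that can return null for a file deleted 
while still open has to be tolerated by the caller instead of tripping it up:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeaseLookupExample {
  // Hypothetical stand-ins for lease IDs and INode paths; not HDFS classes.
  static final Map<Long, String> LEASE_TO_INODE = new HashMap<>();

  // Resolving a lease whose file was deleted (and is not referenced by any
  // snapshot or trash entry) yields null, as in the description above.
  static String resolve(long leaseId) {
    return LEASE_TO_INODE.get(leaseId);
  }

  // The caller skips null entries instead of tripping over them, so the
  // remaining open files are still reported.
  static List<String> openFilesForSnapshot(List<Long> activeLeases) {
    List<String> result = new ArrayList<>();
    for (long lease : activeLeases) {
      String inode = resolve(lease);
      if (inode != null) {
        result.add(inode);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    LEASE_TO_INODE.put(1L, "/open/file1");
    // Lease 2 has no inode: its file was deleted while still open for write.
    System.out.println(openFilesForSnapshot(Arrays.asList(1L, 2L)));
  }
}
{code}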



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12217) HDFS snapshots doesn't capture all open files when one of the open files is deleted

2017-08-02 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened HDFS-12217:
---

I reverted this from branch-2 because the commit broke the build.
{noformat}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java:[395,40]
 unreported exception java.io.IOException; must be caught or declared to be 
thrown
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java:[397,40]
 unreported exception java.io.IOException; must be caught or declared to be 
thrown
[INFO] 2 errors 
{noformat}

The patch will need to be updated for branch-2 and recommitted.
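
For readers unfamiliar with that javac error, a tiny standalone example 
(hypothetical helper, not the TestLeaseManager code) of what "unreported 
exception ... must be caught or declared" means and the usual fix:
{code}
import java.io.IOException;

public class CheckedExceptionExample {
  // A helper that declares a checked exception, analogous to the API the
  // branch-2 version of the test ends up calling.
  static void helperThatThrows() throws IOException {
    throw new IOException("simulated");
  }

  public static void main(String[] args) {
    // Calling helperThatThrows() without a catch block or a "throws
    // IOException" on the enclosing method is exactly what javac rejects.
    try {
      helperThatThrows();
    } catch (IOException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
{code}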

> HDFS snapshots doesn't capture all open files when one of the open files is 
> deleted
> ---
>
> Key: HDFS-12217
> URL: https://issues.apache.org/jira/browse/HDFS-12217
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: HDFS-12217.01.patch, HDFS-12217.02.patch, 
> HDFS-12217.03.patch, HDFS-12217.04.patch, HDFS-12217.05.patch
>
>
> With the fix for HDFS-11402, HDFS Snapshots can additionally capture all the 
> open files. Just like all other files, these open files in the snapshots will 
> remain immutable. But, sometimes it is found that snapshots fail to capture 
> all the open files in the system.
> Under the following conditions, LeaseManager will fail to find INode 
> corresponding to an active lease 
> * a file is opened for writing (LeaseManager allots a lease), and
> * the same file is deleted while it is still open for writing and having 
> active lease, and
> * the same file is not referenced in any other Snapshots/Trash
> {{INode[] LeaseManager#getINodesWithLease()}} can thus return null for a few 
> leases, thereby causing the caller to trip over and not return all the open 
> files needed by the snapshot manager.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11736) OIV tests should not write outside 'target' directory.

2017-06-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16050605#comment-16050605
 ] 

Jason Lowe commented on HDFS-11736:
---

The branch-2 and branch-2.8 builds broke after this commit due to a compilation 
error:
{noformat}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java:[117,35]
 cannot find symbol
  symbol:   method getNameNodeDirectory(java.lang.String,int,int)
  location: class org.apache.hadoop.hdfs.MiniDFSCluster
{noformat}


> OIV tests should not write outside 'target' directory.
> --
>
> Key: HDFS-11736
> URL: https://issues.apache.org/jira/browse/HDFS-11736
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Konstantin Shvachko
>Assignee: Yiqun Lin
>  Labels: newbie++, test
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11736.001.patch, HDFS-11736.002.patch, 
> HDFS-11736.003.patch, HDFS-11736-branch-2.7.001.patch, 
> HDFS-11736-branch-2.7.002.patch
>
>
> A few tests use {{Files.createTempDir()}} from Guava package, but do not set 
> {{java.io.tmpdir}} system property. Thus the temp directory is created in 
> unpredictable places and is not being cleaned up by {{mvn clean}}.
> This was probably introduced in {{TestOfflineImageViewer}} and then 
> replicated in {{TestCheckpoint}}, {{TestStandbyCheckpoints}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently

2017-05-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-11818:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.2
   3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks to Nathan for the contribution and to Eric for additional review!  I 
committed this to trunk, branch-2, and branch-2.8.

> TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
> ---
>
> Key: HDFS-11818
> URL: https://issues.apache.org/jira/browse/HDFS-11818
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2, 2.8.2
>Reporter: Eric Badger
>Assignee: Nathan Roberts
> Fix For: 2.9.0, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11818-branch-2.patch, HDFS-11818.patch
>
>
> Saw a weird Mockito failure in last night's build with the following stack 
> trace:
> {noformat}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> INodeFile cannot be returned by isRunning()
> isRunning() should return boolean
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
> {noformat}
> This is pretty confusing since we explicitly set isRunning() to return true 
> in TestBlockManager's \@Before method
> {noformat}
> 154Mockito.doReturn(true).when(fsn).isRunning();
> {noformat}
> Also saw the following exception in the logs:
> {noformat}
> 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(2796)) - Error while processing replication queues 
> async
> org.mockito.exceptions.base.MockitoException: 
> 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a 
> *return value*!
> Voids are usually stubbed with Throwables:
> doThrow(exception).when(mock).someVoidMethod();
> If the method you are trying to stub is *overloaded* then make sure you are 
> calling the right overloaded version.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
> {noformat}
> This is also weird since we don't do any explicit mocking with 
> {{writeLockInterruptibly}} via fsn in the test. It has to be something 
> changing the mocks or non-thread safe access or something like that. I can't 
> explain the failures otherwise. 
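
For reference, the Mockito messages quoted above map to two basic stubbing 
rules; below is a minimal standalone sketch (a hypothetical interface standing 
in for the mocked namesystem, with Mockito assumed on the classpath) of the 
valid forms.  It only illustrates the API and does not reproduce the suspected 
thread-safety problem:
{code}
import static org.mockito.Mockito.doNothing;
import static org.mockito.Mockito.doReturn;
import static org.mockito.Mockito.mock;

public class MockitoStubbingExample {
  // Hypothetical interface standing in for the mocked FSNamesystem.
  interface Namesystem {
    boolean isRunning();
    void writeLockInterruptibly() throws InterruptedException;
  }

  public static void main(String[] args) throws InterruptedException {
    Namesystem fsn = mock(Namesystem.class);

    // Value-returning method: stub it with a value of the matching type.
    doReturn(true).when(fsn).isRunning();

    // Void method: stub it with doNothing() or doThrow(...), never a value.
    doNothing().when(fsn).writeLockInterruptibly();

    System.out.println(fsn.isRunning());  // prints true
    fsn.writeLockInterruptibly();         // stubbed no-op
  }
}
{code}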



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently

2017-05-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008847#comment-16008847
 ] 

Jason Lowe commented on HDFS-11818:
---

Test failures are unrelated.

+1 lgtm.  Committing this.


> TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
> ---
>
> Key: HDFS-11818
> URL: https://issues.apache.org/jira/browse/HDFS-11818
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2, 2.8.2
>Reporter: Eric Badger
>Assignee: Nathan Roberts
> Attachments: HDFS-11818-branch-2.patch, HDFS-11818.patch
>
>
> Saw a weird Mockito failure in last night's build with the following stack 
> trace:
> {noformat}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> INodeFile cannot be returned by isRunning()
> isRunning() should return boolean
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
> {noformat}
> This is pretty confusing since we explicitly set isRunning() to return true 
> in TestBlockManager's \@Before method
> {noformat}
> 154Mockito.doReturn(true).when(fsn).isRunning();
> {noformat}
> Also saw the following exception in the logs:
> {noformat}
> 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(2796)) - Error while processing replication queues 
> async
> org.mockito.exceptions.base.MockitoException: 
> 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a 
> *return value*!
> Voids are usually stubbed with Throwables:
> doThrow(exception).when(mock).someVoidMethod();
> If the method you are trying to stub is *overloaded* then make sure you are 
> calling the right overloaded version.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
> {noformat}
> This is also weird since we don't do any explicit mocking with 
> {{writeLockInterruptibly}} via fsn in the test. It has to be something 
> changing the mocks or non-thread safe access or something like that. I can't 
> explain the failures otherwise. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11818) TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently

2017-05-12 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-11818:
--
Affects Version/s: 2.8.2

> TestBlockManager.testSufficientlyReplBlocksUsesNewRack fails intermittently
> ---
>
> Key: HDFS-11818
> URL: https://issues.apache.org/jira/browse/HDFS-11818
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: Eric Badger
>Assignee: Nathan Roberts
>
> Saw a weird Mockito failure in last night's build with the following stack 
> trace:
> {noformat}
> org.mockito.exceptions.misusing.WrongTypeOfReturnValue: 
> INodeFile cannot be returned by isRunning()
> isRunning() should return boolean
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.addBlockOnNodes(TestBlockManager.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.doTestSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:404)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSufficientlyReplBlocksUsesNewRack(TestBlockManager.java:397)
> {noformat}
> This is pretty confusing since we explicitly set isRunning() to return true 
> in TestBlockManager's \@Before method
> {noformat}
> 154Mockito.doReturn(true).when(fsn).isRunning();
> {noformat}
> Also saw the following exception in the logs:
> {noformat}
> 2017-05-12 05:42:27,903 ERROR blockmanagement.BlockManager 
> (BlockManager.java:run(2796)) - Error while processing replication queues 
> async
> org.mockito.exceptions.base.MockitoException: 
> 'writeLockInterruptibly' is a *void method* and it *cannot* be stubbed with a 
> *return value*!
> Voids are usually stubbed with Throwables:
> doThrow(exception).when(mock).someVoidMethod();
> If the method you are trying to stub is *overloaded* then make sure you are 
> calling the right overloaded version.
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processMisReplicatesAsync(BlockManager.java:2841)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.access$100(BlockManager.java:120)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$1.run(BlockManager.java:2792)
> {noformat}
> This is also weird since we don't do any explicit mocking with 
> {{writeLockInterruptibly}} via fsn in the test. It has to be something 
> changing the mocks or non-thread safe access or something like that. I can't 
> explain the failures otherwise. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-10 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-11745:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.2
   3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks to Eric for the contribution and to Arpit for additional review!  I 
committed this to trunk, branch-2, and branch-2.8.

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha3, 2.8.2
>
> Attachments: HDFS-11745.001.patch, HDFS-11745.002.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005116#comment-16005116
 ] 

Jason Lowe commented on HDFS-11745:
---

Test failures are unrelated.

+1 lgtm.  Committing this.

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch, HDFS-11745.002.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004643#comment-16004643
 ] 

Jason Lowe commented on HDFS-11745:
---

bq. Thanks for committing it

I'm guessing this comment was meant for HADOOP-14377.  I haven't committed this 
one yet -- still waiting for Eric's response about the testCapacityMetrics 
comment.

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11745) Increase HDFS test timeouts from 1 second to 10 seconds

2017-05-09 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003597#comment-16003597
 ] 

Jason Lowe commented on HDFS-11745:
---

Thanks for the patch!  I noticed that TestNameNodeMetrics#testCapacityMetrics 
also has a pretty low timeout (1.8 seconds, which seems like an odd choice).  
I think we should bump that one as well.
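
For reference, the change under discussion is just the JUnit 4 timeout 
attribute; a trivial standalone sketch (arbitrary test body) of the bumped 
10-second form:
{code}
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class TimeoutExample {
  // A 1-second timeout is fragile on a loaded machine; 10 seconds leaves
  // plenty of headroom while still catching genuine hangs.
  @Test(timeout = 10000)
  public void testSomethingQuick() throws Exception {
    Thread.sleep(100);   // stand-in for the real work under test
    assertTrue(true);
  }
}
{code}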

> Increase HDFS test timeouts from 1 second to 10 seconds
> ---
>
> Key: HDFS-11745
> URL: https://issues.apache.org/jira/browse/HDFS-11745
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-11745.001.patch
>
>
> 1 second test timeouts are susceptible to failure on overloaded or otherwise 
> slow machines



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11691) Add a proper scheme to the datanode links in NN web UI

2017-04-25 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-11691:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.1
   3.0.0-alpha3
   2.9.0
   Status: Resolved  (was: Patch Available)

Thanks to [~kihwal] for the contribution and to [~cheersyang] for additional 
review!  I committed this to trunk, branch-2, branch-2.8, and branch-2.8.1.

> Add a proper scheme to the datanode links in NN web UI
> --
>
> Key: HDFS-11691
> URL: https://issues.apache.org/jira/browse/HDFS-11691
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.9.0, 3.0.0-alpha3, 2.8.1
>
> Attachments: HDFS-11691.patch
>
>
> On the datanodes page of the namenode web UI, the datanode links may not be 
> correct if the namenode is serving the page through http but https is also 
> enabled.  This is because {{dfshealth.js}} does not put a proper scheme in 
> front of the address.  It already determines whether the address is 
> non-secure or secure. It can simply prepend {{http:}} or {{https:}} to what 
> it is currently setting.
> The existing mechanism would work for YARN and MAPRED, since they can only 
> serve one protocol, HTTP or HTTPS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11691) Add a proper scheme to the datanode links in NN web UI

2017-04-25 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983034#comment-15983034
 ] 

Jason Lowe commented on HDFS-11691:
---

+1 lgtm.  I'll commit this later today if there are no objections.

> Add a proper scheme to the datanode links in NN web UI
> --
>
> Key: HDFS-11691
> URL: https://issues.apache.org/jira/browse/HDFS-11691
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-11691.patch
>
>
> On the datanodes page of the namenode web UI, the datanode links may not be 
> correct if the namenode is serving the page through http but https is also 
> enabled.  This is because {{dfshealth.js}} does not put a proper scheme in 
> front of the address.  It already determines whether the address is 
> non-secure or secure. It can simply prepend {{http:}} or {{https:}} to what 
> it is currently setting.
> The existing mechanism would work for YARN and MAPRED, since they can only 
> serve one protocol, HTTP or HTTPS.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-11501) Is there a way of loading cvs files to create hive tables with desired lengths for columns?

2017-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved HDFS-11501.
---
Resolution: Invalid

JIRA is for tracking bugs against the Hadoop project and not for general user 
support.  Please post your question to either the [Hive user mailing 
list|http://hive.apache.org/mailing_lists.html] or the [Hadoop user mailing 
list|http://hadoop.apache.org/mailing_lists.html].

> Is there a way of loading cvs files to create hive tables with desired 
> lengths for columns?
> ---
>
> Key: HDFS-11501
> URL: https://issues.apache.org/jira/browse/HDFS-11501
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wen Lin
>
> We just got on Hadoop environment. Our data sources are cvs files. The hive 
> tables created from the sources are seen all character /string columns have 
> same length of 255 bytes, even for gender which has value with one byte. Is 
> there a way of loading cvs files to create hive tables with desired lengths 
> for string columns instead of 255 across all tables? Thank you for your help!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11498) Make RestCsrfPreventionHandler and WebHdfsHandler compatible with Netty 4.0

2017-03-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897515#comment-15897515
 ] 

Jason Lowe commented on HDFS-11498:
---

I was under the impression this would go in relatively cleanly on branch-2.8, 
but that does not seem to be the case.  There is a minor conflict on some 
imports, but even after those are resolved the build is broken: 
DatanodeHttpServer fails to compile after the netty-all version change.  I 
suspect that's because HDFS-8377 was semi-reverted from branch-2 but not 
branch-2.8.  It seems we need to cherry-pick the fix for HDFS-11376 (really 
the revert patch in HDFS-8377) to branch-2.8 _then_ apply this change.  
Thoughts?

> Make RestCsrfPreventionHandler and WebHdfsHandler compatible with Netty 4.0
> ---
>
> Key: HDFS-11498
> URL: https://issues.apache.org/jira/browse/HDFS-11498
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-11498.001.patch, HDFS-11498.branch-2.001.patch
>
>
> Per discussion in HADOOP-13866, it looks like we can change 2.8.0 back to 
> exposing Netty 4.0, but still be ABI compatible with Netty 4.1 for users like 
> HBase that want to swap out the version.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11498) Make RestCsrfPreventionHandler and WebHdfsHandler compatible with Netty 4.0

2017-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-11498:
--
Attachment: HDFS-11498.001.patch

+1, the branch-2 patch looks good to me.  The javac warning is unrelated, as 
are the unit test failures, and the tests pass on branch-2 for me with the 
patch applied.

I believe the simplest path forward is to commit this patch to trunk, branch-2, 
branch-2.8, and branch-2.8.0, then update HADOOP-13866 accordingly.  We do not 
want to ship a Netty 4.x beta even on trunk.

I'm uploading essentially the same patch (except for line-number offsets) for 
trunk and will commit this pending that Jenkins run.

> Make RestCsrfPreventionHandler and WebHdfsHandler compatible with Netty 4.0
> ---
>
> Key: HDFS-11498
> URL: https://issues.apache.org/jira/browse/HDFS-11498
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: HDFS-11498.001.patch, HDFS-11498.branch-2.001.patch
>
>
> Per discussion in HADOOP-13866, it looks like we can change 2.8.0 back to 
> exposing Netty 4.0, but still be ABI compatible with Netty 4.1 for users like 
> HBase that want to swap out the version.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11252) TestFileTruncate#testTruncateWithDataNodesRestartImmediately can fail with BindException

2016-12-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751737#comment-15751737
 ] 

Jason Lowe commented on HDFS-11252:
---

Stacktrace:
{noformat}
java.net.BindException: Problem binding to [localhost:33571] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.ipc.Server.bind(Server.java:543)
at org.apache.hadoop.ipc.Server$Listener.(Server.java:1033)
at org.apache.hadoop.ipc.Server.(Server.java:2785)
at org.apache.hadoop.ipc.RPC$Server.(RPC.java:960)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:420)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:341)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:802)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:953)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1364)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:492)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2661)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2564)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2611)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2305)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.restartDataNode(MiniDFSCluster.java:2355)
at 
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate.testTruncateWithDataNodesRestartImmediately(TestFileTruncate.java:804)
{noformat}


> TestFileTruncate#testTruncateWithDataNodesRestartImmediately can fail with 
> BindException
> 
>
> Key: HDFS-11252
> URL: https://issues.apache.org/jira/browse/HDFS-11252
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha2
>Reporter: Jason Lowe
>
> testTruncateWithDataNodesRestartImmediately can fail with a BindException.  
> The setup for TestFileTruncate has been fixed in the past to solve a bind 
> exception, but this is occurring after the minicluster comes up and the 
> datanodes are being restarted.  Maybe there's a race condition there?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11252) TestFileTruncate#testTruncateWithDataNodesRestartImmediately can fail with BindException

2016-12-15 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-11252:
-

 Summary: 
TestFileTruncate#testTruncateWithDataNodesRestartImmediately can fail with 
BindException
 Key: HDFS-11252
 URL: https://issues.apache.org/jira/browse/HDFS-11252
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-alpha2
Reporter: Jason Lowe


testTruncateWithDataNodesRestartImmediately can fail with a BindException.  The 
setup for TestFileTruncate has been fixed in the past to solve a bind 
exception, but this is occurring after the minicluster comes up and the 
datanodes are being restarted.  Maybe there's a race condition there?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11251) ConcurrentModificationException during DataNode#refreshVolumes

2016-12-15 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-11251:
-

 Summary: ConcurrentModificationException during 
DataNode#refreshVolumes
 Key: HDFS-11251
 URL: https://issues.apache.org/jira/browse/HDFS-11251
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0-alpha2
Reporter: Jason Lowe


The testAddVolumesDuringWrite case failed with a ReconfigurationException which 
appears to have been caused by a ConcurrentModificationException.  Stacktrace 
details to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-11251) ConcurrentModificationException during DataNode#refreshVolumes

2016-12-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751703#comment-15751703
 ] 

Jason Lowe commented on HDFS-11251:
---

The test failed with this stacktrace:
{noformat}
org.apache.hadoop.conf.ReconfigurationException: Could not change property 
dfs.datanode.data.dir from 
'[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data1,[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data2,[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data4'
 to 
'[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data1,[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data2,[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data3,[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data4'
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:777)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:532)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.addVolumes(TestDataNodeHotSwapVolumes.java:310)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testAddVolumesDuringWrite(TestDataNodeHotSwapVolumes.java:404)

{noformat}

In the test output I found a ConcurrentModificationException which appears to 
be the cause.  If so, it would be nice if ReconfigurationException carried the 
underlying exception so the real cause of the failure is visible.
{noformat}
2016-12-15 00:33:21,848 [pool-239-thread-2] INFO  impl.FsDatasetImpl 
(FsVolumeList.java:addVolume(320)) - Added new volume: 
DS-6c2d1743-ee6f-4011-8042-b47d45d5279b
2016-12-15 00:33:21,848 [pool-239-thread-2] INFO  impl.FsDatasetImpl 
(FsDatasetImpl.java:addVolume(494)) - Added volume - 
[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data4,
 StorageType: DISK
2016-12-15 00:33:21,851 [Thread-1888] ERROR datanode.DataNode 
(DataNode.java:refreshVolumes(764)) - Failed to add volume: 
[DISK]file:/testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data3
java.util.concurrent.ExecutionException: 
java.util.ConcurrentModificationException
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.refreshVolumes(DataNode.java:750)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.reconfigurePropertyImpl(DataNode.java:532)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.addVolumes(TestDataNodeHotSwapVolumes.java:310)
at 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes.testAddVolumesDuringWrite(TestDataNodeHotSwapVolumes.java:404)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at 
org.apache.hadoop.hdfs.server.common.Storage.containsStorageDir(Storage.java:999)
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:220)
at 
org.apache.hadoop.hdfs.server.datanode.DataStorage.prepareVolume(DataStorage.java:332)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addVolume(FsDatasetImpl.java:455)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$2.call(DataNode.java:737)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$2.call(DataNode.java:733)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
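
A minimal sketch of relaying the underlying cause, so callers see the CME 
rather than a bare ReconfigurationException.  The method, field names, and the 
cause-accepting constructor are assumptions for illustration only:
{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.ReconfigurationException;

// Hypothetical sketch: propagate the ExecutionException's cause instead of
// discarding it when a volume-add task fails.
class VolumeRefreshSketch {
  static void waitForVolumeAdd(Future<?> task, String newVal, String oldVal)
      throws ReconfigurationException, InterruptedException {
    try {
      task.get();
    } catch (ExecutionException e) {
      // Assumes a constructor (or an added one) that accepts a Throwable cause.
      throw new ReconfigurationException(
          "dfs.datanode.data.dir", newVal, oldVal, e.getCause());
    }
  }
}
{code}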


> ConcurrentModificationException during DataNode#r

[jira] [Updated] (HDFS-9745) TestSecureNNWithQJM#testSecureMode sometimes fails with timeouts

2016-08-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-9745:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.4
   2.8.0
   Status: Resolved  (was: Patch Available)

Thanks, [~xiaochen]!  I committed this to trunk, branch-2, branch-2.8, and 
branch-2.7.

> TestSecureNNWithQJM#testSecureMode sometimes fails with timeouts
> 
>
> Key: HDFS-9745
> URL: https://issues.apache.org/jira/browse/HDFS-9745
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Minor
> Fix For: 2.8.0, 2.7.4
>
> Attachments: HDFS-9745.01.patch
>
>
> TestSecureNNWithQJM#testSecureMode fails intermittently. In most cases it 
> times out.
> With roughly 0.5%~1% probability, it fails with a more involved error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9745) TestSecureNNWithQJM#testSecureMode sometimes fails with timeouts

2016-08-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15432932#comment-15432932
 ] 

Jason Lowe commented on HDFS-9745:
--

Seeing occasional timeouts for this test as well.

+1 lgtm.  Committing this.

> TestSecureNNWithQJM#testSecureMode sometimes fails with timeouts
> 
>
> Key: HDFS-9745
> URL: https://issues.apache.org/jira/browse/HDFS-9745
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Minor
> Attachments: HDFS-9745.01.patch
>
>
> TestSecureNNWithQJM#testSecureMode fails intermittently. In most cases it 
> times out.
> With roughly 0.5%~1% probability, it fails with a more involved error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10774) Reflective XSS and HTML injection vulnerability

2016-08-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426512#comment-15426512
 ] 

Jason Lowe commented on HDFS-10774:
---

Security issues can be mailed to secur...@hadoop.apache.org.  See 
http://hadoop.apache.org/mailing_lists.html#Security for details and pointers 
to other mailing lists.

> Reflective XSS and HTML injection vulnerability
> ---
>
> Key: HDFS-10774
> URL: https://issues.apache.org/jira/browse/HDFS-10774
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha
>Reporter: Will Harmon
>  Labels: security
>
> I’m assessing my customer's Apache Hadoop 2.0.0-CDH4.7.0 installation, and I 
> came across an XSS and HTML injection vulnerability. Although my customer's 
> instance is 2.0.0, newer versions are also likely vulnerable. I’d like to 
> provide more details about my finding but first want to ensure I’m 
> communicating with the correct group. Please let me know if you would like to 
> know more and how I can securely share my findings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9580) TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected number of invalidate blocks.

2016-06-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-9580:
-
Fix Version/s: (was: 3.0.0-alpha1)
   2.8.0

Thanks, [~jojochuang]!  I committed this to branch-2 and branch-2.8 as well.

> TestComputeInvalidateWork#testDatanodeReRegistration failed due to unexpected 
> number of invalidate blocks.
> --
>
> Key: HDFS-9580
> URL: https://issues.apache.org/jira/browse/HDFS-9580
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode, test
>Affects Versions: 3.0.0-alpha1
> Environment: Jenkins
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.8.0
>
> Attachments: HDFS-9580.001.patch
>
>
> The failure appeared in the trunk jenkins job.
> https://builds.apache.org/job/Hadoop-Hdfs-trunk/2646/
> {noformat}
> Error Message
> Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
> Stacktrace
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> I think there could be a race condition between creating a file and shutting 
> down data nodes, which caused the test to fail.
> {noformat}
> 2015-12-19 07:11:02,765 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[]] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=LAST_IN_PIPELINE, downstreams=0:[] terminating
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:45655, dest: 
> /127.0.0.1:54890, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> 6a13ec05-e1c1-4086-8a4d-d5a09636afcd, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 954174423
> 2015-12-19 07:11:02,768 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  DataNode.clienttrace 
> (BlockReceiver.java:finalizeBlock(1431)) - src: /127.0.0.1:33252, dest: 
> /127.0.0.1:54426, bytes: 134217728, op: HDFS_WRITE, cliID: 
> DFSClient_NONMAPREDUCE_147911011_935, offset: 0, srvID: 
> d81751db-02a9-48fe-b697-77623048784b, blockid: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, duration: 
> 957463510
> 2015-12-19 07:11:02,772 [PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO  datanode.DataNode 
> (BlockReceiver.java:run(1404)) - PacketResponder: 
> BP-1551077294-67.195.81.149-1450509060247:blk_1073741825_1001, 
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-12-19 07:11:02,782 [IPC Server handler 4 on 36404] INFO  
> blockmanagement.BlockManager 
> (BlockManager.java:checkBlocksProperlyReplicated(3871)) - BLOCK* 
> blk_1073741825_1001 is not COMPLETE (ucState = COMMITTED, replication# = 0 <  
> minimum = 1) in file /testRR
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:02,783 [IPC Server handler 4 on 36404] INFO  
> namenode.EditLogFileOutputStream 
> (EditLogFileOutputStream.java:flushAndSync(200)) - Nothing to flush
> 2015-12-19 07:11:03,190 [IPC Server handler 8 on 36404] INFO  
> hdfs.StateChange (FSNamesystem.java:completeFile(2557)) - DIR* completeFile: 
> /testRR is closed by DFSClient_NONMAPREDUCE_147911011_935
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9434) Recommission a datanode with 500k blocks may pause NN for 30 seconds

2015-11-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025422#comment-15025422
 ] 

Jason Lowe commented on HDFS-9434:
--

This broke the 2.6 build.  The patch assumes SLF4J, but that conversion hasn't 
happened on the 2.6 branch.

> Recommission a datanode with 500k blocks may pause NN for 30 seconds
> 
>
> Key: HDFS-9434
> URL: https://issues.apache.org/jira/browse/HDFS-9434
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Tsz Wo Nicholas Sze
> Fix For: 2.6.3
>
> Attachments: h9434_20151116.patch
>
>
> In BlockManager, processOverReplicatedBlocksOnReCommission is called within 
> the namespace lock.  There is a (not very useful) log message printed in 
> processOverReplicatedBlock.  When a storage holds a large number of blocks, 
> printing the log message for each block can prevent the NN from processing 
> any other operations.  We did see it pause the NN for 30 seconds for a 
> storage with 500k blocks.
> I suggest changing the log message to trace level as a quick fix.
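
A minimal sketch of that quick fix, with hypothetical class, logger, and 
message names (the real code paths may differ):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch: demote the per-block message to trace level and guard
// it, so the namespace lock is not held while formatting hundreds of
// thousands of log lines during recommission.
class RecommissionLoggingSketch {
  private static final Log LOG = LogFactory.getLog(RecommissionLoggingSketch.class);

  void logOverReplicated(Object block, Object storage) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("Processing over-replicated block " + block
          + " on recommissioned storage " + storage);
    }
  }
}
{code}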



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7087) Ability to list /.reserved

2015-10-21 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968376#comment-14968376
 ] 

Jason Lowe commented on HDFS-7087:
--

Looks like this broke the branch-2 build:
{noformat}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:[126,7]
 constructor HdfsFileStatus in class 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus cannot be applied to given types;
  required: 
long,boolean,int,long,long,long,org.apache.hadoop.fs.permission.FsPermission,java.lang.String,java.lang.String,byte[],byte[],long,int,org.apache.hadoop.fs.FileEncryptionInfo,byte
  found: 
int,boolean,int,int,int,int,org.apache.hadoop.fs.permission.FsPermissionbyte[],long,int,,byte,
  reason: actual and formal argument lists differ in length
[ERROR] 
/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:[350,29]
 constructor HdfsFileStatus in class 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus cannot be applied to given types;
  required: 
long,boolean,int,long,long,long,org.apache.hadoop.fs.permission.FsPermission,java.lang.String,java.lang.String,byte[],byte[],long,int,org.apache.hadoop.fs.FileEncryptionInfo,byte
  found: 
int,boolean,int,int,long,long,org.apache.hadoop.fs.permission.FsPermission,,java.lang.String,,byte[],long,int,,byte,
  reason: actual and formal argument lists differ in length
[ERROR] 
/hadoop/apache/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:[354,26]
 constructor HdfsFileStatus in class 
org.apache.hadoop.hdfs.protocol.HdfsFileStatus cannot be applied to given types;
  required: 
long,boolean,int,long,long,long,org.apache.hadoop.fs.permission.FsPermission,java.lang.String,java.lang.String,byte[],byte[],long,int,org.apache.hadoop.fs.FileEncryptionInfo,byte
  found: 
int,boolean,int,int,long,long,org.apache.hadoop.fs.permission.FsPermission,,java.lang.String,,byte[],long,int,,byte,
  reason: actual and formal argument lists differ in length
[INFO] 3 errors 
{noformat}


> Ability to list /.reserved
> --
>
> Key: HDFS-7087
> URL: https://issues.apache.org/jira/browse/HDFS-7087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
>Reporter: Andrew Wang
>Assignee: Xiao Chen
> Fix For: 2.8.0
>
> Attachments: HDFS-7087.001.patch, HDFS-7087.002.patch, 
> HDFS-7087.003.patch, HDFS-7087.draft.patch
>
>
> We have two special paths within /.reserved now, /.reserved/.inodes and 
> /.reserved/raw. It seems like we should be able to list /.reserved to see 
> them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9193) Fix incorrect references the usages of the DN in dfshealth.js

2015-10-05 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-9193:
-
Affects Version/s: 2.8.0
Fix Version/s: 2.8.0

> Fix incorrect references the usages of the DN in dfshealth.js
> -
>
> Key: HDFS-9193
> URL: https://issues.apache.org/jira/browse/HDFS-9193
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Chang Li
>Assignee: Chang Li
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: HDFS-9193.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts

2015-08-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697547#comment-14697547
 ] 

Jason Lowe commented on HDFS-8898:
--

This would solve a significant annoyance with computing quotas on a shared 
tree.  However, I think it has security implications.  If users can get the 
quota totals for the entire tree, they can calculate what must be used by the 
parts they cannot access via quota_usage - usage_visible.  If what is being 
stored in the restricted area is sensitive (e.g.: records related to 
financials) then knowing how many files there are or how large the restricted 
data is could leak sensitive information.

> Create API and command-line argument to get quota without need to get file 
> and directory counts
> ---
>
> Key: HDFS-8898
> URL: https://issues.apache.org/jira/browse/HDFS-8898
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs
>Reporter: Joep Rottinghuis
>
> On large directory structures it takes significant time to iterate through 
> the file and directory counts recursively to get a complete ContentSummary.
> When you want to just check for the quota on a higher level directory it 
> would be good to have an option to skip the file and directory counts.
> Moreover, currently one can only check the quota if you have access to all 
> the directories underneath. For example, if I have a large home directory 
> under /user/joep and I host some files for another user in a sub-directory, 
> the moment they create an unreadable sub-directory under my home I can no 
> longer check what my quota is. Understood that I cannot check the current 
> file counts unless I can iterate through all the usage, but for 
> administrative purposes it is nice to be able to get the current quota 
> setting on a directory without the need to iterate through and run into 
> permission issues on sub-directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8068) Do not retry rpc calls If the proxy contains unresolved address

2015-06-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603682#comment-14603682
 ] 

Jason Lowe commented on HDFS-8068:
--

Recently I ran across a similar situation except it involved YarnClient trying 
to set up the RM proxy rather than DFSClient trying to set up the NN proxy.  If 
possible, I'd much rather try to fix this in the RPC layer so we don't require 
every subsystem that wants to set up proxies to handle this separately.  Filed 
HADOOP-12125 to track that effort.

> Do not retry rpc calls If the proxy contains unresolved address
> ---
>
> Key: HDFS-8068
> URL: https://issues.apache.org/jira/browse/HDFS-8068
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>  Labels: BB2015-05-TBR
> Attachments: HDFS-8068.v1.patch, HDFS-8068.v2.patch
>
>
> When the InetSocketAddress object happens to be unresolvable (e.g. due to 
> transient DNS issue), the rpc proxy object will not be usable since the 
> client will throw UnknownHostException when a Connection object is created. 
> If FailoverOnNetworkExceptionRetry is used as in the standard HA failover 
> proxy, the call will be retried, but this will never recover.  Instead, the 
> validity of the address must be checked on proxy creation, throwing if it is 
> invalid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8182) Implement topology-aware CDN-style caching

2015-04-22 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507982#comment-14507982
 ] 

Jason Lowe commented on HDFS-8182:
--

Quota seems like it could be problematic for some of the use-cases involved in 
the distributed cache.  Not all files being downloaded by a user belong to that 
user.  Public localized resources are a good example of this.  How does quota 
enter the picture in the case where the user who owns the file is not the user 
requesting the file?  For example, are other users accessing my public files 
going to be able to cause my quota usage to increase by this feature?

> Implement topology-aware CDN-style caching
> --
>
> Key: HDFS-8182
> URL: https://issues.apache.org/jira/browse/HDFS-8182
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, namenode
>Affects Versions: 2.6.0
>Reporter: Gera Shegalov
>
> To scale reads of hot blocks in large clusters, it would be beneficial if we 
> could read a block across the ToR switches only once. Example scenarios are 
> localization of binaries, MR distributed cache files for map-side joins and 
> similar. There are multiple layers where this could be implemented (YARN 
> service or individual apps such as MR) but I believe it is best done in HDFS 
> or even common FileSystem to support as many use cases as possible. 
> The life cycle could look like this e.g. for the YARN localization scenario:
> 1. inputStream = fs.open(path, ..., CACHE_IN_RACK)
> 2. instead of reading from a remote DN directly, NN tells the client to read 
> via the local DN1 and the DN1 creates a replica of each block.
> When the next localizer on DN2 in the same rack starts, it will learn from NN 
> about the replica on DN1 and the client will read from DN1 using the 
> conventional path.
> When the application ends, the AM or NMs can instruct the NN in a fadvise 
> DONTNEED style, and it can start telling DNs to discard the extraneous replicas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7212) Huge number of BLOCKED threads rendering DataNodes useless

2015-03-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371285#comment-14371285
 ] 

Jason Lowe commented on HDFS-7212:
--

Wondering if you were seeing the same thing as HADOOP-11333 which is fixed in 
2.7.0.  Does the stacktrace in HADOOP-11333 match what you were seeing?

> Huge number of BLOCKED threads rendering DataNodes useless
> --
>
> Key: HDFS-7212
> URL: https://issues.apache.org/jira/browse/HDFS-7212
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.4.0
> Environment: PROD
>Reporter: Istvan Szukacs
>
> There are 3000 - 8000 threads in each datanode JVM, blocking the entire VM 
> and rendering the service unusable, missing heartbeats and stopping data 
> access. The threads look like this:
> {code}
> 3415 (state = BLOCKED)
> - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may 
> be imprecise)
> - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=186 (Compiled frame)
> - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=834 (Interpreted frame)
> - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=867 (Interpreted frame)
> - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) @bci=17, 
> line=1197 (Interpreted frame)
> - java.util.concurrent.locks.ReentrantLock$NonfairSync.lock() @bci=21, 
> line=214 (Compiled frame)
> - java.util.concurrent.locks.ReentrantLock.lock() @bci=4, line=290 (Compiled 
> frame)
> - 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(org.apache.hadoop.net.unix.DomainSocket,
>  org.apache.hadoop.net.unix.DomainSocketWatcher$Handler) @bci=4, line=286 
> (Interpreted frame)
> - 
> org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(java.lang.String,
>  org.apache.hadoop.net.unix.DomainSocket) @bci=169, line=283 (Interpreted 
> frame)
> - 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(java.lang.String)
>  @bci=212, line=413 (Interpreted frame)
> - 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(java.io.DataInputStream)
>  @bci=13, line=172 (Interpreted frame)
> - 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(org.apache.hadoop.hdfs.protocol.datatransfer.Op)
>  @bci=149, line=92 (Compiled frame)
> - org.apache.hadoop.hdfs.server.datanode.DataXceiver.run() @bci=510, line=232 
> (Compiled frame)
> - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)
> {code}
> Has anybody seen this before?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with "+"

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329790#comment-14329790
 ] 

Jason Lowe commented on HDFS-7816:
--

I don't think we can rely on clients changing the way the URL is encoded, 
otherwise we break compatibility with older clients.

I think Kihwal's patch will work even with older clients.  My main concern is 
that we're relying on QueryStringDecoder#path to give us a raw path so URI can 
decode it properly.  The javadoc for that method says it returns a decoded 
path, and if that were ever fixed to match the javadoc then we'd end up 
double-decoding, which would break for some paths.  It also seems weird to me 
that we're using a QueryStringDecoder to obtain parts of the URL that aren't 
query strings.  I think it would be safer to avoid QueryStringDecoder 
altogether for the path computation and just pass the original request URI 
string to the URI constructor for path decoding.
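
For illustration, a self-contained sketch of what java.net.URI gives us when 
handed the raw request string (the example path is hypothetical):
{code}
import java.net.URI;
import java.net.URISyntaxException;

// Minimal demonstration: URI#getPath() percent-decodes, URI#getRawPath() does not.
public class WebHdfsPathDecodeDemo {
  public static void main(String[] args) throws URISyntaxException {
    String rawRequestUri = "/webhdfs/v1/user/somebody/abc%25def?op=OPEN";
    URI uri = new URI(rawRequestUri);
    System.out.println(uri.getRawPath()); // /webhdfs/v1/user/somebody/abc%25def
    System.out.println(uri.getPath());    // /webhdfs/v1/user/somebody/abc%def
  }
}
{code}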

> Unable to open webhdfs paths with "+"
> -
>
> Key: HDFS-7816
> URL: https://issues.apache.org/jira/browse/HDFS-7816
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Kihwal Lee
>Priority: Blocker
> Attachments: HDFS-7816.patch, HDFS-7816.patch
>
>
> webhdfs requests to open files with % characters in the filename fail because 
> the filename is not being decoded properly.  For example:
> $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def'
> cat: File does not exist: /user/somebody/abc%25def



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with escape characters

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329396#comment-14329396
 ] 

Jason Lowe commented on HDFS-7816:
--

Thanks, Haohui!  Sorry for the duplicate noise, I missed HDFS-6662 when filing 
this.

> Unable to open webhdfs paths with escape characters
> ---
>
> Key: HDFS-7816
> URL: https://issues.apache.org/jira/browse/HDFS-7816
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>
> webhdfs requests to open files with % characters in the filename fail because 
> the filename is not being decoded properly.  For example:
> $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def'
> cat: File does not exist: /user/somebody/abc%25def



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329206#comment-14329206
 ] 

Jason Lowe commented on HDFS-7279:
--

We ran into an issue with webhdfs paths containing escape characters that seems 
to be related to this change.  See HDFS-7816.

> Use netty to implement DatanodeWebHdfsMethods
> -
>
> Key: HDFS-7279
> URL: https://issues.apache.org/jira/browse/HDFS-7279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, webhdfs
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.7.0
>
> Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, 
> HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, 
> HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch, 
> HDFS-7279.008.patch, HDFS-7279.009.patch, HDFS-7279.010.patch, 
> HDFS-7279.011.patch, HDFS-7279.012.patch, HDFS-7279.013.patch
>
>
> Currently the DN implements all related webhdfs functionality using jetty. As 
> the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer 
> and connection management, the DN often suffers from long latency and OOM 
> when its webhdfs component is under sustained heavy load.
> This jira proposes to implement the webhdfs component in DN using netty, 
> which can be more efficient and allows finer-grained control over webhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with escape characters

2015-02-20 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329199#comment-14329199
 ] 

Jason Lowe commented on HDFS-7816:
--

This appears to be broken after HDFS-7279.  What's happening is the URI is 
being properly encoded by the client but the datanode is not decoding it 
properly when it tries to act as a DFS client.

Looks like netty's QueryStringDecoder#path method, despite the javadoc stating 
it returns a decoded path, is not returning a decoded path from the URI.  It 
doesn't use a URI object to decode it; rather, it just performs substring 
operations on the URI string.  Even if you pass it a URI, it uses the raw 
path, not the decoded path, and then returns a substring of that as the path.

As a result the datanode ends up using a non-decoded path and we have a path 
mismatch between what the client requested and what the datanode tries to open 
on their behalf.

> Unable to open webhdfs paths with escape characters
> ---
>
> Key: HDFS-7816
> URL: https://issues.apache.org/jira/browse/HDFS-7816
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Priority: Blocker
>
> webhdfs requests to open files with % characters in the filename fail because 
> the filename is not being decoded properly.  For example:
> $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def'
> cat: File does not exist: /user/somebody/abc%25def



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7816) Unable to open webhdfs paths with escape characters

2015-02-20 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-7816:


 Summary: Unable to open webhdfs paths with escape characters
 Key: HDFS-7816
 URL: https://issues.apache.org/jira/browse/HDFS-7816
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Jason Lowe
Priority: Blocker


webhdfs requests to open files with % characters in the filename fail because 
the filename is not being decoded properly.  For example:

$ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def'
cat: File does not exist: /user/somebody/abc%25def




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7533) Datanode sometimes does not shutdown on receiving upgrade shutdown command

2015-01-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274206#comment-14274206
 ] 

Jason Lowe commented on HDFS-7533:
--

The "-1 overall" is unrelated, see HADOOP-11473.

> Datanode sometimes does not shutdown on receiving upgrade shutdown command
> --
>
> Key: HDFS-7533
> URL: https://issues.apache.org/jira/browse/HDFS-7533
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Eric Payne
> Attachments: HDFS-7533.v1.txt
>
>
> When the datanode is told to shut down via the dfsadmin command during rolling 
> upgrade, it may not shut down.  This is because not all writers have a responder 
> running, but sendOOB() tries anyway. This causes an NPE and the shutdown thread 
> dies, halting the shutdown after only shutting down DataXceiverServer. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7598) TestDFSClientCache.testEviction is not quite correct and fails with newer version of guava

2015-01-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274168#comment-14274168
 ] 

Jason Lowe commented on HDFS-7598:
--

I think the -1 overall was caused by the test-patch.sh change from 
HADOOP-11352.  I'm seeing it on other JIRAs recently as well.

> TestDFSClientCache.testEviction is not quite correct and fails with newer 
> version of guava
> --
>
> Key: HDFS-7598
> URL: https://issues.apache.org/jira/browse/HDFS-7598
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.6.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Minor
> Attachments: HDFS-7598.001.patch
>
>
> TestDFSClientCache.testEviction() is not entirely accurate in its usage of 
> the guava LoadingCache.
> It sets the max size at 2, but asserts the loading cache will contain only 1 
> entry after inserting two entries. Guava's CacheBuilder.maximumSize() makes 
> only the following promise:
> {panel}
> Specifies the maximum number of entries the cache may contain. Note that the 
> cache may evict an entry before this limit is exceeded.
> {panel}
> Thus, the only invariant is that the loading cache will hold at most the 
> maximum number of entries. The DFSClientCache.testEviction asserts it holds 
> exactly maximum size - 1.
> For guava 11.0.2 this happens to be true at maximum size = 2 because of the 
> way it sets the maximum segment weight. With later versions of guava, the 
> maximum segment weight is set higher, and the eviction is less aggressive.
> The test should be fixed to assert only the true invariant.
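
A standalone sketch of the invariant that is safe to assert across guava 
versions (class and key names are made up for illustration):
{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Hypothetical sketch: only size() <= maximumSize is guaranteed; an exact
// size after eviction depends on guava internals and version.
public class CacheInvariantSketch {
  public static void main(String[] args) throws Exception {
    LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(2)
        .build(new CacheLoader<String, String>() {
          @Override
          public String load(String key) {
            return "client-for-" + key;
          }
        });
    cache.get("host1");
    cache.get("host2");
    cache.get("host3");
    // Safe across guava versions; asserting an exact entry count is not.
    if (cache.size() > 2) {
      throw new AssertionError("cache exceeded its maximum size");
    }
  }
}
{code}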



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-08 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163847#comment-14163847
 ] 

Jason Lowe commented on HDFS-7199:
--

bq.  But I also wonder why you are getting a non-IOException in the first 
place. That seems like a bug.

The bug in the case we encountered was bad hardware.  The JVM was glitching out 
and happened to generate a java.lang.VerifyError in the DataStreamer thread.  
Unfortunately due to this bug the reducer ended up with a "successful" run that 
generated a zero-length file, and the data was silently dropped.  We caught it 
later downstream when a subsequent job tried to consume the empty file.

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Chen He
>Priority: Critical
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160974#comment-14160974
 ] 

Jason Lowe commented on HDFS-7199:
--

I believe the problem lies in the way DataStreamer is handling the error:
{code}
} catch (Throwable e) {
  // Log warning if there was a real error.
  if (restartingNodeIndex == -1) {
DFSClient.LOG.warn("DataStreamer Exception", e);
  }
  if (e instanceof IOException) {
setLastException((IOException)e);
  }
  hasError = true;
  if (errorIndex == -1 && restartingNodeIndex == -1) {
// Not a datanode issue
streamerClosed = true;
  }
}
{code}

We should either always call setLastException, wrapping the exception in an I/O 
exception if necessary, or at least set it to something if we're going to set 
streamerClosed=true and exit the DataStreamer thread.  That way there will 
always be some kind of exception to be picked up either in checkClosed() or 
close() in the output stream.
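
A sketch of the suggested change to the catch block quoted above, wrapping 
non-I/O errors so something is always recorded (not the committed fix, just an 
illustration):
{code}
} catch (Throwable e) {
  // Log warning if there was a real error.
  if (restartingNodeIndex == -1) {
    DFSClient.LOG.warn("DataStreamer Exception", e);
  }
  // Always record something so checkClosed()/close() can surface the failure,
  // wrapping non-I/O errors in an IOException.
  setLastException(e instanceof IOException
      ? (IOException) e
      : new IOException("DataStreamer failed", e));
  hasError = true;
  if (errorIndex == -1 && restartingNodeIndex == -1) {
    // Not a datanode issue
    streamerClosed = true;
  }
}
{code}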

> DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O 
> exception
> ---
>
> Key: HDFS-7199
> URL: https://issues.apache.org/jira/browse/HDFS-7199
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Priority: Critical
>
> If the DataStreamer thread encounters a non-I/O exception then it closes the 
> output stream but does not set lastException.  When the client later calls 
> close on the output stream then it will see the stream is already closed with 
> lastException == null, mistakenly think this is a redundant close call, and 
> fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception

2014-10-06 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-7199:


 Summary: DFSOutputStream can silently drop data if DataStreamer 
crashes with a non-I/O exception
 Key: HDFS-7199
 URL: https://issues.apache.org/jira/browse/HDFS-7199
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.5.0
Reporter: Jason Lowe
Priority: Critical


If the DataStreamer thread encounters a non-I/O exception then it closes the 
output stream but does not set lastException.  When the client later calls 
close on the output stream then it will see the stream is already closed with 
lastException == null, mistakenly think this is a redundant close call, and 
fail to report any error to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115859#comment-14115859
 ] 

Jason Lowe commented on HDFS-6840:
--

The test failures appear to be unrelated, and they pass for me locally with 
this patch applied.

Looks good overall, just one nit.  This comment was left in the code and no 
longer applies:

{code}
// Seed is normally the block id
// This means we use the same pseudo-random order for each block, for
// potentially better page cache usage.
// Seed is not used if we want to randomize block location for every block
{code}


> Clients are always sent to the same datanode when read is off rack
> --
>
> Key: HDFS-6840
> URL: https://issues.apache.org/jira/browse/HDFS-6840
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: hdfs-6840.001.patch, hdfs-6840.002.patch
>
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack, off-rack), so off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6907) Source files missing license headers

2014-08-21 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved HDFS-6907.
--

Resolution: Duplicate

Dup of HDFS-6905.

> Source files missing license headers
> 
>
> Key: HDFS-6907
> URL: https://issues.apache.org/jira/browse/HDFS-6907
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.6.0
>Reporter: Arpit Agarwal
>
> The following files were committed without license headers as flagged by 
> Jenkins.
> {code}
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionFaultInjector.java
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EncryptionZoneManager.java
>  !? 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/EncryptionZoneWithId.java
> Lines that start with ? in the release audit report indicate files that 
> do not have an Apache license header.
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6846) NetworkTopology#sortByDistance should give nodes higher priority, which cache the block.

2014-08-18 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101269#comment-14101269
 ] 

Jason Lowe commented on HDFS-6846:
--

That seems like a reasonable compromise if we don't need to worry about 
overwhelming a rack-local node.

I got the impression that the original problem behind HDFS-6268 was that the 
same rack-local node always appeared first for all blocks of a file, which 
caused load issues on that node.  If a rack-local node cached all the blocks of 
a file then it seems like we'd be in the same place as that JIRA.  But maybe 
I'm misunderstanding HDFS-6268 or for some reason we don't need to worry about 
a single node getting all the blocks of a multi-block file cached.

> NetworkTopology#sortByDistance should give nodes higher priority, which cache 
> the block.
> 
>
> Key: HDFS-6846
> URL: https://issues.apache.org/jira/browse/HDFS-6846
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>
> Currently there are 3 weights:
> * local
> * same rack
> * off rack
> But if some nodes cache the block, then it's faster if the client reads the 
> block from those nodes. So we should have some more weights, as follows:
> * local
> * cached & same rack
> * same rack
> * cached & off rack
> * off rack



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6846) NetworkTopology#sortByDistance should give nodes higher priority, which cache the block.

2014-08-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095807#comment-14095807
 ] 

Jason Lowe commented on HDFS-6846:
--

This could be very undesirable if a single node is the only one that has a 
cached block and suddenly the block becomes very popular (e.g.: during 
localization across many nodes in a large cluster).  Unless the block is highly 
replicated, most requests will be off-rack and the one node that has it cached 
will be hammered.  Having the block in memory doesn't help if the NIC saturates 
from the traffic.  I just want to make sure we don't end up with another form 
of HDFS-6840.

> NetworkTopology#sortByDistance should give nodes higher priority, which cache 
> the block.
> 
>
> Key: HDFS-6846
> URL: https://issues.apache.org/jira/browse/HDFS-6846
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Yi Liu
>Assignee: Yi Liu
>
> Currently there are 3 weights:
> * local
> * same rack
> * off rack
> But if some nodes cache the block, then it's faster if the client reads the 
> block from those nodes. So we should have some more weights, as follows:
> * local
> * cached & same rack
> * same rack
> * cached & off rack
> * off rack



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-12 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14094706#comment-14094706
 ] 

Jason Lowe commented on HDFS-6840:
--

Commenting out the setSeed call in NetworkTopology on our 2.5-based build fixes 
the issue, so I suspect changing the param would work as well in 2.6+.  Given 
how poor off-rack load balancing is without the randomization, do we even 
want to keep the parameter added in HDFS-6701 at all?

> Clients are always sent to the same datanode when read is off rack
> --
>
> Key: HDFS-6840
> URL: https://issues.apache.org/jira/browse/HDFS-6840
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack, off-rack), so off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093613#comment-14093613
 ] 

Jason Lowe commented on HDFS-6840:
--

I think the previous behavior was not deterministic because of this code, which 
the HDFS-6268 patch removed:

{code}
// put a random node at position 0 if it is not a local/local-rack node
if(tempIndex == 0 && localRackNode == -1 && nodes.length != 0) {
  swap(nodes, 0, r.nextInt(nodes.length));
}
{code}

The list used to be mostly deterministic, but the first node in the list (i.e.: 
the one most clients would actually use) was random.

I have not done the bisect to prove without a doubt it was HDFS-6268, but we've 
run builds based on 2.4.1+ and on 2.5, and this behavior is brand-new 
with 2.5.  There weren't a lot of changes in the topology sorting arena besides 
this one between 2.4.1 and 2.5.0, and the code and JIRA for HDFS-6268 state 
it's intentionally not randomizing the datanode list between clients.  Besides 
the bisect approach I probably can try replacing the network topology class 
with the one from before HDFS-6268 and see if the behavior reverts to what it 
used to be.

> Clients are always sent to the same datanode when read is off rack
> --
>
> Key: HDFS-6840
> URL: https://issues.apache.org/jira/browse/HDFS-6840
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack, off-rack), so off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6268) Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found

2014-08-11 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-6268:
-

Fix Version/s: 2.5.0

We also ran into the massive skew issue during localization that 
[~ashwinshankar77] encountered.  The result previously was not deterministic 
since it would swap in a random node to the first position if it wasn't local 
or rack-local, but now it always sends all off-rack requests to the same node.

Localization is a pretty common process, so I'm not sure directing reads from most 
of the nodes to a single datanode is a good default.  Filed HDFS-6840 to discuss this 
further.

> Better sorting in NetworkTopology#pseudoSortByDistance when no local node is 
> found
> --
>
> Key: HDFS-6268
> URL: https://issues.apache.org/jira/browse/HDFS-6268
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Minor
> Fix For: 3.0.0, 2.5.0
>
> Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, 
> hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch
>
>
> In NetworkTopology#pseudoSortByDistance, if no local node is found, it will 
> always place the first rack local node in the list in front.
> This became an issue when a dataset was loaded from a single datanode. This 
> datanode ended up being the first replica for all the blocks in the dataset. 
> When running an Impala query, the non-local reads when reading past a block 
> boundary were all hitting this node, meaning massive load skew.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-11 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093584#comment-14093584
 ] 

Jason Lowe commented on HDFS-6840:
--

HDFS-6701 gives the option to randomize the returned datanodes but the default 
is off.  I'm not sure if defaulting to off is a good thing, given the 
significantly different load behavior and heavy skew to the one datanode.  If 
that skew is desired then I think it should be opt-in rather than requiring an 
opt-out to avoid the skew.

> Clients are always sent to the same datanode when read is off rack
> --
>
> Key: HDFS-6840
> URL: https://issues.apache.org/jira/browse/HDFS-6840
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Priority: Critical
>
> After HDFS-6268 the sorting order of block locations is deterministic for a 
> given block and locality level (e.g.: local, rack, off-rack), so off-rack 
> clients all see the same datanode for the same block.  This leads to very 
> poor behavior in distributed cache localization and other scenarios where 
> many clients all want the same block data at approximately the same time.  
> The one datanode is crushed by the load while the other replicas only handle 
> local and rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6840) Clients are always sent to the same datanode when read is off rack

2014-08-11 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-6840:


 Summary: Clients are always sent to the same datanode when read is 
off rack
 Key: HDFS-6840
 URL: https://issues.apache.org/jira/browse/HDFS-6840
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Jason Lowe
Priority: Critical


After HDFS-6268 the sorting order of block locations is deterministic for a 
given block and locality level (e.g.: local, rack, off-rack), so off-rack 
clients all see the same datanode for the same block.  This leads to very poor 
behavior in distributed cache localization and other scenarios where many 
clients all want the same block data at approximately the same time.  The one 
datanode is crushed by the load while the other replicas only handle local and 
rack-local requests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6770) stop-dfs.sh/start-dfs.sh breaks if native lib not installed

2014-07-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078228#comment-14078228
 ] 

Jason Lowe commented on HDFS-6770:
--

I believe this is a duplicate of HDFS-4427.

> stop-dfs.sh/start-dfs.sh breaks if native lib not installed
> ---
>
> Key: HDFS-6770
> URL: https://issues.apache.org/jira/browse/HDFS-6770
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>
> Because the native library warning goes to stdout, it gets caught up in the 
> output that gets captured to determine which nodes the script is supposed to 
> contact.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-6421:


 Summary: RHEL4 fails to compile vecsum.c
 Key: HDFS-6421
 URL: https://issues.apache.org/jira/browse/HDFS-6421
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Affects Versions: 2.5.0
 Environment: RHEL4
Reporter: Jason Lowe


After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit compatibility 
environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6421) RHEL4 fails to compile vecsum.c

2014-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000268#comment-14000268
 ] 

Jason Lowe commented on HDFS-6421:
--

Colin mentioned on HDFS-6287 that we may also need to address the include of 
malloc.h on FreeBSD.

> RHEL4 fails to compile vecsum.c
> ---
>
> Key: HDFS-6421
> URL: https://issues.apache.org/jira/browse/HDFS-6421
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.5.0
> Environment: RHEL4
>Reporter: Jason Lowe
>
> After HDFS-6287 RHEL4 builds fail trying to compile vecsum.c since they don't 
> have RUSAGE_THREAD.  RHEL4 is ancient, but we use it in a 32-bit 
> compatibility environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times

2014-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000269#comment-14000269
 ] 

Jason Lowe commented on HDFS-6287:
--

I commented initially since I wasn't sure if you wanted it addressed here or 
separately.  Filed HDFS-6421.

> Add vecsum test of libhdfs read access times
> 
>
> Key: HDFS-6287
> URL: https://issues.apache.org/jira/browse/HDFS-6287
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: libhdfs, test
>Affects Versions: 2.5.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, 
> HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch, 
> HDFS-6287.006.patch
>
>
> Add vecsum, a benchmark that tests libhdfs access times.  This includes 
> short-circuit, zero-copy, and standard libhdfs access modes.  It also has a 
> local filesystem mode for comparison.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6287) Add vecsum test of libhdfs read access times

2014-05-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999835#comment-13999835
 ] 

Jason Lowe commented on HDFS-6287:
--

This change breaks the build on RHEL4 because RHEL4 doesn't have RUSAGE_THREAD.  
Yes, RHEL4 is ancient, but we build against it in a 32-bit compatibility 
environment.

> Add vecsum test of libhdfs read access times
> 
>
> Key: HDFS-6287
> URL: https://issues.apache.org/jira/browse/HDFS-6287
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: libhdfs, test
>Affects Versions: 2.5.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: HDFS-6282.001.patch, HDFS-6287.002.patch, 
> HDFS-6287.003.patch, HDFS-6287.004.patch, HDFS-6287.005.patch, 
> HDFS-6287.006.patch
>
>
> Add vecsum, a benchmark that tests libhdfs access times.  This includes 
> short-circuit, zero-copy, and standard libhdfs access modes.  It also has a 
> local filesystem mode for comparison.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-4427) start-dfs.sh generates malformed ssh command when not running with native libs

2014-03-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918094#comment-13918094
 ] 

Jason Lowe commented on HDFS-4427:
--

Native libs are preferable, but if native libs aren't an option, another 
workaround is to explicitly set hadoop.security.group.mapping to 
org.apache.hadoop.security.ShellBasedUnixGroupsMapping in core-site.xml.
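
For anyone who lands here later, that workaround corresponds to a core-site.xml 
entry along the lines of the sketch below; only the property name and value come 
from the comment above, the rest is just standard Hadoop configuration syntax:

{code:xml}
<!-- Sketch of the workaround described above: use the shell-based group
     mapping instead of the JNI-based one that needs the native libs. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
</property>
{code}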

> start-dfs.sh generates malformed ssh command when not running with native libs
> --
>
> Key: HDFS-4427
> URL: https://issues.apache.org/jira/browse/HDFS-4427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Reporter: Jason Lowe
>Assignee: Robert Parker
>
> After HADOOP-8712 the start-dfs.sh script is generating malformed ssh 
> commands when the native hadoop libraries are not present.  This is because 
> {{hdfs getconf}} is printing a warning, and that warning is accidentally 
> interpreted as one of the machines to target for ssh.
> Here's an example output of hdfs getconf:
> {noformat}
> $ hdfs getconf -namenodes 2>/dev/null
> 2013-01-22 21:03:59,543 WARN  util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> localhost
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6021) NPE in FSImageFormatProtobuf upgrading from layout -52 to -53

2014-02-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved HDFS-6021.
--

Resolution: Duplicate

Resolving as a dup of HDFS-5988 since that seems like the most likely culprit.  
I'll reopen if it occurs again.  Thanks Andrew!

> NPE in FSImageFormatProtobuf upgrading from layout -52 to -53
> -
>
> Key: HDFS-6021
> URL: https://issues.apache.org/jira/browse/HDFS-6021
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jason Lowe
>
> While updating my trunk instance, the namenode refused to start up because the 
> layout version needed to be upgraded.  When attempting the upgrade, I ran into 
> an NPE in FSImageFormatProtobuf.  Full stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-6021) NPE in FSImageFormatProtobuf upgrading from layout -52 to -53

2014-02-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914592#comment-13914592
 ] 

Jason Lowe commented on HDFS-6021:
--

Seems likely.  I've been upgrading all along on my trunk instance and had not 
seen failures like this until recently.  HDFS-5988 doesn't mention which change 
broke the upgrade process.  In other words, do we know during what window of 
time upgrades were broken?

> NPE in FSImageFormatProtobuf upgrading from layout -52 to -53
> -
>
> Key: HDFS-6021
> URL: https://issues.apache.org/jira/browse/HDFS-6021
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jason Lowe
>
> While updating my trunk instance, the namenode refused to start up because the 
> layout version needed to be upgraded.  When attempting the upgrade, I ran into 
> an NPE in FSImageFormatProtobuf.  Full stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5806) balancer should set SoTimeout to avoid indefinite hangs

2014-02-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5806:
-

Fix Version/s: 0.23.11

Thanks, Nathan!  I committed this to branch-0.23.

> balancer should set SoTimeout to avoid indefinite hangs
> ---
>
> Key: HDFS-5806
> URL: https://issues.apache.org/jira/browse/HDFS-5806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 0.23.11, 2.3.0
>
> Attachments: HDFS-5806-0.23.patch, HDFS-5806.patch
>
>
> Simple patch to avoid the balancer hanging when a datanode stops responding 
> to requests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5806) balancer should set SoTimeout to avoid indefinite hangs

2014-02-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913441#comment-13913441
 ] 

Jason Lowe commented on HDFS-5806:
--

+1 for branch-0.23 patch, committing this.

> balancer should set SoTimeout to avoid indefinite hangs
> ---
>
> Key: HDFS-5806
> URL: https://issues.apache.org/jira/browse/HDFS-5806
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
> Fix For: 2.3.0
>
> Attachments: HDFS-5806-0.23.patch, HDFS-5806.patch
>
>
> Simple patch to avoid the balancer hanging when a datanode stops responding 
> to requests.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-6021) NPE in FSImageFormatProtobuf upgrading from layout -52 to -53

2014-02-26 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-6021:


 Summary: NPE in FSImageFormatProtobuf upgrading from layout -52 to 
-53
 Key: HDFS-6021
 URL: https://issues.apache.org/jira/browse/HDFS-6021
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jason Lowe


While updating my trunk instance, the namenode refused to start up because the 
layout version needed to be upgraded.  When attempting the upgrade, I ran into 
an NPE in FSImageFormatProtobuf.  Full stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-6021) NPE in FSImageFormatProtobuf upgrading from layout -52 to -53

2014-02-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913165#comment-13913165
 ] 

Jason Lowe commented on HDFS-6021:
--

Stacktrace:
{noformat}
2014-02-26 17:03:11,755 FATAL [main] namenode.NameNode (NameNode.java:main(1351)) - Exception in namenode join
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:227)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:169)
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:225)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:802)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:792)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:624)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:593)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:331)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:251)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:882)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:641)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:435)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:647)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:632)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1280)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1346)
{noformat}

> NPE in FSImageFormatProtobuf upgrading from layout -52 to -53
> -
>
> Key: HDFS-6021
> URL: https://issues.apache.org/jira/browse/HDFS-6021
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jason Lowe
>
> While updating my trunk instance, the namenode refused to start up because the 
> layout version needed to be upgraded.  When attempting the upgrade, I ran into 
> an NPE in FSImageFormatProtobuf.  Full stacktrace to follow.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5293) Symlink resolution requires unnecessary RPCs

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5293:
-

Target Version/s: 3.0.0, 2.4.0  (was: 3.0.0)

> Symlink resolution requires unnecessary RPCs
> 
>
> Key: HDFS-5293
> URL: https://issues.apache.org/jira/browse/HDFS-5293
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha, 3.0.0
>Reporter: Daryn Sharp
>Priority: Critical
>
> When the NN encounters a symlink, it throws an {{UnresolvedLinkException}}.  
> This exception contains only the path that is a symlink.  The client issues 
> another RPC to obtain the link target, followed by another RPC with the link 
> target plus the remainder of the original path.
> {{UnresolvedLinkException}} should return both the link and its target to 
> avoid a costly and unnecessary intermediate RPC to obtain the link target.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5138:
-

Target Version/s: 2.4.0  (was: )

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, NN won't start with "-upgrade". Since there has been a layout 
> version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was 
> necessary when deploying 2.1.x to an existing 2.0.x cluster, but the only way 
> to get around this was to disable HA and upgrade.
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and DNs' upgrade 
> snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not greatly increase the 
> maintenance window, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

