[jira] [Commented] (HDFS-5688) Wire-encription in QJM

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908067#comment-13908067
 ] 

Suresh Srinivas commented on HDFS-5688:
---

[~wheat9], can you please comment on this issue?

> Wire-encription in QJM
> --
>
> Key: HDFS-5688
> URL: https://issues.apache.org/jira/browse/HDFS-5688
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, security
>Affects Versions: 2.2.0
>Reporter: Juan Carlos Fernandez
>Priority: Blocker
>  Labels: security
> Attachments: core-site.xml, hdfs-site.xml, jaas.conf, ssl-client.xml, 
> ssl-server.xml
>
>
> When HA is implemented with QJM and Kerberos, it is not possible to enable 
> wire encryption.
> If the property hadoop.rpc.protection is set to anything other than 
> authentication, it does not work properly and fails with:
> ERROR security.UserGroupInformation: PriviledgedActionException 
> as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: 
> No common protection layer between client and server
> With NFS as shared storage everything works like a charm.
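For reference, a minimal sketch of the setting under discussion; the value shown is illustrative, and the same value normally has to be set consistently (via core-site.xml) on clients, NameNodes and JournalNodes:

{code}
// Hedged sketch (illustrative values): requesting SASL "privacy" so RPC
// traffic is encrypted on the wire. Only "authentication" works for the
// reporter here; "integrity"/"privacy" trigger the SaslException above.
import org.apache.hadoop.conf.Configuration;

public class RpcProtectionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("hadoop.rpc.protection", "privacy"); // authentication | integrity | privacy
    System.out.println(conf.get("hadoop.rpc.protection"));
  }
}
{code}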





[jira] [Resolved] (HDFS-5993) org.apache.hadoop.fs.loadGenerator.TestLoadGenerator failure in trunk

2014-02-20 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao resolved HDFS-5993.
-

Resolution: Duplicate

Closed as duplicate of HADOOP-10355. But thanks for the report [~yzhangal]!

> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator failure in trunk
> -
>
> Key: HDFS-5993
> URL: https://issues.apache.org/jira/browse/HDFS-5993
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>
> With today's latest trunk at
> commit d926e51bdc27f08e916534567a1edcfd994e2784
> When running it locally, I consistently see the following test failure:
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.888 sec 
> <<< FAILURE! - in org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
> testLoadGenerator(org.apache.hadoop.fs.loadGenerator.TestLoadGenerator)  Time 
> elapsed: 14.285 sec  <<< ERROR!
> java.io.IOException: Stream closed
> at java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
> at java.io.BufferedReader.readLine(BufferedReader.java:310)
> at java.io.BufferedReader.readLine(BufferedReader.java:382)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
> at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
> at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> Results :
> Tests in error:
>   TestLoadGenerator.testLoadGenerator:231 » IO Stream closed
> This failure is also reported in one upstream test for HDFS-5939 patch.
> (I can see the same problem locally without applying this patch).





[jira] [Created] (HDFS-5993) org.apache.hadoop.fs.loadGenerator.TestLoadGenerator failure in trunk

2014-02-20 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5993:
---

 Summary: org.apache.hadoop.fs.loadGenerator.TestLoadGenerator 
failure in trunk
 Key: HDFS-5993
 URL: https://issues.apache.org/jira/browse/HDFS-5993
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: CentOS release 6.5 (Final)
cpe:/o:centos:linux:6:GA

Reporter: Yongjun Zhang


With today's latest trunk at
commit d926e51bdc27f08e916534567a1edcfd994e2784

When running it locally, I consistently see the following test failure:

---
 T E S T S
---
Running org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.888 sec <<< 
FAILURE! - in org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
testLoadGenerator(org.apache.hadoop.fs.loadGenerator.TestLoadGenerator)  Time 
elapsed: 14.285 sec  <<< ERROR!
java.io.IOException: Stream closed
at java.io.BufferedReader.ensureOpen(BufferedReader.java:115)
at java.io.BufferedReader.readLine(BufferedReader.java:310)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
at 
org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)

Results :

Tests in error:
  TestLoadGenerator.testLoadGenerator:231 » IO Stream closed

This failure is also reported in one upstream test for HDFS-5939 patch.
(I can see the same problem locally without applying this patch).
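For context, a minimal, self-contained sketch of how this kind of "Stream closed" error arises (this is not the LoadGenerator code, just the failure mode): readLine() is called on a BufferedReader whose underlying stream has already been closed.

{code}
// Hedged sketch of the failure mode only: calling readLine() after the
// reader has been closed throws java.io.IOException: Stream closed.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class StreamClosedSketch {
  public static void main(String[] args) throws IOException {
    BufferedReader reader = new BufferedReader(new StringReader("a\nb\n"));
    System.out.println(reader.readLine()); // prints "a"
    reader.close();                        // stream closed early...
    reader.readLine();                     // ...IOException: Stream closed
  }
}
{code}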






[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908054#comment-13908054
 ] 

Hadoop QA commented on HDFS-5274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630259/ss-5274v8-get.png
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6204//console

This message is automatically generated.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, HDFS-5274-8.patch, Zipkin   Trace a06e941b0172ec73.png, 
> Zipkin   Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908051#comment-13908051
 ] 

Yongjun Zhang commented on HDFS-5939:
-

I think the two failed tests are unrelated to the patch I submitted:

   org.apache.hadoop.hdfs.TestSafeMode.testInitializeReplQueuesEarly
   org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator

The latest patch I submitted is no different from the previous one, apart from some 
cosmetic changes. My previous versions 
passed all tests successfully.

I can consistently reproduce the TestLoadGenerator failure locally with and without 
my changes.
My local run of TestSafeMode always succeeds with and without my change, 
so the upstream failure may be specific to the upstream test environment. 


> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch, 
> HDFS-5939.003.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will 
> see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> Need to fix the report to give user hint about dead datanode.
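"n must be positive" is the message java.util.Random#nextInt(int) throws on JDK 7 when its bound is zero, which would be consistent with the NameNode trying to pick a random datanode from an empty live-node list; a minimal sketch of that assumption:

{code}
// Hedged sketch, assuming the error originates from choosing among zero
// live datanodes with Random#nextInt(int) (JDK 7 message: "n must be positive").
import java.util.Random;

public class NMustBePositiveSketch {
  public static void main(String[] args) {
    int liveDatanodes = 0;
    new Random().nextInt(liveDatanodes); // IllegalArgumentException: n must be positive
  }
}
{code}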





[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5981:


   Resolution: Fixed
Fix Version/s: 2.4.0
   3.0.0
   Status: Resolved  (was: Patch Available)

I committed this patch to trunk, branch-2 and branch-2.4.  Thank you for the 
patch, [~wheat9].

> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.





[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908039#comment-13908039
 ] 

Hudson commented on HDFS-5981:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5202 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5202/])
HDFS-5981. loadGenerator exit code is not reliable. Contributed by Haohui Mai. 
(cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570468)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/OfflineImageViewerPB.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageXmlWriter.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java


> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.





[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908033#comment-13908033
 ] 

Masatake Iwasaki commented on HDFS-5274:


bq. Masatake Iwasaki You know, I was thinking... Maybe it ok that there are so 
many spans? Tracing doesn't cost unless enabled. When debugging, you might want 
to see in the trace that HDFS is doing a bunch of small reads?

I just missed your comment while uploading the v8 patch.
Because there were many more spans than I expected and the receiver's queue was 
filled, I think disabling those spans is safer as a starting point.

{noformat}
14/02/19 22:25:23 ERROR impl.ZipkinSpanReceiver: Error trying to append span 
(DFSOutputStream.write) to the queue.  Blocking Queue was full.
{noformat}


> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, HDFS-5274-8.patch, Zipkin   Trace a06e941b0172ec73.png, 
> Zipkin   Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908027#comment-13908027
 ] 

Hadoop QA commented on HDFS-5939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630229/HDFS-5939.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestSafeMode
  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6202//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6202//console

This message is automatically generated.

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch, 
> HDFS-5939.003.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will 
> see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> Need to fix the report to give user hint about dead datanode.





[jira] [Updated] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-5274:
---

Attachment: ss-5274v8-get.png
ss-5274v8-put.png
HDFS-5274-8.patch

I am attaching an updated patch and screenshots of traces of putting and getting a 
200MB file.

bq. Fix these in next patch:

fixed.

bq. Is formatting ok here?

fixed.

bq. In BlockReceiver, should traceSpan be getting closed?

Added a description to the span and a call to close().

{quote}
Is it possible that below throws an exception?

+ scope.getSpan().addKVAnnotation(
+ "stream".getBytes(),
+ jas.getCurrentStream().toString().getBytes());

i.e. we can hop out w/o closing the span since the try/finally only happens 
later.

This is in JournalSet in a few places.
{quote}

I moved this code into the try block to make sure.

bq. TraceInfo and RPCTInfo seem to be same datastructure? Should we define it 
onetime only and share?

I prefer keeping this as is, for simplicity and for independence between the 
data transfer protocol and o.a.h.ipc.


bq. I checked the trace of putting and getting a big file by Zipkin today. 
There seems to be too many spans concerning "DFSInputStream.read" and 
"DFSOutputStream.write". I will fix this in the next version of patch.

just removed those spans from DFSInputStream and DFSOutputStream.
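A minimal sketch of the try/finally pattern described above; scope, jas and the span name follow the snippet quoted in the comment and are assumptions here, not the actual JournalSet code (HTrace imports omitted):

{code}
// Hedged sketch: annotate and close the span on every path, even if
// addKVAnnotation() or the journal operation itself throws.
TraceScope scope = Trace.startSpan("journalSetOperation"); // assumed span name
try {
  if (scope.getSpan() != null) {
    scope.getSpan().addKVAnnotation(
        "stream".getBytes(),
        jas.getCurrentStream().toString().getBytes());
  }
  // ... perform the journal operation ...
} finally {
  scope.close(); // span is always closed
}
{code}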


> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, HDFS-5274-8.patch, Zipkin   Trace a06e941b0172ec73.png, 
> Zipkin   Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Updated] (HDFS-5396) FSImage.getFsImageName should check whether fsimage exists

2014-02-20 Thread zhaoyunjiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhaoyunjiong updated HDFS-5396:
---

Attachment: HDFS-5396-branch-1.2.patch

Updated the patch.

> FSImage.getFsImageName should check whether fsimage exists
> --
>
> Key: HDFS-5396
> URL: https://issues.apache.org/jira/browse/HDFS-5396
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.1
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 1.3.0
>
> Attachments: HDFS-5396-branch-1.2.patch, HDFS-5396-branch-1.2.patch
>
>
> As noted in https://issues.apache.org/jira/browse/HDFS-5367, the fsimage may not be 
> written to every IMAGE directory, so we need to check whether the fsimage exists 
> before FSImage.getFsImageName returns.
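A rough sketch of the kind of guard the summary describes (illustrative only, not the attached branch-1 patch; the helper names are assumptions):

{code}
// Hedged sketch: only return an fsimage file that actually exists on disk,
// since after HDFS-5367 not every IMAGE directory is guaranteed to hold one.
// dirIterator()/getImageFile() stand in for the branch-1 storage helpers.
File getFsImageName() {
  for (Iterator<StorageDirectory> it = dirIterator(NameNodeDirType.IMAGE);
      it.hasNext();) {
    File image = getImageFile(it.next(), NameNodeFile.IMAGE);
    if (image.exists()) {   // the added existence check
      return image;
    }
  }
  return null;
}
{code}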





[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2014-02-20 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907955#comment-13907955
 ] 

Vinayakumar B commented on HDFS-5496:
-

These failures are not present in the second patch's test report. 
I think the first patch was missing the LightWeightGSet changes, which is why 
those tests failed.

> Make replication queue initialization asynchronous
> --
>
> Key: HDFS-5496
> URL: https://issues.apache.org/jira/browse/HDFS-5496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Kihwal Lee
>Assignee: Vinayakumar B
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
> HDFS-5496.patch, HDFS-5496.patch
>
>
> Today, initialization of replication queues blocks safe mode exit and certain 
> HA state transitions. For a big name space, this can take hundreds of seconds 
> with the FSNamesystem write lock held.  During this time, important requests 
> (e.g. initial block reports, heartbeat, etc) are blocked.
> The effect of delaying the initialization would be that replication does not start 
> right away, but I think the benefit outweighs the cost. If we make it asynchronous, 
> the work per iteration should be limited, so that the lock duration is 
> capped. 
> If full/incremental block reports and any other requests that modify block 
> state properly perform replication checks while the blocks are scanned and 
> the queues are populated in the background, every block will be processed (some may 
> be processed twice).  The replication monitor should run even before all blocks are 
> processed.
> This will allow namenode to exit safe mode and start serving immediately even 
> with a big name space. It will also reduce the HA failover latency.
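A minimal sketch of the "limited work per iteration" idea (names and the batch size are illustrative, not the attached patch):

{code}
// Hedged sketch: scan the block map in bounded batches, dropping the
// namesystem write lock between batches so block reports and heartbeats
// can be served while the queues are populated in the background.
final int blocksPerIteration = 10000;  // illustrative cap on lock hold time
while (blockIterator.hasNext()) {
  namesystem.writeLock();
  try {
    for (int i = 0; i < blocksPerIteration && blockIterator.hasNext(); i++) {
      processMisReplicatedBlock(blockIterator.next());
    }
  } finally {
    namesystem.writeUnlock();
  }
  // lock released here; queued RPCs get a chance to run before the next batch
}
{code}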





[jira] [Updated] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5535:
-

Attachment: h5535_20140220b.patch

h5535_20140220b.patch: includes HDFS-5992.

> Umbrella jira for improved HDFS rolling upgrades
> 
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ha, hdfs-client, namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial High level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.





[jira] [Updated] (HDFS-5992) Fix NPE in MD5FileUtils

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5992:
-

Attachment: editsStored
h5992_20140220.patch

h5992_20140220.patch: fix MD5FileUtils and TestOfflineEditsViewer.
editsStored: the new binary file.

> Fix NPE in MD5FileUtils
> ---
>
> Key: HDFS-5992
> URL: https://issues.apache.org/jira/browse/HDFS-5992
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: editsStored, h5992_20140220.patch
>
>
> MD5FileUtils.readStoredMd5(File md5File)  may return null but the callers may 
> not check it.





[jira] [Updated] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5778:
-

Attachment: (was: h5992_20140220.patch)

> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5778_20140220.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.





[jira] [Updated] (HDFS-5778) Document new commands and parameters for improved rolling upgrades

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5778:
-

Attachment: h5778_20140220.patch
h5992_20140220.patch

h5778_20140220.patch: wrote a few sections but not yet finished.

> Document new commands and parameters for improved rolling upgrades
> --
>
> Key: HDFS-5778
> URL: https://issues.apache.org/jira/browse/HDFS-5778
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: HDFS-5535 (Rolling upgrades)
>Reporter: Akira AJISAKA
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5778_20140220.patch
>
>
> "hdfs dfsadmin -rollingUpgrade" command was newly added in HDFS-5752, and 
> some other commands and parameters will be added in the future. This issue 
> exists to flag undocumented commands and parameters when HDFS-5535 branch is 
> merging to trunk.





[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907945#comment-13907945
 ] 

stack commented on HDFS-5274:
-

[~iwasakims] You know, I was thinking... Maybe it's OK that there are so many 
spans?  Tracing doesn't cost unless enabled.  When debugging, you might want to 
see in the trace that HDFS is doing a bunch of small reads?

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace 
> d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907930#comment-13907930
 ] 

Hadoop QA commented on HDFS-5935:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630215/HDFS-5935-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6201//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6201//console

This message is automatically generated.

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch, 
> HDFS-5935-4.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.





[jira] [Created] (HDFS-5992) Fix NPE in MD5FileUtils

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5992:


 Summary: Fix NPE in MD5FileUtils
 Key: HDFS-5992
 URL: https://issues.apache.org/jira/browse/HDFS-5992
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


MD5FileUtils.readStoredMd5(File md5File)  may return null but the callers may 
not check it.
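A minimal sketch of the defensive check callers need (illustrative; assumes the method returns an MD5Hash or null):

{code}
// Hedged sketch: handle the null return instead of dereferencing it.
MD5Hash expected = MD5FileUtils.readStoredMd5(md5File);
if (expected == null) {
  throw new IOException("No stored MD5 found in " + md5File);
}
if (!expected.equals(computedDigest)) {
  throw new IOException("MD5 mismatch for file covered by " + md5File);
}
{code}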





[jira] [Comment Edited] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907687#comment-13907687
 ] 

Suresh Srinivas edited comment on HDFS-5840 at 2/21/14 3:38 AM:


[~atm], sorry for the late reply. I had lost track of this.

{quote}
As for handling the partial upgrade failure as you've described, I'd like to 
add one more RPC call to the JournalManager to initiate analysis/recovery of 
the storage dirs upon first contact, and then refactor the contents of 
FSImage#recoverStorageDirs into NNUpgradeUtil just like was done with the other 
upgrade-related procedures. If this sounds OK to you, I'll go ahead and add 
that stuff and appropriate tests.
{quote}
Why not always recover in the preupgrade step, instead of adding another RPC?

With rolling upgrade getting ready, some of the functionality added there may 
be useful. For partial failures related to JournalNodes, the choice made in 
that feature was to make the JournalNode rollback operation idempotent. It looks 
like a lot of the rolling-upgrade-related code can be leveraged here, since upgrade 
is a special case of rolling upgrade. Should we explore that?


was (Author: sureshms):
[~atm], sorry for the late reply. I had lost track of this.

{quote}
As for handling the partial upgrade failure as you've described, I'd like to 
add one more RPC call to the JournalManager to initiate analysis/recovery of 
the storage dirs upon first contact, and then refactor the contents of 
FSImage#recoverStorageDirs into NNUpgradeUtil just like was done with the other 
upgrade-related procedures. If this sounds OK to you, I'll go ahead and add 
that stuff and appropriate tests.
{quote}
Why not always recover in the preupgrade/upgrade step, instead of adding another 
RPC?

With rolling upgrade getting ready, some of the functionality added there may 
be useful. For partial failures related to JournalNodes, the choice made in 
that feature was to make the JournalNode rollback operation idempotent. It looks 
like a lot of the rolling-upgrade-related code can be leveraged here, since upgrade 
is a special case of rolling upgrade. Should we explore that?

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> 
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 3.0.0
>
> Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comment in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.





[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907921#comment-13907921
 ] 

Hadoop QA commented on HDFS-5535:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12630196/h5535_20140220-1554.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 39 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
  
org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade
  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots
  
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives
  org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults
  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
  org.apache.hadoop.hdfs.TestRollingUpgrade
  org.apache.hadoop.hdfs.TestRollingUpgradeRollback
  org.apache.hadoop.hdfs.qjournal.server.TestJournalNode
  org.apache.hadoop.hdfs.util.TestMD5FileUtils
  org.apache.hadoop.hdfs.qjournal.TestNNWithQJM
  org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade
  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
  org.apache.hadoop.hdfs.server.namenode.TestStartup
  org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

org.apache.hadoop.hdfs.server.namenode.TestBackupNode
org.apache.hadoop.hdfs.server.namenode.TestCheckpoint

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6199//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6199//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6199//console

This message is automatically generated.

> Umbrella jira for improved HDFS rolling upgrades
> 
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ha, hdfs-client, namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial High level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.





[jira] [Commented] (HDFS-5064) Standby checkpoints should not block concurrent readers

2014-02-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907920#comment-13907920
 ] 

Andrew Wang commented on HDFS-5064:
---

Hi ATM, I looked at this patch. It needs a small rebase for the lock fairness 
change, but I was still able to review. I have just one nit: 64-bit reads are 
not atomic in the current Java memory model, so we need to slap a volatile on 
{{NNStorage#mostRecentCheckpointId}} since the getter is no longer synchronized.

At a high-level, this makes sense to me as an intermediate solution for the 
specific issue of the SbNN and checkpointing, until we actually separate out 
block management from the namespace. Kihwal, do you have any reservations about 
this approach?

Otherwise, I'm +1 for this change pending rebase and Jenkins.
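A minimal illustration of the 64-bit point (generic sketch, not the NNStorage code):

{code}
// Hedged sketch: a plain long may be read as two 32-bit halves under the Java
// memory model, so an unsynchronized getter can observe a torn or stale value;
// volatile makes the 64-bit read/write atomic and visible across threads.
class CheckpointIdHolder {
  private volatile long mostRecentCheckpointId;   // the added volatile

  synchronized void set(long id) {
    mostRecentCheckpointId = id;
  }

  long get() {                                    // getter no longer synchronized
    return mostRecentCheckpointId;
  }
}
{code}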

> Standby checkpoints should not block concurrent readers
> ---
>
> Key: HDFS-5064
> URL: https://issues.apache.org/jira/browse/HDFS-5064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, namenode
>Affects Versions: 2.3.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Attachments: HDFS-5064.patch
>
>
> We've observed an issue which causes fetches of the {{/jmx}} page of the NN 
> to take a long time to load when the standby is in the process of creating a 
> checkpoint.
> Even though both creating the checkpoint and gathering the statistics for 
> {{/jmx}} take only the FSNS read lock, the issue is that since the FSNS uses 
> a _fair_ RW lock, a single writer attempting to get the lock will block all 
> threads attempting to get only the read lock for the duration of the 
> checkpoint. This will cause {{/jmx}}, and really any thread only attempting 
> to get the read lock, to block for the duration of the checkpoint, even 
> though they should be able to proceed concurrently with the checkpointing 
> thread.
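A minimal sketch of the fairness behavior described, using plain java.util.concurrent rather than HDFS code:

{code}
// Hedged sketch: with a fair ReentrantReadWriteLock, once a writer is queued,
// later read-lock requests wait behind it even though a reader already holds
// the lock, which is why /jmx stalls for the duration of the checkpoint.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch {
  public static void main(String[] args) throws InterruptedException {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair

    lock.readLock().lock();                       // long-running reader (checkpoint)

    Thread writer = new Thread(new Runnable() {
      public void run() { lock.writeLock().lock(); }  // writer queues up behind it
    });
    writer.setDaemon(true);
    writer.start();
    Thread.sleep(200);                            // let the writer enqueue

    // A new reader (think: the /jmx handler) now waits behind the queued writer.
    // The timed tryLock honors fairness, so it reports false here.
    System.out.println(lock.readLock().tryLock(0, TimeUnit.MILLISECONDS));
  }
}
{code}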





[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907917#comment-13907917
 ] 

Suresh Srinivas commented on HDFS-5274:
---

bq. Wouldn't adding htrace to the common pom.xml make it "...available in 
Hadoop common"? Thanks.
I agree with the comment [~cutting] made - 
https://issues.apache.org/jira/browse/HADOOP-10311?focusedCommentId=13886809&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13886809.
I would love to see the code in the Hadoop community itself, if possible. Agreed, 
this is no different from using Google Guava or other such libraries. But I am 
afraid this could start a trend of capabilities hosted by, and attributed especially 
to, vendor companies making their way into Hadoop.

With that said, I am -0 and would rather not see this become a trend.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace 
> d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.





[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907910#comment-13907910
 ] 

Hudson commented on HDFS-5988:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5201 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5201/])
HDFS-5988. Bad fsimage always generated after upgrade. (wang) (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570429)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/LsrPBImage.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgradeFromImage.java


> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.





[jira] [Updated] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5988:
--

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.4. Thanks for the quick +1 Jing!

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Fix For: 2.4.0
>
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.





[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907891#comment-13907891
 ] 

Andrew Wang commented on HDFS-5988:
---

I believe the test failure is HDFS-5991, known flake. Will commit.

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.





[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907882#comment-13907882
 ] 

Hadoop QA commented on HDFS-5988:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630183/hdfs-5988-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6198//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6198//console

This message is automatically generated.

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.





[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.

2014-02-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907875#comment-13907875
 ] 

Colin Patrick McCabe commented on HDFS-5957:


I talked to [~kkambatl] about this.  It seems that YARN is monitoring the 
process' {{RSS}} (resident set size), which does seem to include the physical 
memory taken up by memory-mapped files.  I think this is unfortunate.  The 
physical memory taken up by mmapped files is basically part of the page cache.  
If there is any memory pressure at all, it's easy to purge this memory (the 
pages are "clean").  Charging an application for this memory is similar to 
charging it for the page cache consumed by calls to read(2) -- it doesn't really 
make sense for this application.  I think this is a problem within YARN, which 
has to be fixed inside YARN.

bq. It sounds like you really do need a deterministic way to trigger the munmap 
calls, i.e. LRU caching or no caching at all described above.

The {{munmap}} calls are deterministic now.  You can control the number of 
unused mmaps that we'll store by changing {{dfs.client.mmap.cache.size}}.

It's very important to keep in mind that {{dfs.client.mmap.cache.size}} 
controls the size of the cache, *not* the total number of mmaps.  So if my 
application has 10 threads that each use an mmap at a time, and the maximum 
cache size is 10, I may have 20 mmaps in existence at any given time.  The 
maximum size of any mmap is going to be the size of a block, so you should be 
able to use this to calculate how much RSS you will need.

bq. For small 200Gb data-sets (~1.4x tasks per container), ZCR does give a perf 
boost because we get to use HADOOP-10047 instead of shuffling it between byte[] 
buffers for decompression.

As a workaround, have you considered reading into a direct {{ByteBuffer}} that 
you allocated yourself?  {{DFSInputStream}} implements the 
{{ByteBufferReadable}} interface, which lets you read into any {{ByteBuffer}}.  
This would avoid the array copy that you're talking about.

I hope we can fix this within YARN soon, since otherwise the perf benefit of 
zero-copy reads will be substantially reduced or eliminated (as well as 
people's ability to use ZCR in the first place).
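A minimal sketch of the suggested workaround, reading into a caller-allocated direct ByteBuffer through the ByteBufferReadable path (the path and buffer size are illustrative):

{code}
// Hedged sketch: avoid the extra byte[] copy by reading straight into a direct
// ByteBuffer; FSDataInputStream#read(ByteBuffer) dispatches to the
// ByteBufferReadable implementation in DFSInputStream.
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectBufferReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024); // illustrative size
    FSDataInputStream in = fs.open(new Path("/tmp/example"));    // illustrative path
    try {
      while (in.read(buf) > 0) {   // fills the direct buffer, no byte[] shuffle
        buf.flip();
        // ... hand buf to the decompressor / consumer ...
        buf.clear();
      }
    } finally {
      in.close();
    }
  }
}
{code}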

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> -
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.





[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907872#comment-13907872
 ] 

Yongjun Zhang commented on HDFS-5939:
-

Hi Haohui and Tsz, 

Thanks a lot for your earlier review and the good info you provided. 

I just uploaded a modified version (003) to address all the comments. I found a 
bug while doing the test (filed HDFS-5989). After working around HDFS-5989, 
my test of the updated fix passes. 

Would you please review again and help to commit it if it's fine with you?

Thanks.


 

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch, 
> HDFS-5939.003.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will 
> see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> Need to fix the report to give user hint about dead datanode.





[jira] [Updated] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-5939:


Attachment: HDFS-5939.003.patch

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch, 
> HDFS-5939.003.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will 
> see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> Need to fix the report to give user hint about dead datanode.





[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-02-20 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907863#comment-13907863
 ] 

Liang Xie commented on HDFS-5776:
-

bq. little discernible overall difference in spite of my flushing file system 
cache
You need a test data set much larger than physical memory, so that lots of 
HBase reads go to disk. If the disk contention is high enough (e.g. await 
from iostat reaches tens of ms, or even hundreds of ms), then the slow disk will 
make the difference obvious :)  That's why I set up my test env with only one 
SATA disk per DN instance; that needs less test data to be loaded to observe 
a difference.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
> HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, 
> HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, 
> HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, 
> HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt, 
> HDFS-5776v18.txt, HDFS-5776v21.txt
>
>
> This is a placeholder for the HDFS-related parts backported from 
> https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be helpful especially for optimizing read outliers.
> We can utilize "dfs.dfsclient.quorum.read.threshold.millis" & 
> "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read 
> ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
> could export the metrics of interest into the client system (e.g. HBase's 
> regionserver metrics).
> The core logic is in the pread code path: we decide to go to the original 
> fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
> the above config items.
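A minimal sketch of turning the feature on from the client side, using the property names quoted above (values are illustrative):

{code}
// Hedged sketch: enable hedged/quorum pread in the client configuration.
import org.apache.hadoop.conf.Configuration;

public class HedgedReadConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // issue a second, "hedged" read if the first has not returned within 10 ms
    conf.setLong("dfs.dfsclient.quorum.read.threshold.millis", 10L);
    // size of the shared thread pool; 0 leaves hedged reads disabled
    conf.setInt("dfs.dfsclient.quorum.read.threadpool.size", 20);
  }
}
{code}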





[jira] [Assigned] (HDFS-2538) option to disable fsck dots

2014-02-20 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam reassigned HDFS-2538:
---

Assignee: Mohammad Kamrul Islam

> option to disable fsck dots 
> 
>
> Key: HDFS-2538
> URL: https://issues.apache.org/jira/browse/HDFS-2538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20.204.0, 1.0.0
>Reporter: Allen Wittenauer
>Assignee: Mohammad Kamrul Islam
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-2538-branch-0.20-security-204.patch, 
> HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch
>
>
> this patch turns the dots during fsck off by default and provides an option 
> to turn them back on if you have a fetish for millions and millions of dots 
> on your terminal.  i haven't done any benchmarks, but i suspect fsck is now 
> 300% faster to boot.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907853#comment-13907853
 ] 

Jing Zhao commented on HDFS-5991:
-

+1 for the patch. I will commit it shortly.

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: HDFS-5991.000.patch, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5931) Potential bugs and improvements for exception handlers

2014-02-20 Thread Ding Yuan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907848#comment-13907848
 ] 

Ding Yuan commented on HDFS-5931:
-

Hi [~atm], I took another close look at the test output. It seems the 
SocketTimeoutException might not be caused by my patch (I ran the test on my 
machine and it passed). Is there any chance you could comment on this patch? If 
the test was indeed broken by the patch, or if there are any other problems with 
it, I can fix it further.

Thanks,

> Potential bugs and improvements for exception handlers
> --
>
> Key: HDFS-5931
> URL: https://issues.apache.org/jira/browse/HDFS-5931
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.2.0
>Reporter: Ding Yuan
> Attachments: hdfs-5931-v2.patch, hdfs-5931-v3.patch, hdfs-5931.patch
>
>
> This is to report some improvements and potential bug fixes to some error 
> handling code. Also attaching a patch for review.
> Details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907843#comment-13907843
 ] 

Hadoop QA commented on HDFS-5991:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630203/HDFS-5991.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6200//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6200//console

This message is automatically generated.

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: HDFS-5991.000.patch, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5990) Create options to search files/dirs in OfflineImageViewer

2014-02-20 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA reassigned HDFS-5990:
---

Assignee: Akira AJISAKA

> Create options to search files/dirs in OfflineImageViewer
> -
>
> Key: HDFS-5990
> URL: https://issues.apache.org/jira/browse/HDFS-5990
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: tools
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>
> The enhancement of HDFS-5975.
> I suggest options to search files/dirs in OfflineImageViewer.
> An example command is as follows:
> {code}
> hdfs oiv -i input -o output -p Ls -owner theuser -group supergroup -minSize 
> 1024 -maxSize 1048576
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Travis Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Thompson updated HDFS-5935:
--

Attachment: HDFS-5935-4.patch

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch, 
> HDFS-5935-4.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Travis Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Thompson updated HDFS-5935:
--

Attachment: (was: HDFS-5935-4.patch)

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Travis Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Travis Thompson updated HDFS-5935:
--

Attachment: HDFS-5935-4.patch

Updated with the combined if statement.

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Travis Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907814#comment-13907814
 ] 

Travis Thompson commented on HDFS-5935:
---

Looks like you are right: [http://jsfiddle.net/GD4zN/]

I'll update the patch with the combined if.

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907816#comment-13907816
 ] 

stack commented on HDFS-5274:
-

[~iwasakims] Thanks.  When you are done, I'll try hooking it up w/ hbase to 
make sure we get a trace that spans the two systems.

[~sureshms] Wouldn't adding htrace to the common pom.xml make it  "...available 
in Hadoop common"?  Thanks.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace 
> d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907804#comment-13907804
 ] 

Yongjun Zhang commented on HDFS-5989:
-

Hi Chris,

I was writing my last update and just saw yours. Cool that you figured out the 
root cause of this bug! Thanks for following up!

--Yongjun



> merge of HDFS-4685 to trunk introduced trunk test failure
> -
>
> Key: HDFS-5989
> URL: https://issues.apache.org/jira/browse/HDFS-5989
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>Assignee: Chris Nauroth
>
> HI,
> I'm seeing trunk branch test failure locally (centOs6) today. And I 
> identified it's this commit that caused the failure. 
> Author: Chris Nauroth   2014-02-19 10:34:52
> Committer: Chris Nauroth   2014-02-19 10:34:52
> Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
> handling of include/exclude node-lists to be available across RM failover by 
> making using of a remote configuration-provider. Contributed by Xuan Gong.)
> Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
> whitespace difference in FSImageSerialization.java in preparation for trunk 
> merge.)
> Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default 
> queue properties to Fair Scheduler documentation (Naren Koneru via Sandy 
> Ryza))
> Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
> testv4, testv7
> Follows: testv5
> Precedes: 
> Merge HDFS-4685 to trunk.
> 
> git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
> 13f79535-47bb-0310-9956-ffa450edef68
> I'm not sure whether other folks are seeing the same, or maybe related to my 
> environment. But prior to this change, I don't see this problem.
> The failures are in TestWebHDFS:
> Running org.apache.hadoop.hdfs.web.TestWebHDFS
> Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
> testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 2.478 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 0.342 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanod

[jira] [Commented] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907803#comment-13907803
 ] 

Yongjun Zhang commented on HDFS-5989:
-

Hi Jing. 

Good to know that you also saw the same problem. I guess most developers keep 
the default ACL setting. I think it would be nice if the unit test were 
self-contained, so that its success or failure does not depend on the 
environment it runs in. So I wonder if the test itself can be modified to loosen 
the ACL restriction, rather than changing the setting of a specific machine.

What do you think?

Thanks.





> merge of HDFS-4685 to trunk introduced trunk test failure
> -
>
> Key: HDFS-5989
> URL: https://issues.apache.org/jira/browse/HDFS-5989
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>Assignee: Chris Nauroth
>
> HI,
> I'm seeing trunk branch test failure locally (centOs6) today. And I 
> identified it's this commit that caused the failure. 
> Author: Chris Nauroth   2014-02-19 10:34:52
> Committer: Chris Nauroth   2014-02-19 10:34:52
> Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
> handling of include/exclude node-lists to be available across RM failover by 
> making using of a remote configuration-provider. Contributed by Xuan Gong.)
> Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
> whitespace difference in FSImageSerialization.java in preparation for trunk 
> merge.)
> Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default 
> queue properties to Fair Scheduler documentation (Naren Koneru via Sandy 
> Ryza))
> Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
> testv4, testv7
> Follows: testv5
> Precedes: 
> Merge HDFS-4685 to trunk.
> 
> git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
> 13f79535-47bb-0310-9956-ffa450edef68
> I'm not sure whether other folks are seeing the same, or maybe related to my 
> environment. But prior to this change, I don't see this problem.
> The failures are in TestWebHDFS:
> Running org.apache.hadoop.hdfs.web.TestWebHDFS
> Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
> testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 2.478 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 0.342 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(

[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907801#comment-13907801
 ] 

Andrew Wang commented on HDFS-5988:
---

Thanks for the review, Jing. I'll commit if test-patch comes back clean.

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5274) Add Tracing to HDFS

2014-02-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907800#comment-13907800
 ] 

Todd Lipcon commented on HDFS-5274:
---

Hi Suresh. As one of the primary authors of HTrace I'd say that there are no 
plans to put it in Hadoop Common. Personally I think Hadoop Common becoming a 
grab-bag of all things Hadoop gets pretty messy, since it makes it harder to 
upgrade different components piecemeal. It also means that dependent apps like 
HBase would need to wait on new versions of Common and would have even more 
difficulty building the same code against different versions of Hadoop.

Happy to take contributions to HTrace via github pull request if you like, 
though.

> Add Tracing to HDFS
> ---
>
> Key: HDFS-5274
> URL: https://issues.apache.org/jira/browse/HDFS-5274
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 2.1.1-beta
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-2.patch, 
> HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, 
> HDFS-5274-7.patch, Zipkin   Trace a06e941b0172ec73.png, Zipkin   Trace 
> d0f0d66b8a258a69.png
>
>
> Since Google's Dapper paper has shown the benefits of tracing for a large 
> distributed system, it seems like a good time to add tracing to HDFS.  HBase 
> has added tracing using HTrace.  I propose that the same can be done within 
> HDFS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907799#comment-13907799
 ] 

Chris Nauroth commented on HDFS-5989:
-

Hi, [~yzhangal].  Thank you for the bug report.  I don't have a repro locally 
for this.  I suspect that you're running with Smack enabled on your local file 
system.  I believe the extra '.' in the permission string indicates the 
presence of a Smack label.  For Jing, it would be a similar situation, with 
'+' appended to files that have an ACL on the local file system.

I suspect I know the root cause of this bug.  I'll post a patch later tonight.
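The stack traces quoted below show the failure coming from 
FsPermission.valueOf, which (before the fix) accepts only a 10-character 
symbolic mode string. A minimal sketch of that parse behavior, matching the 
"length != 10" error in the report:

{code}
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionParseDemo {
  public static void main(String[] args) {
    // A plain 10-character "ls -l" style mode string parses fine.
    System.out.println(FsPermission.valueOf("drwxrwxr-x"));

    // With an extra security-label character ('.') or an ACL indicator ('+')
    // the local permission string grows to 11 characters, and the pre-fix
    // parser rejects it.
    try {
      FsPermission.valueOf("drwxrwxr-x.");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());  // length != 10(unixSymbolicPermission=drwxrwxr-x.)
    }
  }
}
{code}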

> merge of HDFS-4685 to trunk introduced trunk test failure
> -
>
> Key: HDFS-5989
> URL: https://issues.apache.org/jira/browse/HDFS-5989
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>
> HI,
> I'm seeing trunk branch test failure locally (centOs6) today. And I 
> identified it's this commit that caused the failure. 
> Author: Chris Nauroth   2014-02-19 10:34:52
> Committer: Chris Nauroth   2014-02-19 10:34:52
> Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
> handling of include/exclude node-lists to be available across RM failover by 
> making using of a remote configuration-provider. Contributed by Xuan Gong.)
> Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
> whitespace difference in FSImageSerialization.java in preparation for trunk 
> merge.)
> Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default 
> queue properties to Fair Scheduler documentation (Naren Koneru via Sandy 
> Ryza))
> Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
> testv4, testv7
> Follows: testv5
> Precedes: 
> Merge HDFS-4685 to trunk.
> 
> git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
> 13f79535-47bb-0310-9956-ffa450edef68
> I'm not sure whether other folks are seeing the same, or maybe related to my 
> environment. But prior to this change, I don't see this problem.
> The failures are in TestWebHDFS:
> Running org.apache.hadoop.hdfs.web.TestWebHDFS
> Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
> testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 2.478 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 0.342 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(Dat

[jira] [Assigned] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reassigned HDFS-5989:
---

Assignee: Chris Nauroth

> merge of HDFS-4685 to trunk introduced trunk test failure
> -
>
> Key: HDFS-5989
> URL: https://issues.apache.org/jira/browse/HDFS-5989
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>Assignee: Chris Nauroth
>
> HI,
> I'm seeing trunk branch test failure locally (centOs6) today. And I 
> identified it's this commit that caused the failure. 
> Author: Chris Nauroth   2014-02-19 10:34:52
> Committer: Chris Nauroth   2014-02-19 10:34:52
> Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
> handling of include/exclude node-lists to be available across RM failover by 
> making using of a remote configuration-provider. Contributed by Xuan Gong.)
> Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
> whitespace difference in FSImageSerialization.java in preparation for trunk 
> merge.)
> Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default 
> queue properties to Fair Scheduler documentation (Naren Koneru via Sandy 
> Ryza))
> Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
> testv4, testv7
> Follows: testv5
> Precedes: 
> Merge HDFS-4685 to trunk.
> 
> git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
> 13f79535-47bb-0310-9956-ffa450edef68
> I'm not sure whether other folks are seeing the same, or maybe related to my 
> environment. But prior to this change, I don't see this problem.
> The failures are in TestWebHDFS:
> Running org.apache.hadoop.hdfs.web.TestWebHDFS
> Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
> testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 2.478 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:359)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 0.342 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apach

[jira] [Commented] (HDFS-5865) Update OfflineImageViewer document

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907795#comment-13907795
 ] 

Hadoop QA commented on HDFS-5865:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630161/HDFS-5865.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6197//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6197//console

This message is automatically generated.

> Update OfflineImageViewer document
> --
>
> Key: HDFS-5865
> URL: https://issues.apache.org/jira/browse/HDFS-5865
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 2.4.0
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>  Labels: newbie
> Attachments: HDFS-5865.patch
>
>
> OfflineImageViewer has been renewed to handle the new fsimage format introduced 
> by HDFS-5698 (fsimage in protobuf).
> We should document the following:
> * The tool can handle the layout version of Hadoop 2.4 and up. (If you want 
> to handle older versions, you can use the OfflineImageViewer of Hadoop 2.3.)
> * The removal of deprecated options such as the Delimited and Indented 
> processors.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907793#comment-13907793
 ] 

Haohui Mai commented on HDFS-5935:
--

Does

{code}
if (jqxhr.responseJSON !== undefined && jqxhr.responseJSON.RemoteException !== 
undefined) {
...
{code}

work?

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory

2014-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907791#comment-13907791
 ] 

Hudson commented on HDFS-5982:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5200 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5200/])
HDFS-5982. Need to update snapshot manager when applying editlog for deleting a 
snapshottable directory. Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570395)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java


> Need to update snapshot manager when applying editlog for deleting a 
> snapshottable directory
> 
>
> Key: HDFS-5982
> URL: https://issues.apache.org/jira/browse/HDFS-5982
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Jing Zhao
>Priority: Critical
> Fix For: 2.4.0
>
> Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, 
> HDFS-5982.001.patch
>
>
> Currently after deleting a snapshottable directory which does not have 
> snapshots any more, we also remove the directory from the snapshottable 
> directory list in SnapshotManager. This works fine when handling a delete 
> request from user. However, when we apply the OP_DELETE editlog, 
> FSDirectory#unprotectedDelete(String, long) is called, which does not contain 
> the "updating snapshot manager" process. This may leave an non-existent inode 
> id in the snapshottable directory list, and can even lead to FSImage 
> corruption.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages

2014-02-20 Thread Travis Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907780#comment-13907780
 ] 

Travis Thompson commented on HDFS-5935:
---

The reason I didn't combine them is that I was worried checking for 
{{jqxhr.responseJSON.RemoteException}} would throw an error if 
{{jqxhr.responseJSON}} is undefined.  If you don't think this is something to 
worry about, I can combine them.
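For what it's worth, && short-circuits in JavaScript just as it does in Java, so 
the combined check never evaluates the second operand when the first is false. A 
small Java analog of the same pattern (the Response class is made up purely for 
illustration):

{code}
public class ShortCircuitCheck {
  static class Response {            // stand-in for jqxhr.responseJSON
    Object remoteException;          // stand-in for .RemoteException
  }

  public static void main(String[] args) {
    Response response = null;        // analogous to responseJSON being undefined
    // The right-hand operand is skipped when the left-hand one is false,
    // so the combined condition cannot dereference a missing object.
    if (response != null && response.remoteException != null) {
      System.out.println("remote exception present");
    } else {
      System.out.println("no remote exception info");
    }
  }
}
{code}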

> New Namenode UI FS browser should throw smarter error messages
> --
>
> Key: HDFS-5935
> URL: https://issues.apache.org/jira/browse/HDFS-5935
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Travis Thompson
>Assignee: Travis Thompson
>Priority: Minor
> Attachments: HDFS-5935-1.patch, HDFS-5935-2.patch, HDFS-5935-3.patch
>
>
> When browsing using the new FS browser in the namenode, if I try to browse a 
> folder that I don't have permission to view, it throws the error:
> {noformat}
> Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: 
> Forbidden
> WebHDFS might be disabled. WebHDFS is required to browse the filesystem.
> {noformat}
> The reason I'm not allowed to see /system is because I don't have permission, 
> not because WebHDFS is disabled.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory

2014-02-20 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5982:


   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Thanks for the review, Chris! I've committed this to trunk, branch-2 and 
branch-2.4.0.

> Need to update snapshot manager when applying editlog for deleting a 
> snapshottable directory
> 
>
> Key: HDFS-5982
> URL: https://issues.apache.org/jira/browse/HDFS-5982
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Jing Zhao
>Priority: Critical
> Fix For: 2.4.0
>
> Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, 
> HDFS-5982.001.patch
>
>
> Currently after deleting a snapshottable directory which does not have 
> snapshots any more, we also remove the directory from the snapshottable 
> directory list in SnapshotManager. This works fine when handling a delete 
> request from user. However, when we apply the OP_DELETE editlog, 
> FSDirectory#unprotectedDelete(String, long) is called, which does not contain 
> the "updating snapshot manager" process. This may leave an non-existent inode 
> id in the snapshottable directory list, and can even lead to FSImage 
> corruption.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907766#comment-13907766
 ] 

Haohui Mai commented on HDFS-5981:
--

HDFS-5991 is tracking the failure of {{TestLoadGenerator}}. The failure of 
{{TestCacheDirectives}} is unrelated.

> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5991:
-

Attachment: HDFS-5991.000.patch

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: HDFS-5991.000.patch, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5951) Provide diagnosis information in the Web UI

2014-02-20 Thread Travis Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907764#comment-13907764
 ] 

Travis Thompson commented on HDFS-5951:
---

I have to agree with [~sureshms], especially since the WebUI is hitting the JMX 
REST API, which I think is a fairly common way to monitor these things with 
other tools like Nagios.  To me, it seems more like a convenience to expose 
useful diagnosis information, like the missing-files message, rather than an 
attempt to replace things like Nagios checks.
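For reference, the JMX data mentioned above is exposed as JSON by the NameNode's 
HTTP server, so an external monitor can poll it without touching the WebUI. A 
minimal sketch, assuming a hypothetical host name and the default NameNode HTTP 
port of that era (50070); the bean name shown is one that external checks 
commonly query:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class NamenodeJmxProbe {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://namenode.example.com:50070/jmx"
        + "?qry=Hadoop:service=NameNode,name=FSNamesystemState");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = r.readLine()) != null) {
        // JSON beans with fields such as live/dead datanode counts that a
        // Nagios-style check would parse and alert on.
        System.out.println(line);
      }
    } finally {
      conn.disconnect();
    }
  }
}
{code}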

> Provide diagnosis information in the Web UI
> ---
>
> Key: HDFS-5951
> URL: https://issues.apache.org/jira/browse/HDFS-5951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5951.000.patch, diagnosis-failure.png, 
> diagnosis-succeed.png
>
>
> HDFS should provide operation statistics in its UI. it can go one step 
> further by leveraging the information to diagnose common problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5991:
-

Status: Patch Available  (was: Open)

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: HDFS-5991.000.patch, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907763#comment-13907763
 ] 

Hadoop QA commented on HDFS-5981:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630158/HDFS-5981.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator
  org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6196//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6196//console

This message is automatically generated.

> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907757#comment-13907757
 ] 

Jing Zhao commented on HDFS-5988:
-

Hi Andrew, I think you're right: getLayoutVersion() returns the layout version 
of the old fsimage, but we should update the current inode map anyway. 

Thanks for the fix. +1 for the patch.

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-5991:


Attachment: org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reassigned HDFS-5991:


Assignee: Haohui Mai

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>Assignee: Haohui Mai
> Attachments: 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator-output.txt, 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.txt
>
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907752#comment-13907752
 ] 

Hadoop QA commented on HDFS-5981:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630155/HDFS-5981.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6195//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6195//console

This message is automatically generated.

> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907751#comment-13907751
 ] 

Akira AJISAKA commented on HDFS-5991:
-

I could reproduce this failure. I'll attach the logs.

> TestLoadGenerator#testLoadGenerator fails on trunk
> --
>
> Key: HDFS-5991
> URL: https://issues.apache.org/jira/browse/HDFS-5991
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Akira AJISAKA
>
> From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
> {code}
> java.io.IOException: Stream closed
>   at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
>   at java.io.BufferedReader.readLine(BufferedReader.java:292)
>   at java.io.BufferedReader.readLine(BufferedReader.java:362)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
>   at 
> org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
>   at 
> org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5924) Utilize OOB upgrade message processing for writes

2014-02-20 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907748#comment-13907748
 ] 

Brandon Li commented on HDFS-5924:
--

{quote}This is no worse than the current behavior.{quote}
The application could experience a much higher write failure rate during the 
datanode upgrade. For example, it is now more likely that all datanodes in the 
pipeline become inaccessible at once. Shutting down a datanode only after the 
previously shut-down datanode is back up might minimize the write failures, but 
it would increase the total upgrade time.

Recall there was some discussion that, after a datanode sends the OOB ack, its 
shutdown could be paused until the client agrees to let it go. I don't remember 
why we did not take that approach.

> Utilize OOB upgrade message processing for writes
> -
>
> Key: HDFS-5924
> URL: https://issues.apache.org/jira/browse/HDFS-5924
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-5924_RBW_RECOVERY.patch, 
> HDFS-5924_RBW_RECOVERY.patch
>
>
> After HDFS-5585 and HDFS-5583, clients and datanodes can coordinate 
> shutdown-restart in order to minimize failures or locality loss.
> In this jira, HDFS client is made aware of the restart OOB ack and perform 
> special write pipeline recovery. Datanode is also modified to load marked RBW 
> replicas as RBW instead of RWR as long as the restart did not take long. 
> For clients, it considers doing this kind of recovery only when there is only 
> one node left in the pipeline or the restarting node is a local datanode.  
> For both clients and datanodes, the timeout or expiration is configurable, 
> meaning this feature can be turned off by setting timeout variables to 0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5991) TestLoadGenerator#testLoadGenerator fails on trunk

2014-02-20 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-5991:
---

 Summary: TestLoadGenerator#testLoadGenerator fails on trunk
 Key: HDFS-5991
 URL: https://issues.apache.org/jira/browse/HDFS-5991
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Akira AJISAKA


From https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/

{code}
java.io.IOException: Stream closed
at java.io.BufferedReader.ensureOpen(BufferedReader.java:97)
at java.io.BufferedReader.readLine(BufferedReader.java:292)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.loadScriptFile(LoadGenerator.java:511)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.init(LoadGenerator.java:418)
at 
org.apache.hadoop.fs.loadGenerator.LoadGenerator.run(LoadGenerator.java:324)
at 
org.apache.hadoop.fs.loadGenerator.TestLoadGenerator.testLoadGenerator(TestLoadGenerator.java:231)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5535:
-

Attachment: h5535_20140220-1554.patch

h5535_20140220-1554.patch: try again after fixing some tests and warnings.

> Umbrella jira for improved HDFS rolling upgrades
> 
>
> Key: HDFS-5535
> URL: https://issues.apache.org/jira/browse/HDFS-5535
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ha, hdfs-client, namenode
>Affects Versions: 3.0.0, 2.2.0
>Reporter: Nathan Roberts
> Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, 
> h5535_20140219.patch, h5535_20140220-1554.patch
>
>
> In order to roll a new HDFS release through a large cluster quickly and 
> safely, a few enhancements are needed in HDFS. An initial High level design 
> document will be attached to this jira, and sub-jiras will itemize the 
> individual tasks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907734#comment-13907734
 ] 

Jing Zhao commented on HDFS-5989:
-

I've hit the same test failure as well. The cause on my machine is that I enabled 
ACLs on my MacBook; after disabling them the test passed.

But in the meantime, do we want to loosen the check a little bit for the unit 
test?
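
A minimal sketch of what loosening the check could look like, assuming the extra 
character is the ACL/SELinux marker that {{ls}} appends (the helper name and the 
trimming rule are illustrative, not existing Hadoop code):

{code}
// Hypothetical helper, not existing Hadoop code: trim the trailing ACL/SELinux
// marker ('+' or '.') that ls appends before handing the 10-character symbolic
// permission string to FsPermission.valueOf().
import org.apache.hadoop.fs.permission.FsPermission;

public final class PermissionParsing {
  private PermissionParsing() {}

  public static FsPermission parseLsPermission(String symbolic) {
    String s = symbolic;
    if (s.length() == 11 && (s.endsWith("+") || s.endsWith("."))) {
      s = s.substring(0, 10);   // drop the extended-attribute marker
    }
    return FsPermission.valueOf(s);
  }

  public static void main(String[] args) {
    // "drwxrwxr-x." is the 11-character form reported in HDFS-5989.
    System.out.println(parseLsPermission("drwxrwxr-x."));
  }
}
{code}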

> merge of HDFS-4685 to trunk introduced trunk test failure
> -
>
> Key: HDFS-5989
> URL: https://issues.apache.org/jira/browse/HDFS-5989
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: CentOS release 6.5 (Final)
> cpe:/o:centos:linux:6:GA
>Reporter: Yongjun Zhang
>
> HI,
> I'm seeing trunk branch test failure locally (centOs6) today. And I 
> identified it's this commit that caused the failure. 
> Author: Chris Nauroth   2014-02-19 10:34:52
> Committer: Chris Nauroth   2014-02-19 10:34:52
> Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
> handling of include/exclude node-lists to be available across RM failover by 
> making using of a remote configuration-provider. Contributed by Xuan Gong.)
> Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
> whitespace difference in FSImageSerialization.java in preparation for trunk 
> merge.)
> Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default 
> queue properties to Fair Scheduler documentation (Naren Koneru via Sandy 
> Ryza))
> Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
> testv4, testv7
> Follows: testv5
> Precedes: 
> Merge HDFS-4685 to trunk.
> 
> git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
> 13f79535-47bb-0310-9956-ffa450edef68
> I'm not sure whether other folks are seeing the same, or maybe it is related to my 
> environment. But prior to this change, I don't see this problem.
> The failures are in TestWebHDFS:
> Running org.apache.hadoop.hdfs.web.TestWebHDFS
> Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
> FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
> testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 2.478 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
> at 
> org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)
> testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
> 0.342 sec  <<< ERROR!
> java.lang.IllegalArgumentException: length != 
> 10(unixSymbolicPermission=drwxrwxr-x.)
> at 
> org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
> at 
> org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
> at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
> 

[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907729#comment-13907729
 ] 

Hudson commented on HDFS-5944:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5199 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5199/])
HDFS-5944. LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ 
and cause SecondaryNameNode failed do checkpoint. Contributed by Yunjiong Zhao 
(brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1570366)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestLeaseManager.java


> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 1.3.0, 2.4.0
>
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed because it could 
> not delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>  int srclen = prefix.length();
>  if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
> entries.put(entry.getKey(), entry.getValue());
>   }
> Here when prefix is /XXX/20140206/04_30/, and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.
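
A minimal standalone sketch of the boundary condition described above (hypothetical 
helper, not the committed HDFS-5944 patch):

{code}
// Hypothetical sketch, not the committed patch: a trailing '/' on the lease
// prefix makes the naive p.charAt(srclen) check reject legitimate descendants
// such as "/XXX/20140206/04_30/_SUCCESS.slc.log".
public class PrefixMatch {
  static boolean isDescendant(String prefix, String p) {
    // Normalize away a trailing separator so "/a/b/" behaves like "/a/b".
    if (prefix.length() > 1 && prefix.endsWith("/")) {
      prefix = prefix.substring(0, prefix.length() - 1);
    }
    if (!p.startsWith(prefix)) {
      return false;
    }
    int srclen = prefix.length();
    return p.length() == srclen || p.charAt(srclen) == '/';
  }

  public static void main(String[] args) {
    System.out.println(isDescendant("/XXX/20140206/04_30/",
        "/XXX/20140206/04_30/_SUCCESS.slc.log"));   // true after normalization
  }
}
{code}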



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.

2014-02-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907731#comment-13907731
 ] 

Chris Nauroth commented on HDFS-5957:
-

bq. Chris Nauroth: mmap() does take up physical memory, assuming those pages 
are mapped into RAM and are not disk-resident.

Yes, most definitely.  I think Colin was trying to clarify that the initial 
mmap call dings virtual memory: call mmap for a 1 MB file and you'll 
immediately see virtual memory increase by 1 MB, but not physical memory.  
Certainly as the pages get accessed and mapped in, we'll start to consume 
physical memory.

bq. For small 200Gb data-sets (~1.4x tasks per container), ZCR does give a perf 
boost because we get to use HADOOP-10047 instead of shuffling it between byte[] 
buffers for decompression.

Thanks, that clarifies why zero-copy read was still useful.

It sounds like you really do need a deterministic way to trigger the {{munmap}} 
calls, i.e. LRU caching or no caching at all described above.

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> -
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory

2014-02-20 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907724#comment-13907724
 ] 

Jing Zhao commented on HDFS-5982:
-

The failed test should be unrelated. I will commit the patch shortly.

> Need to update snapshot manager when applying editlog for deleting a 
> snapshottable directory
> 
>
> Key: HDFS-5982
> URL: https://issues.apache.org/jira/browse/HDFS-5982
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, 
> HDFS-5982.001.patch
>
>
> Currently after deleting a snapshottable directory which does not have 
> snapshots any more, we also remove the directory from the snapshottable 
> directory list in SnapshotManager. This works fine when handling a delete 
> request from user. However, when we apply the OP_DELETE editlog, 
> FSDirectory#unprotectedDelete(String, long) is called, which does not contain 
> the "updating snapshot manager" process. This may leave an non-existent inode 
> id in the snapshottable directory list, and can even lead to FSImage 
> corruption.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5982) Need to update snapshot manager when applying editlog for deleting a snapshottable directory

2014-02-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907708#comment-13907708
 ] 

Hadoop QA commented on HDFS-5982:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630107/HDFS-5982.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.loadGenerator.TestLoadGenerator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6194//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6194//console

This message is automatically generated.

> Need to update snapshot manager when applying editlog for deleting a 
> snapshottable directory
> 
>
> Key: HDFS-5982
> URL: https://issues.apache.org/jira/browse/HDFS-5982
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.3.0
>Reporter: Tassapol Athiapinya
>Assignee: Jing Zhao
>Priority: Critical
> Attachments: HDFS-5982.000.patch, HDFS-5982.001.patch, 
> HDFS-5982.001.patch
>
>
> Currently after deleting a snapshottable directory which does not have 
> snapshots any more, we also remove the directory from the snapshottable 
> directory list in SnapshotManager. This works fine when handling a delete 
> request from user. However, when we apply the OP_DELETE editlog, 
> FSDirectory#unprotectedDelete(String, long) is called, which does not contain 
> the "updating snapshot manager" process. This may leave an non-existent inode 
> id in the snapshottable directory list, and can even lead to FSImage 
> corruption.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5990) Create options to search files/dirs in OfflineImageViewer

2014-02-20 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-5990:
---

 Summary: Create options to search files/dirs in OfflineImageViewer
 Key: HDFS-5990
 URL: https://issues.apache.org/jira/browse/HDFS-5990
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Akira AJISAKA
Priority: Minor


An enhancement of HDFS-5975.
I suggest adding options to search for files/dirs in OfflineImageViewer.
An example command is as follows:
{code}
hdfs oiv -i input -o output -p Ls -owner theuser -group supergroup -minSize 
1024 -maxSize 1048576
{code}




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4685) Implementation of ACLs in HDFS

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907703#comment-13907703
 ] 

Yongjun Zhang commented on HDFS-4685:
-

Hi [~cnauroth], 

I'm seeing a trunk branch test failure locally (CentOS 6) today, and I identified 
that the merge of this fix caused it. I'm not sure whether other people are seeing 
the same problem, or whether it's because of my environment; prior to this change, 
I don't see the problem. I filed HDFS-5989 to track the issue, in case it's a real 
one. Would you please take a look at it? 

Thanks.



> Implementation of ACLs in HDFS
> --
>
> Key: HDFS-4685
> URL: https://issues.apache.org/jira/browse/HDFS-4685
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs-client, namenode, security
>Affects Versions: 1.1.2
>Reporter: Sachin Jose
>Assignee: Chris Nauroth
> Fix For: 3.0.0
>
> Attachments: HDFS-4685.1.patch, HDFS-4685.2.patch, HDFS-4685.3.patch, 
> HDFS-4685.4.patch, HDFS-ACLs-Design-1.pdf, HDFS-ACLs-Design-2.pdf, 
> HDFS-ACLs-Design-3.pdf, Test-Plan-for-Extended-Acls-1.pdf
>
>
> Currently hdfs doesn't support extended file ACLs. In unix, extended ACLs can be 
> managed using the getfacl and setfacl utilities. Is there anybody working on 
> this feature?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.

2014-02-20 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907700#comment-13907700
 ] 

Gopal V commented on HDFS-5957:
---

[~cnauroth]: mmap() does take up physical memory, assuming those pages are 
mapped into RAM and are not disk-resident.

As long as we're on Linux, it will show up in RSS and will also be marked in the 
Shared_Clean/Referenced fields in /proc/<pid>/smaps.

YARN could do a better job of calculating "How much memory will be free'd up if 
this process is killed" vs "How much memory does this process use". But that is 
a completely different issue.

When I set the mmap timeout to 1000ms, some of my queries succeeded - mostly 
the queries which were taking > 50 seconds. 

But the really fast ORC queries which take ~10 seconds to run still managed to 
hit around ~50x task failures out of ~3000 map tasks.

The perf dip happens because of some of those failures. 

For small 200Gb data-sets (~1.4x tasks per container), ZCR does give a perf 
boost because we get to use HADOOP-10047 instead of shuffling it between byte[] 
buffers for decompression.

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> -
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5981) PBImageXmlWriter generates malformed XML

2014-02-20 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5981:


  Component/s: tools
 Target Version/s: 3.0.0, 2.4.0
Affects Version/s: 2.4.0
   3.0.0

+1 for the patch.  Thank you for adding the test.  I'll commit this later today.

> PBImageXmlWriter generates malformed XML
> 
>
> Key: HDFS-5981
> URL: https://issues.apache.org/jira/browse/HDFS-5981
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5981.000.patch, HDFS-5981.001.patch, 
> HDFS-5981.002.patch, HDFS-5981.003.patch
>
>
> {{PBImageXmlWriter}} outputs malformed XML file because it closes the 
> {{SnapshotDiffSection}}, {{NameSection}} and {{INodeReferenceSection}} 
> incorrectly.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5989) merge of HDFS-4685 to trunk introduced trunk test failure

2014-02-20 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-5989:
---

 Summary: merge of HDFS-4685 to trunk introduced trunk test failure
 Key: HDFS-5989
 URL: https://issues.apache.org/jira/browse/HDFS-5989
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: CentOS release 6.5 (Final)
cpe:/o:centos:linux:6:GA

Reporter: Yongjun Zhang


HI,

I'm seeing trunk branch test failure locally (centOs6) today. And I identified 
it's this commit that caused the failure. 

Author: Chris Nauroth   2014-02-19 10:34:52
Committer: Chris Nauroth   2014-02-19 10:34:52
Parent: 7215d12fdce727e1f4bce21a156b0505bd9ba72a (YARN-1666. Modified RM HA 
handling of include/exclude node-lists to be available across RM failover by 
making using of a remote configuration-provider. Contributed by Xuan Gong.)
Parent: 603ebb82b31e9300cfbf81ed5dd6110f1cb31b27 (HDFS-4685. Correct minor 
whitespace difference in FSImageSerialization.java in preparation for trunk 
merge.)
Child:  ef8a5bceb7f3ce34d08a5968777effd40e0b1d0f (YARN-1171. Add default queue 
properties to Fair Scheduler documentation (Naren Koneru via Sandy Ryza))
Branches: remotes/apache/HDFS-5535, remotes/apache/trunk, testv10, testv3, 
testv4, testv7
Follows: testv5
Precedes: 

Merge HDFS-4685 to trunk.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1569870 
13f79535-47bb-0310-9956-ffa450edef68


I'm not sure whether other folks are seeing the same, or maybe it is related to my 
environment. But prior to this change, I don't see this problem.

The failures are in TestWebHDFS:

Running org.apache.hadoop.hdfs.web.TestWebHDFS
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 3.687 sec <<< 
FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHDFS
testLargeDirectory(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 2.478 
sec  <<< ERROR!
java.lang.IllegalArgumentException: length != 
10(unixSymbolicPermission=drwxrwxr-x.)
at 
org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
at 
org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
at 
org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
at 
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
at 
org.apache.hadoop.hdfs.web.TestWebHDFS.testLargeDirectory(TestWebHDFS.java:229)

testNamenodeRestart(org.apache.hadoop.hdfs.web.TestWebHDFS)  Time elapsed: 
0.342 sec  <<< ERROR!
java.lang.IllegalArgumentException: length != 
10(unixSymbolicPermission=drwxrwxr-x.)
at 
org.apache.hadoop.fs.permission.FsPermission.valueOf(FsPermission.java:323)
at 
org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:572)
at 
org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:540)
at 
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:129)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:146)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$DataNodeDiskChecker.checkDir(DataNode.java:1835)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:1877)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1859)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1764)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1243)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:699)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:340)
at 
org.apache.hadoop.hdfs.TestDFSClientRetries.namenodeRestartTest(TestDFSClientRetries.java:886)
at 
org.apache.hadoop.hdfs.web.TestWebHDFS.testNamenode

[jira] [Resolved] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch

2014-02-20 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal resolved HDFS-5987.
-

  Resolution: Fixed
   Fix Version/s: HDFS-5535 (Rolling upgrades)
Target Version/s: HDFS-5535 (Rolling upgrades)
Hadoop Flags: Reviewed

+1 for the patch.

Thanks for fixing these Nicholas!

> Fix findbugs warnings in Rolling Upgrade branch
> ---
>
> Key: HDFS-5987
> URL: https://issues.apache.org/jira/browse/HDFS-5987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Fix For: HDFS-5535 (Rolling upgrades)
>
> Attachments: h5987_20140220.patch
>
>
> {noformat}
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.mkdirs()
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.renameTo(File)
> RV
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles()
>  ignores exceptional return value of java.io.File.mkdirs()
> ISInconsistent synchronization of 
> org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of 
> time
> NPDereference of the result of readLine() without nullcheck in 
> org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907687#comment-13907687
 ] 

Suresh Srinivas commented on HDFS-5840:
---

[~atm], sorry for the late reply. I had lost track of this.

{quote}
As for handling the partial upgrade failure as you've described, I'd like to 
add one more RPC call to the JournalManager to initiate analysis/recovery of 
the storage dirs upon first contact, and then refactor the contents of 
FSImage#recoverStorageDirs into NNUpgradeUtil just like was done with the other 
upgrade-related procedures. If this sounds OK to you, I'll go ahead and add 
that stuff and appropriate tests.
{quote}
Why not always recover in preupgrade/upgrade step, instead of adding another 
RPC?

With rolling upgrade getting ready, some of the functionality added there may be 
useful. For partial failures related to JournalNodes, that feature chose to make 
the JournalNode rollback operation idempotent. It looks like a lot of the rolling 
upgrade related code can be leveraged here, since upgrade is a special case of 
rolling upgrade. Should we explore that?

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> 
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 3.0.0
>
> Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comment in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5957) Provide support for different mmap cache retention policies in ShortCircuitCache.

2014-02-20 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907681#comment-13907681
 ] 

Chris Nauroth commented on HDFS-5957:
-

bq. mmap regions don't consume physical memory. They do consume virtual memory.

YARN has checks on both physical and virtual memory.  I reviewed the logs from 
the application, and it is in fact the physical memory threshold that was 
exceeded.  YARN calculates this by checking /proc/pid/stat for the RSS and 
multiplying by page size.  The process was well within the virtual memory 
threshold, so virtual address space was not a problem.

{code}
containerID=container_1392067467498_0193_01_000282] is running beyond physical 
memory limits. Current usage: 4.5 GB of 4 GB physical memory used; 9.4 GB of 40 
GB virtual memory used. Killing container.

Dump of the process-tree for container_1392067467498_0193_01_000282 :

|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE

|- 27095 27015 27015 27015 (java) 8640 1190 9959014400 1189585 
/grid/0/jdk/bin/java -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN -server -Xmx3584m 
-Djava.net.preferIPv4Stack=true -XX:+UseNUMA -XX:+UseParallelGC 
-Dlog4j.configuration=tez-container-log4j.properties 
-Dyarn.app.container.log.dir=/grid/4/cluster/yarn/logs/application_1392067467498_0193/container_1392067467498_0193_01_000282
 -Dtez.root.logger=INFO,CLA 
-Djava.io.tmpdir=/grid/4/cluster/yarn/local/usercache/gopal/appcache/application_1392067467498_0193/container_1392067467498_0193_01_000282/tmp
 org.apache.hadoop.mapred.YarnTezDagChild 172.19.0.45 38627 
container_1392067467498_0193_01_000282 application_1392067467498_0193 1 
{code}
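
For reference, this is roughly how that number can be reproduced outside YARN; the 
class below is a sketch assuming Linux and a 4 KB page size, not YARN's actual 
monitoring code:

{code}
// Sketch only (not YARN's implementation): read the RSS page count from
// /proc/<pid>/stat and multiply by the page size, assumed here to be 4096 bytes.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class RssEstimator {
  public static long rssBytes(int pid) throws IOException {
    String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
    // Field 24 of /proc/<pid>/stat is rss (in pages). Skip past "pid (comm)"
    // by cutting at the last ')': the state field then lands at index 0,
    // which puts rss at index 21.
    String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
    long rssPages = Long.parseLong(fields[21]);
    return rssPages * 4096L;   // assumes 4 KB pages
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rssBytes(Integer.parseInt(args[0])) + " bytes resident");
  }
}
{code}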

bq. I don't think YARN should limit the consumption of virtual memory. virtual 
memory imposes almost no cost on the system and limiting it leads to problems 
like this one.

I don't know the full history behind the virtual memory threshold.  I've always 
assumed that it was in place to guard against virtual address space exhaustion 
and possible intervention by the OOM killer.  So far, the virtual memory 
threshold doesn't appear to be a factor in this case.

bq. It should be possible to limit the consumption of actual memory (not 
virtual address space) and solve this problem that way. What do you think?

Yes, I agree that the issue here is physical memory based on the logs.  What we 
know at this point is that short-circuit reads were counted against the 
process's RSS, eventually triggering YARN's physical memory check.  Then, 
downtuning {{dfs.client.mmap.cache.timeout.ms}} made the problem go away.  I 
think we can come up with a minimal repro that demonstrates it.  Gopal might 
even already have this.

bq. In our tests, mmap provided no performance advantage unless it was reused. 
If Gopal needs to purge mmaps immediately after using them, the correct thing 
is simply not to use zero-copy reads.

Yes, something doesn't quite jive here.  [~gopalv], can you comment on whether 
or not you're seeing a performance benefit with zero-copy read after 
down-tuning {{dfs.client.mmap.cache.timeout.ms}} like I advised?  If so, then 
did I miss something in the description of your application's access pattern?

> Provide support for different mmap cache retention policies in 
> ShortCircuitCache.
> -
>
> Key: HDFS-5957
> URL: https://issues.apache.org/jira/browse/HDFS-5957
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Chris Nauroth
>
> Currently, the {{ShortCircuitCache}} retains {{mmap}} regions for reuse by 
> multiple reads of the same block or by multiple threads.  The eventual 
> {{munmap}} executes on a background thread after an expiration period.  Some 
> client usage patterns would prefer strict bounds on this cache and 
> deterministic cleanup by calling {{munmap}}.  This issue proposes additional 
> support for different caching policies that better fit these usage patterns.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5988) Bad fsimage always generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5988:
--

Summary: Bad fsimage always generated after upgrade  (was: Bad fsimage 
generated after upgrade)

> Bad fsimage always generated after upgrade
> --
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5988) Bad fsimage generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5988 started by Andrew Wang.

> Bad fsimage generated after upgrade
> ---
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5988) Bad fsimage generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5988:
--

Attachment: hdfs-5988-1.patch

{{FSImageFormat#Loader}} was incorrectly basing the decision to populate the 
{{FSDirectory#inodeMap}} on whether the old fsimage layout version supported 
inodes. We only see the error with the new PB-based image, since it iterates 
through {{inodeMap}}, while the old fsimage saver would traverse the directory 
structure instead.

I also added a bunch of trace/debug logging to OIV, which was helpful in 
tracking down this issue. Trust me, a lot of effort for a one-line fix :)
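
A hypothetical, simplified illustration of the bug shape described above (the names 
and types are made up; this is not the actual HDFS-5988 patch or the real 
{{FSImageFormat}} APIs):

{code}
// Hypothetical illustration only -- simplified names, not actual HDFS code.
import java.util.HashMap;
import java.util.Map;

public class LoaderSketch {
  static final Map<Long, Object> inodeMap = new HashMap<>();

  // Buggy shape: only register the inode when the *old* image layout already
  // supported inode IDs, so images upgraded from older layouts leave the map
  // sparse and the new PB-based saver (which iterates the map) drops inodes.
  static void loadInodeBuggy(long id, Object inode, boolean oldLayoutHasInodeIds) {
    if (oldLayoutHasInodeIds) {
      inodeMap.put(id, inode);
    }
  }

  // Fixed shape: always register the loaded inode, regardless of the old layout.
  static void loadInodeFixed(long id, Object inode) {
    inodeMap.put(id, inode);
  }
}
{code}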

> Bad fsimage generated after upgrade
> ---
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5988) Bad fsimage generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5988:
--

Status: Patch Available  (was: In Progress)

> Bad fsimage generated after upgrade
> ---
>
> Key: HDFS-5988
> URL: https://issues.apache.org/jira/browse/HDFS-5988
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Blocker
> Attachments: hdfs-5988-1.patch
>
>
> Internal testing revealed an issue where, after upgrading from an earlier 
> release, we always fail to save a correct PB-based fsimage (namely, missing 
> inodes leading to an inconsistent namespace). This results in substantial 
> data loss, since the upgraded fsimage is broken, as well as the fsimages 
> generated by saveNamespace and checkpointing.
> This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5988) Bad fsimage generated after upgrade

2014-02-20 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-5988:
-

 Summary: Bad fsimage generated after upgrade
 Key: HDFS-5988
 URL: https://issues.apache.org/jira/browse/HDFS-5988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker


Internal testing revealed an issue where, after upgrading from an earlier 
release, we always fail to save a correct PB-based fsimage (namely, missing 
inodes leading to an inconsistent namespace). This results in substantial data 
loss, since the upgraded fsimage is broken, as well as the fsimages generated 
by saveNamespace and checkpointing.

This ended up being a bug in the old fsimage loading code, patch coming.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5944:
-

Fix Version/s: 1.3.0

> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 1.3.0, 2.4.0
>
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed because it could 
> not delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>  int srclen = prefix.length();
>  if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
> entries.put(entry.getKey(), entry.getValue());
>   }
> Here when prefix is /XXX/20140206/04_30/, and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907661#comment-13907661
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5939:
--

Users won't see the log, so I don't think we need to add the log statement.  This 
is the same as not logging when a file is not found.  I also suggest not adding 
the new NoDatanodeException - simply use IOException, put the detail message 
there, and set InvalidTopologyException as the cause.
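
A rough sketch of that shape (the class, method, and message text are made up for 
illustration; this is not the committed HDFS-5939 change):

{code}
// Illustrative only: no new exception type, just an IOException carrying a
// clear message and the original cause.
import java.io.IOException;

public class NoDatanodeHandling {
  static void rethrowAsIOException(Exception invalidTopology, String path)
      throws IOException {
    throw new IOException("Failed to find a datanode for " + path
        + "; the cluster may have no live datanodes", invalidTopology);
  }
}
{code}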

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will 
> see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> Need to fix the report to give user hint about dead datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5951) Provide diagnosis information in the Web UI

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907653#comment-13907653
 ] 

Suresh Srinivas commented on HDFS-5951:
---

I think the scope of this jira is probably misunderstood. The proposal is not 
to do away with the monitoring systems. Frequently I see many issues that could 
be flagged by HDFS itself. To name a few: 
# Configuration issues
#* Using /tmp for storage
#* Getting the ipc handler count, the number of datanode transceivers, or the 
ulimit for the daemons wrong for a given cluster size, etc.
#* JVM heap size misconfigured for the size of the cluster, the number of 
objects, etc.
# Issues that need to be addressed but are sometimes missed even with 
monitoring in place, because alerts were categorized incorrectly or ignored.
#* Checkpoints not happening (I know of instances where missing this resulted 
in cluster startup times of over 18 hours!)
#* Growth in editlog size.
#* Corruption during fsimage and editlog checkpointing being silently ignored.

Some of these are covered in the best-practices documents that vendors put out 
or in Hadoop operations tech talks. Some of them can be covered in this Web UI, 
where the issues described above can be flagged, with information on why each 
needs to be addressed and how to address it (a tiny example of such a check is 
sketched below).
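
For illustration, the kind of self-check being described could be as small as this; 
the property key is the standard {{dfs.datanode.data.dir}}, but the check itself is 
a made-up example rather than an existing HDFS diagnosis:

{code}
// Illustrative sketch of a single diagnosis rule: warn when a datanode storage
// directory is configured under /tmp.
import org.apache.hadoop.conf.Configuration;

public class StorageDirCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    for (String dir : conf.getTrimmedStrings("dfs.datanode.data.dir")) {
      if (dir.startsWith("/tmp") || dir.startsWith("file:///tmp")) {
        System.err.println("WARN: datanode storage directory under /tmp: " + dir);
      }
    }
  }
}
{code}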

> Provide diagnosis information in the Web UI
> ---
>
> Key: HDFS-5951
> URL: https://issues.apache.org/jira/browse/HDFS-5951
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HDFS-5951.000.patch, diagnosis-failure.png, 
> diagnosis-succeed.png
>
>
> HDFS should provide operation statistics in its UI. it can go one step 
> further by leveraging the information to diagnose common problems.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5944:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed because it could 
> not delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>  int srclen = prefix.length();
>  if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
> entries.put(entry.getKey(), entry.getValue());
>   }
> Here when prefix is /XXX/20140206/04_30/, and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5944:
-

Fix Version/s: 2.4.0

> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 2.4.0
>
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed because it could 
> not delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>  int srclen = prefix.length();
>  if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
> entries.put(entry.getKey(), entry.getValue());
>   }
> Here when prefix is /XXX/20140206/04_30/, and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907644#comment-13907644
 ] 

Brandon Li commented on HDFS-5944:
--

I've committed the patch.

> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Fix For: 2.4.0
>
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log
> Then the SecondaryNameNode tried to do a checkpoint and failed because it could 
> not delete the lease held by Client A when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>  int srclen = prefix.length();
>  if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
> entries.put(entry.getKey(), entry.getValue());
>   }
> Here when prefix is /XXX/20140206/04_30/, and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907640#comment-13907640
 ] 

Yongjun Zhang commented on HDFS-5939:
-

Hi Haohui,

At least we got a report from the field that we need to provide a better message 
so users can quickly tell what's going on. Would a WARN instead of an ERROR be 
more acceptable? 

Thanks.



> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch
>
>
> When trying to access HDFS via WebHDFS when the datanodes are dead, the user will 
> see the exception below without any clue that it is caused by dead datanodes:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> We need to fix the report to give the user a hint about the dead datanodes.
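
For background, the "n must be positive" message typically comes from 
java.util.Random.nextInt(int) being called with a non-positive bound, which is what 
happens when WebHDFS tries to pick a datanode to redirect to and the candidate set is 
empty. The sketch below is purely illustrative, not the HDFS-5939 patch, and its class 
and method names are invented; it only shows the kind of fail-fast guard that would 
surface a clearer message:

{noformat}
import java.io.IOException;

/** Hypothetical sketch; names are illustrative, not the HDFS-5939 patch. */
public class DatanodeChooserSketch {

  /**
   * Fail fast with an actionable message instead of letting
   * Random.nextInt(0) surface as "n must be positive".
   */
  static void checkLiveDatanodes(int liveDatanodes, String path) throws IOException {
    if (liveDatanodes <= 0) {
      throw new IOException("Failed to find a datanode to write " + path
          + ": the cluster reports 0 live datanodes."
          + " Check that datanodes are running and have registered with the NameNode.");
    }
  }

  public static void main(String[] args) throws IOException {
    checkLiveDatanodes(0, "/t1"); // throws with an explicit explanation
  }
}
{noformat}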



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5977) FSImageFormatPBINode does not respect "-renameReserved" upgrade flag

2014-02-20 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-5977.
--

Resolution: Later

> FSImageFormatPBINode does not respect "-renameReserved" upgrade flag
> 
>
> Key: HDFS-5977
> URL: https://issues.apache.org/jira/browse/HDFS-5977
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>  Labels: protobuf
>
> HDFS-5709 added a new upgrade flag "-renameReserved" which can be used to 
> automatically rename reserved paths like "/.reserved" encountered during 
> upgrade. The new protobuf loading code does not have a similar facility, so 
> future reserved paths cannot be automatically renamed via "-renameReserved".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5987:
-

Attachment: h5987_20140220.patch

h5987_20140220.patch: fixes the findbugs warnings and adds more cases to 
TestRollingUpgrade.testRollback().

> Fix findbugs warnings in Rolling Upgrade branch
> ---
>
> Key: HDFS-5987
> URL: https://issues.apache.org/jira/browse/HDFS-5987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
> Attachments: h5987_20140220.patch
>
>
> {noformat}
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.mkdirs()
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.renameTo(File)
> RV
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles()
>  ignores exceptional return value of java.io.File.mkdirs()
> IS  Inconsistent synchronization of 
> org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of 
> time
> NP  Dereference of the result of readLine() without nullcheck in 
> org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI

2014-02-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907636#comment-13907636
 ] 

Suresh Srinivas commented on HDFS-5986:
---

Yes. It is the PendingDeletionBlocksCount from invalidateBlocks. I like what 
[~atm] suggested as well. I do not think there is a metric corresponding to this.

> Capture the number of blocks pending deletion on namenode webUI
> ---
>
> Key: HDFS-5986
> URL: https://issues.apache.org/jira/browse/HDFS-5986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Suresh Srinivas
>
> When a directory that has a large number of directories and files is deleted, 
> the namespace deletes the corresponding inodes immediately. However, it is 
> hard to know when the invalidated blocks are actually deleted on the 
> datanodes, which could take a while.
> I propose adding to the namenode webUI, along with under-replicated blocks, the 
> number of blocks that are pending deletion.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5987:
-

Description: 
{noformat}
RV  
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
 ignores exceptional return value of java.io.File.mkdirs()
RV  
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
 ignores exceptional return value of java.io.File.renameTo(File)
RV  
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles()
 ignores exceptional return value of java.io.File.mkdirs()
IS  Inconsistent synchronization of 
org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of 
time
NP  Dereference of the result of readLine() without nullcheck in 
org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File)
{noformat}
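
For readers unfamiliar with the findbugs codes above: RV flags an ignored return value 
(here File.mkdirs() and File.renameTo()), IS flags inconsistent synchronization on a 
field, and NP flags a possible null dereference (here the result of readLine()). The 
snippet below is only a generic sketch of how an RV warning on File.mkdirs() is usually 
cleared by acting on the return value; it is not the attached h5987_20140220.patch.

{noformat}
import java.io.File;
import java.io.IOException;

/** Generic sketch of clearing a findbugs RV warning on File.mkdirs();
 *  not the actual HDFS-5987 patch. */
public class MkdirsSketch {
  static void ensureDirectory(File dir) throws IOException {
    // Act on the boolean result instead of ignoring it: mkdirs() returns false
    // when the directory could not be created and does not already exist.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Failed to create directory " + dir);
    }
    // File.renameTo(File) is handled the same way: check its return value and
    // raise an error (or fall back) when it reports failure.
  }
}
{noformat}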

> Fix findbugs warnings in Rolling Upgrade branch
> ---
>
> Key: HDFS-5987
> URL: https://issues.apache.org/jira/browse/HDFS-5987
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, ha, hdfs-client, namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Minor
>
> {noformat}
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.mkdirs()
> RV
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.restoreBlockFilesFromTrash(File)
>  ignores exceptional return value of java.io.File.renameTo(File)
> RV
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$ReplicaFileDeleteTask.moveFiles()
>  ignores exceptional return value of java.io.File.mkdirs()
> IS  Inconsistent synchronization of 
> org.apache.hadoop.hdfs.qjournal.server.Journal.committedTxnId; locked 92% of 
> time
> NP  Dereference of the result of readLine() without nullcheck in 
> org.apache.hadoop.hdfs.util.MD5FileUtils.renameMD5File(File, File)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster

2014-02-20 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907615#comment-13907615
 ] 

Haohui Mai commented on HDFS-5939:
--

bq. The case reported in this bug is about no datanode is running, which is 
about unhealthy cluster and definitely need to catch operator's attention. So I 
think it makes sense to log a message in server log. Do you still think we 
don't need to log an error there? It could save the operator time to 
investigate the problem.

Personally I think it is overkill. Note that if this happens, it means that 
either (1) all datanodes are dead, or (2) at least one block is missing 
(i.e., no datanode can serve it) in HDFS. Both the web UI and the monitoring 
applications (e.g., Ambari / CDH) would catch it much earlier, before the 
operator looks into the log. The log has little value since it cannot flag the 
error in the first place, nor does it provide sufficient information to 
reproduce the error (in this case only the client can reproduce it in a reliable way).

> WebHdfs returns misleading error code and logs nothing if trying to create a 
> file with no DNs in cluster
> 
>
> Key: HDFS-5939
> URL: https://issues.apache.org/jira/browse/HDFS-5939
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
> Attachments: HDFS-5939.001.patch, HDFS-5939.002.patch
>
>
> When trying to access HDFS via WebHDFS when the datanodes are dead, the user will 
> see the exception below without any clue that it is caused by dead datanodes:
> $ curl -i -X PUT 
> ".../webhdfs/v1/t1?op=CREATE&user.name=&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n
>  must be positive"}}
> We need to fix the report to give the user a hint about the dead datanodes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5987) Fix findbugs warnings in Rolling Upgrade branch

2014-02-20 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5987:


 Summary: Fix findbugs warnings in Rolling Upgrade branch
 Key: HDFS-5987
 URL: https://issues.apache.org/jira/browse/HDFS-5987
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI

2014-02-20 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-5986:
-

Issue Type: Improvement  (was: Bug)

Seems like a decent idea to me. We should expose this as a metric as well, if 
not also in the NN web UI.

> Capture the number of blocks pending deletion on namenode webUI
> ---
>
> Key: HDFS-5986
> URL: https://issues.apache.org/jira/browse/HDFS-5986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Suresh Srinivas
>
> When a directory that has a large number of directories and files is deleted, 
> the namespace deletes the corresponding inodes immediately. However, it is 
> hard to know when the invalidated blocks are actually deleted on the 
> datanodes, which could take a while.
> I propose adding to the namenode webUI, along with under-replicated blocks, the 
> number of blocks that are pending deletion.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5986) Capture the number of blocks pending deletion on namenode webUI

2014-02-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907627#comment-13907627
 ] 

Kihwal Lee commented on HDFS-5986:
--

The jmx on NN already has {{PendingDeletionBlocks}}, and [~wheat9] made the NN webUI 
render on the client side using the jmx data, so it should be a relatively 
simple change. Is {{PendingDeletionBlocks}} what we want, or is it something 
else?
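
As a rough illustration of how a script or client-side page could read this value today, 
the sketch below polls the NameNode /jmx servlet and pulls {{PendingDeletionBlocks}} out 
of the FSNamesystem bean. The host/port and the regex-based extraction are assumptions 
for the example only; a real caller would use the cluster's actual NameNode HTTP address 
and a JSON parser.

{noformat}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Rough sketch: read PendingDeletionBlocks from the NameNode /jmx servlet.
 *  The host, port and regex-based parsing are illustrative assumptions. */
public class PendingDeletionProbe {
  public static void main(String[] args) throws Exception {
    // Assumed NameNode HTTP address; adjust to your cluster.
    URL url = new URL("http://namenode.example.com:50070/jmx"
        + "?qry=Hadoop:service=NameNode,name=FSNamesystem");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    StringBuilder body = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        body.append(line);
      }
    }
    // Naive extraction; a real client should parse the JSON properly.
    Matcher m = Pattern.compile("\"PendingDeletionBlocks\"\\s*:\\s*(\\d+)")
        .matcher(body);
    if (m.find()) {
      System.out.println("Blocks pending deletion: " + m.group(1));
    } else {
      System.out.println("PendingDeletionBlocks not found in /jmx output");
    }
  }
}
{noformat}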

> Capture the number of blocks pending deletion on namenode webUI
> ---
>
> Key: HDFS-5986
> URL: https://issues.apache.org/jira/browse/HDFS-5986
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Suresh Srinivas
>
> When a directory that has a large number of directories and files is deleted, 
> the namespace deletes the corresponding inodes immediately. However, it is 
> hard to know when the invalidated blocks are actually deleted on the 
> datanodes, which could take a while.
> I propose adding to the namenode webUI, along with under-replicated blocks, the 
> number of blocks that are pending deletion.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint

2014-02-20 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5944:
-

Summary: LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ 
right and cause SecondaryNameNode failed do checkpoint  (was: 
LeaseManager:findLeaseWithPrefixPath didn't handle path like /a/b/ right cause 
SecondaryNameNode failed do checkpoint)

> LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and 
> cause SecondaryNameNode failed do checkpoint
> 
>
> Key: HDFS-5944
> URL: https://issues.apache.org/jira/browse/HDFS-5944
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.2.0, 2.2.0
>Reporter: zhaoyunjiong
>Assignee: zhaoyunjiong
> Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, 
> HDFS-5944.test.txt, HDFS-5944.trunk.patch
>
>
> In our cluster, we encountered an error like this:
> java.io.IOException: saveLeases found path 
> /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949)
> What happened:
> Client A opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write,
> and Client A kept refreshing its lease.
> Client B deleted /XXX/20140206/04_30/
> Client C opened file /XXX/20140206/04_30/_SUCCESS.slc.log for write.
> Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log.
> Then the SecondaryNameNode tried to do a checkpoint and failed, because the 
> lease held by Client A had not been deleted when Client B deleted /XXX/20140206/04_30/.
> The reason is a bug in findLeaseWithPrefixPath:
>   int srclen = prefix.length();
>   if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
>     entries.put(entry.getKey(), entry.getValue());
>   }
> Here, when prefix is /XXX/20140206/04_30/ and p is 
> /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'.
> The fix is simple, I'll upload patch later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5977) FSImageFormatPBINode does not respect "-renameReserved" upgrade flag

2014-02-20 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907623#comment-13907623
 ] 

Haohui Mai commented on HDFS-5977:
--

Thanks [~andrew.wang] and [~sureshms] for the info. Let me resolve this jira as 
Later and keep it around. We can reopen it if we need to add another reserved 
path.

> FSImageFormatPBINode does not respect "-renameReserved" upgrade flag
> 
>
> Key: HDFS-5977
> URL: https://issues.apache.org/jira/browse/HDFS-5977
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.0
>Reporter: Andrew Wang
>  Labels: protobuf
>
> HDFS-5709 added a new upgrade flag "-renameReserved" which can be used to 
> automatically rename reserved paths like "/.reserved" encountered during 
> upgrade. The new protobuf loading code does not have a similar facility, so 
> future reserved paths cannot be automatically renamed via "-renameReserved".



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

