[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560419#comment-13560419
 ] 

Suresh Srinivas commented on HDFS-4426:
---

Arpit, the findbugs warnings flagged are valid and need to be fixed. I also 
noticed that the method join() need not be public.

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: HDFS-4426.patch, HDFS-4426.patch, HDFS-4426.patch
>
>
> After HADOOP-9181 went in, the secondary namenode immediately shuts down 
> after it is started.  From the startup logs:
> {noformat}
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
> min)
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
> 2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
> (StringUtils.java:run(616)) - SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
> /
> {noformat}
> I looked into the issue, and it's shutting down because 
> SecondaryNameNode.main starts a bunch of daemon threads then returns.  With 
> nothing but daemon threads remaining, the JVM sees no reason to keep going 
> and proceeds to shutdown.  Apparently we were implicitly relying on the fact 
> that the HttpServer QueuedThreadPool threads were not daemon threads to keep 
> the secondary namenode process up.
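The daemon-thread behavior described above is a general JVM rule, not anything Hadoop-specific. A standalone sketch (illustrative code, not from the patch) of the same failure mode:

```java
// Illustration of the JVM rule behind this bug: the JVM exits as soon as
// no non-daemon threads remain, even if daemon threads are still busy.
public class DaemonExitDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(50); // stand-in for real server work
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        worker.setDaemon(true); // daemon threads do NOT keep the JVM alive
        worker.start();
        // main() returns here; with only the daemon worker left, the JVM
        // shuts down immediately, which is what happened to the
        // SecondaryNameNode once the HttpServer threads became daemons.
    }
}
```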

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560413#comment-13560413
 ] 

Hadoop QA commented on HDFS-4426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566086/HDFS-4426.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3871//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3871//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3871//console

This message is automatically generated.



[jira] [Commented] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560407#comment-13560407
 ] 

project commented on HDFS-4425:
---

Thanks Harsh. Do I need to add these parameters to hdfs-site.xml and restart 
the namenode service?

{noformat}
+  public static final String DFS_NAMENODE_DU_RESERVED_KEY =
       "dfs.namenode.resource.du.reserved";
+  public static final long   DFS_NAMENODE_DU_RESERVED_DEFAULT =
       1024 * 1024 * 100; // 100 MB
{noformat}

> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: project
>Priority: Critical
>
> Hi,
> Namenode switches into safemode when it has low disk space on the root fs / i 
> have to manually run a command to leave it. Below are log messages for low 
> space on root / fs. Is there any parameter so that i can reduce reserved 
> amount.
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.



[jira] [Resolved] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4425.
---

Resolution: Invalid

The Apache JIRA is not for user help but only for confirmed bug reports. Please 
send usage help requests such as your questions to u...@hadoop.apache.org.

I'm resolving this as Invalid; let's carry this forward on your email thread 
instead. Many people have already answered you there. The key to tweak the 
default is dfs.namenode.resource.du.reserved.
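For reference, a hedged example of lowering that threshold in hdfs-site.xml (the value is in bytes; the default 104857600 is 100 MB, and the 52428800 below is only an illustrative choice, not a recommendation):

```xml
<!-- hdfs-site.xml: lower the NameNode's reserved-space threshold.
     Value is in bytes; the shipped default is 104857600 (100 MB). -->
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <value>52428800</value> <!-- 50 MB; illustrative, size to your disk budget -->
</property>
```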



[jira] [Updated] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4426:
--

Attachment: HDFS-4426.patch

Hopefully this time correctly rebased patch.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560374#comment-13560374
 ] 

Hadoop QA commented on HDFS-4426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566085/HDFS-4426.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3870//console

This message is automatically generated.



[jira] [Updated] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4426:
--

Attachment: HDFS-4426.patch

rebased patch.



[jira] [Updated] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

project updated HDFS-4425:
--

Affects Version/s: (was: 0.23.5)
   (was: 2.0.2-alpha)
   2.0.0-alpha



[jira] [Updated] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

project updated HDFS-4425:
--

Priority: Critical  (was: Minor)



[jira] [Commented] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560370#comment-13560370
 ] 

project commented on HDFS-4425:
---

The version is Hadoop 2.0.0-cdh4.1.2.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560368#comment-13560368
 ] 

Hadoop QA commented on HDFS-4426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566081/HDFS-4426.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3869//console

This message is automatically generated.



[jira] [Updated] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-4426:
--

Status: Patch Available  (was: Open)



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560366#comment-13560366
 ] 

Suresh Srinivas commented on HDFS-4426:
---

I will commit it tomorrow morning.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560365#comment-13560365
 ] 

Suresh Srinivas commented on HDFS-4426:
---

+1 for the change.



[jira] [Updated] (HDFS-4430) Add a test case for NameNode/SecondaryNameNode startup

2013-01-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-4430:


Summary: Add a test case for NameNode/SecondaryNameNode startup  (was: We 
need a test case for NameNode/SecondaryNameNode startup)

> Add a test case for NameNode/SecondaryNameNode startup
> --
>
> Key: HDFS-4430
> URL: https://issues.apache.org/jira/browse/HDFS-4430
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>
> The existing unit tests did not catch the regression introduced by 
> HADOOP-9181. We need to test for this case.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560355#comment-13560355
 ] 

Arpit Agarwal commented on HDFS-4426:
-

I forgot to add that I verified the patch manually.



[jira] [Updated] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-4426:


Attachment: HDFS-4426.patch

Attached a patch to handle this like the NameNode. Thanks to Suresh for 
suggesting the fix.

I have not added a new test case as this appears non-trivial to test with 
JUnit. I filed HDFS-4430 to investigate adding a test.

The existing unit tests did not catch the regression because the server did not 
need to survive beyond the lifetime of the calling JUnit thread.

Liang, if I understand you correctly, adding such a configuration knob would 
not have helped in this situation.
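Handling it "like the NameNode" presumably means blocking main() on a non-daemon thread via a join() method. A hedged sketch of that pattern (hypothetical class and thread names, not the actual patch):

```java
// Sketch of the join() pattern: main() blocks on a non-daemon thread so the
// JVM cannot exit while the service's daemon threads are still doing work.
// Class and field names are hypothetical, for illustration only.
public class JoinPatternDemo {
    private final Thread checkpointThread;

    public JoinPatternDemo() {
        checkpointThread = new Thread(() -> {
            // stand-in for the periodic checkpoint loop
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
            }
        });
        checkpointThread.setDaemon(false); // non-daemon: keeps the JVM alive
        checkpointThread.start();
    }

    // Package-private is enough here, matching the review comment that
    // join() need not be public.
    void join() throws InterruptedException {
        checkpointThread.join();
    }

    public static void main(String[] args) throws InterruptedException {
        JoinPatternDemo demo = new JoinPatternDemo();
        demo.join(); // block instead of returning with only daemon threads left
    }
}
```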

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: HDFS-4426.patch
>
>



[jira] [Commented] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree

2013-01-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560351#comment-13560351
 ] 

Jing Zhao commented on HDFS-4131:
-

The diff printout from the new testcase TestSnapshotDiffReport looks like:
{noformat}
Difference between snapshot s0 and snapshot s2 under directory /TestSnapshot/sub1:
M   /TestSnapshot/sub1
+   /TestSnapshot/sub1/file15
-   /TestSnapshot/sub1/file12
M   /TestSnapshot/sub1/file11
M   /TestSnapshot/sub1/file13

Difference between snapshot s0 and snapshot s5 under directory /TestSnapshot/sub1:
M   /TestSnapshot/sub1
+   /TestSnapshot/sub1/file15
+   /TestSnapshot/sub1/subsub1
-   /TestSnapshot/sub1/file12
M   /TestSnapshot/sub1/file10
M   /TestSnapshot/sub1/file11
M   /TestSnapshot/sub1/file13

Difference between snapshot s0 and current directory under directory 
/TestSnapshot/sub1:
M   /TestSnapshot/sub1
+   /TestSnapshot/sub1/file15
+   /TestSnapshot/sub1/subsub1
-   /TestSnapshot/sub1/file12
M   /TestSnapshot/sub1/file10
M   /TestSnapshot/sub1/file11
M   /TestSnapshot/sub1/file13
{noformat}
where M/+/-/R denote modified/created/deleted/renamed respectively (rename is 
not supported in the diff computation currently).

> Add a tool to print the diff between two snapshots and diff of a snapshot 
> from the current tree
> ---
>
> Key: HDFS-4131
> URL: https://issues.apache.org/jira/browse/HDFS-4131
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch, 
> HDFS-4131.003.patch
>
>
> This jira tracks a tool to print the diff between two snapshots at a given 
> path. The tool will also print the difference between the current directory 
> and the given snapshot.



[jira] [Created] (HDFS-4430) We need a test case for NameNode/SecondaryNameNode startup

2013-01-22 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-4430:
---

 Summary: We need a test case for NameNode/SecondaryNameNode startup
 Key: HDFS-4430
 URL: https://issues.apache.org/jira/browse/HDFS-4430
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


The existing unit tests did not catch the regression introduced by HADOOP-9181. 
We need to test for this case.



[jira] [Updated] (HDFS-4131) Add a tool to print the diff between two snapshots and diff of a snapshot from the current tree

2013-01-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4131:


Attachment: HDFS-4131.003.patch

Rebase the patch.

> Add a tool to print the diff between two snapshots and diff of a snapshot 
> from the current tree
> ---
>
> Key: HDFS-4131
> URL: https://issues.apache.org/jira/browse/HDFS-4131
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4131.001.patch, HDFS-4131.002.patch, 
> HDFS-4131.003.patch
>
>
> This jira tracks a tool to print the diff between two snapshots at a given 
> path. The tool will also print the difference between the current directory 
> and the given snapshot.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560345#comment-13560345
 ] 

liang xie commented on HDFS-4426:
-

Yes, I did run the whole test suite on my devbox before, with no failures...


Maybe we can:
1) add a new "isDaemon" parameter to HttpServer's constructor, with a default 
value of "false"; but the current parameter list is already quite long
or
2) introduce a new configuration key, since HttpServer's constructor already 
has a parameter named "conf"
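Option 1 above might look like the following minimal sketch (hypothetical names only, not the real org.apache.hadoop.http.HttpServer API):

```java
// Sketch of an explicit isDaemon flag threaded through a server
// constructor. Defaulting callers to false would preserve the old
// behavior, where a non-daemon listener pool keeps the process alive.
public class TinyServer {
    private final Thread acceptor;

    TinyServer(String name, boolean isDaemon) {
        acceptor = new Thread(() -> {
            // the accept/dispatch loop would live here
        }, name + "-acceptor");
        // must be set before start(): daemon status is fixed at start time
        acceptor.setDaemon(isDaemon);
    }

    boolean isDaemonPool() {
        return acceptor.isDaemon();
    }
}
```

With this shape, a caller that wants the old keep-alive behavior passes `false`, while the NameNode-style daemons pass `true` and keep themselves alive explicitly.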

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560341#comment-13560341
 ] 

Suresh Srinivas commented on HDFS-4426:
---

bq. Suresh Srinivas, I guess a better choice is that we should let the 
HttpServer daemon flag be set in its constructor, right?
I am not sure I follow you. How does it solve the problem?

bq. very very sorry for this trouble...
These things do happen. It is strange that unit tests did not catch this issue!

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>



[jira] [Resolved] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-22 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-4126.
---

   Resolution: Fixed
Fix Version/s: Snapshot (HDFS-2802)
 Hadoop Flags: Reviewed

I committed the patch to HDFS-2802 branch.

Thank you Jing!

> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Fix For: Snapshot (HDFS-2802)
>
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch, HDFS-4126.003.patch
>
>
> After the changes proposed in HDFS-4125 are completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560336#comment-13560336
 ] 

liang xie commented on HDFS-4426:
-

[~sureshms], I guess a better choice is that we should let the HttpServer 
daemon flag be set in its constructor, right?
Very sorry for this trouble...

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>



[jira] [Commented] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560333#comment-13560333
 ] 

Suresh Srinivas commented on HDFS-4126:
---

Thanks for addressing the comments. +1 for the patch.

> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch, HDFS-4126.003.patch
>
>
> After the changes proposed in HDFS-4125 are completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality



[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent

2013-01-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560332#comment-13560332
 ] 

Jing Zhao commented on HDFS-4350:
-

The patch also looks good to me. One minor issue: the unused "import 
com.google.common.base.Preconditions;" can be removed from 
HeartbeatManager.java.

> Make enabling of stale marking on read and write paths independent
> --
>
> Key: HDFS-4350
> URL: https://issues.apache.org/jira/browse/HDFS-4350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, 
> hdfs-4350-4.patch, hdfs-4350.txt
>
>
> Marking of datanodes as stale for the read and write path was introduced in 
> HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, 
> {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently 
> exists a dependency, since you cannot enable write marking without also 
> enabling read marking, since the first key enables both checking of staleness 
> and read marking.
> I propose renaming the first key to 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and make checking enabled 
> if either of the keys are set. This will allow read and write marking to be 
> enabled independently.



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560317#comment-13560317
 ] 

Suresh Srinivas commented on HDFS-4426:
---

bq. The 2NN creates a daemon thread of itself and implicitly relies on other 
threads to keep the process alive - is this a case of two wrongs make a right? 
Or is there a technical reason why the 2NN shouldn't simply do it's work in the 
main thread?
We should follow the same pattern as the NameNode and wait for the worker 
thread(s) to end. So the main thread should wait on a join() call. 
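A minimal sketch of that pattern (hypothetical names, not the real SecondaryNameNode API): the non-daemon main thread starts the worker, then blocks in join(), so the process stays up exactly as long as the worker runs.

```java
// NameNode-style startup: main() starts the worker thread and then
// joins it instead of returning, keeping the JVM alive.
public class JoinPatternDemo {
    private final Thread checkpointer;

    JoinPatternDemo() {
        checkpointer = new Thread(
            () -> System.out.println("checkpoint work done"),
            "checkpointer");
        checkpointer.setDaemon(true);
    }

    void startCheckpointThread() {
        checkpointer.start();
    }

    // Package-private, echoing the review note that join() need not
    // be public.
    void join() throws InterruptedException {
        checkpointer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        JoinPatternDemo snn = new JoinPatternDemo();
        snn.startCheckpointThread();
        snn.join(); // non-daemon main thread keeps the JVM up until here
    }
}
```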

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>



[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560316#comment-13560316
 ] 

Hadoop QA commented on HDFS-4237:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566056/HDFS-4237.patch.008
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3868//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3868//console

This message is automatically generated.

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007, 
> HDFS-4237.patch.008
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).



[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560303#comment-13560303
 ] 

Suresh Srinivas commented on HDFS-4350:
---

This change was made available in 2.0.2-alpha and has been backported to 1.1.0. 
So changing the configuration name and semantics will have impact on those 
releases. How is it going to be handled?

> Make enabling of stale marking on read and write paths independent
> --
>
> Key: HDFS-4350
> URL: https://issues.apache.org/jira/browse/HDFS-4350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, 
> hdfs-4350-4.patch, hdfs-4350.txt
>
>
> Marking of datanodes as stale for the read and write path was introduced in 
> HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, 
> {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently 
> exists a dependency, since you cannot enable write marking without also 
> enabling read marking, since the first key enables both checking of staleness 
> and read marking.
> I propose renaming the first key to 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and make checking enabled 
> if either of the keys are set. This will allow read and write marking to be 
> enabled independently.



[jira] [Commented] (HDFS-4339) Persist inode id in fsimage and editlog

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560284#comment-13560284
 ] 

Hadoop QA commented on HDFS-4339:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12566051/HDFS-4339.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3867//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3867//console

This message is automatically generated.

> Persist inode id in fsimage and editlog
> ---
>
> Key: HDFS-4339
> URL: https://issues.apache.org/jira/browse/HDFS-4339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: editsStored, HDFS-4339.patch, HDFS-4339.patch, 
> HDFS-4339.patch, HDFS-4339.patch
>
>
>  Persist inode id in fsimage and editlog and update offline viewers.



[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4417:
---

Attachment: fail.patch

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: fail.patch, HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.



[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4417:
---

Attachment: (was: fail.patch)

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: fail.patch, HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.



[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4417:
---

Attachment: fail.patch

Hi Todd,

In order to verify beyond any doubt that 
{{TestParallelShortCircuitReadUnCached}} is a regression test for HDFS-4417, I 
produced a patch with just the new test and nothing else.  (I also changed 
DFSInputStream to throw an exception if a TCP peer was created.)  I verified 
that it failed, indicating that the test really does catch HDFS-4417.  Here's 
the patch, called fail.patch

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: fail.patch, HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.



[jira] [Updated] (HDFS-4429) Add unit tests for taking snapshots while file appending

2013-01-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4429:


Attachment: HDFS-4429.000.patch

Initial patch. Part of the change has been included in HDFS-4126. Will update 
the patch after HDFS-4126 goes in.

> Add unit tests for taking snapshots while file appending
> 
>
> Key: HDFS-4429
> URL: https://issues.apache.org/jira/browse/HDFS-4429
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-4429.000.patch
>
>
> Add unit tests for INodeFileUnderConstructionWithSnapshot, where taking 
> snapshots after/while file appending is tested. Also fix some related bugs.



[jira] [Created] (HDFS-4429) Add unit tests for taking snapshots while file appending

2013-01-22 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-4429:
---

 Summary: Add unit tests for taking snapshots while file appending
 Key: HDFS-4429
 URL: https://issues.apache.org/jira/browse/HDFS-4429
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao


Add unit tests for INodeFileUnderConstructionWithSnapshot, where taking 
snapshots after/while file appending is tested. Also fix some related bugs.




[jira] [Commented] (HDFS-4350) Make enabling of stale marking on read and write paths independent

2013-01-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560252#comment-13560252
 ] 

Aaron T. Myers commented on HDFS-4350:
--

Patch looks pretty good to me, and I agree that the test failure seems 
unrelated. One little comment, this doesn't parse so well:
bq. // When the number stale datanodes marked as stale reaches this ratio, 

+1 once this is addressed.

> Make enabling of stale marking on read and write paths independent
> --
>
> Key: HDFS-4350
> URL: https://issues.apache.org/jira/browse/HDFS-4350
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4350-1.patch, hdfs-4350-2.patch, hdfs-4350-3.patch, 
> hdfs-4350-4.patch, hdfs-4350.txt
>
>
> Marking of datanodes as stale for the read and write path was introduced in 
> HDFS-3703 and HDFS-3912 respectively. This is enabled using two new keys, 
> {{DFS_NAMENODE_CHECK_STALE_DATANODE_KEY}} and 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY}}. However, there currently 
> exists a dependency, since you cannot enable write marking without also 
> enabling read marking, since the first key enables both checking of staleness 
> and read marking.
> I propose renaming the first key to 
> {{DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_READ_KEY}}, and make checking enabled 
> if either of the keys are set. This will allow read and write marking to be 
> enabled independently.



[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-22 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560248#comment-13560248
 ] 

Andy Isaacson commented on HDFS-4237:
-

patch.008 LGTM. +1 (non-binding).

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007, 
> HDFS-4237.patch.008
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).



[jira] [Commented] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-22 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560236#comment-13560236
 ] 

Andy Isaacson commented on HDFS-4237:
-

bq. BTW, what do you use to catch these spaces?

I open the patch file in less and search for a space; less highlights matches 
by default.  There are many similar tools: vim has a diff mode that can 
highlight bad whitespace, ReviewBoard turns it bright red, several Git GUI 
tools such as gitk will call it out, and it's easy to write a git commit hook 
that refuses to allow a commit with bad whitespace.

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007, 
> HDFS-4237.patch.008
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).



[jira] [Updated] (HDFS-4237) Add unit tests for HTTP-based filesystems against secure MiniDFSCluster

2013-01-22 Thread Stephen Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen Chu updated HDFS-4237:
--

Attachment: HDFS-4237.patch.008

Thanks, Andy.

I removed the spaces ahead of "String address ..." and the spaces in the 
license comment in SecureHdfsTestUtil.java. BTW, what do you use to catch these 
spaces?

Now, TestSecureWebHdfsFileSystemContract will be skipped unless the test is 
being run with the external KDC.

I agree that it'd be good to investigate whether we can teach the tests to run 
in a reasonable environment.

I'll add a new wiki page about how to develop and run secure unit tests. 

> Add unit tests for HTTP-based filesystems against secure MiniDFSCluster
> ---
>
> Key: HDFS-4237
> URL: https://issues.apache.org/jira/browse/HDFS-4237
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: security, test, webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: Stephen Chu
> Attachments: HDFS-4237.patch.001, HDFS-4237.patch.007, 
> HDFS-4237.patch.008
>
>
> Now that we can start a secure MiniDFSCluster (HADOOP-9004), we need more 
> security unit tests.
> A good area to add secure tests is the HTTP-based filesystems (WebHDFS, 
> HttpFs).



[jira] [Commented] (HDFS-4424) fsdataset Mkdirs failed cause nullpointexception and other bad consequence

2013-01-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560205#comment-13560205
 ] 

Brandon Li commented on HDFS-4424:
--

@gschen, you can subscribe to the mailing list here: 
http://hadoop.apache.org/mailing_lists.html 

> fsdataset  Mkdirs failed  cause  nullpointexception and other bad  
> consequence 
> ---
>
> Key: HDFS-4424
> URL: https://issues.apache.org/jira/browse/HDFS-4424
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 1.0.1
>Reporter: Li Junjun
>
> File: /hadoop-1.0.1/hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java
> From line 205:
>   if (children == null || children.length == 0) {
> children = new FSDir[maxBlocksPerDir];
> for (int idx = 0; idx < maxBlocksPerDir; idx++) {
>   children[idx] = new FSDir(new File(dir, 
> DataStorage.BLOCK_SUBDIR_PREFIX+idx));
> }
>   }
> If an FSDir constructor fails (for example, the disk is full, so mkdir 
> fails), the partially initialized children array is still in use!
> When the next write arrives (after I ran the balancer) and an FSDir is 
> chosen at line 192:
> File file = children[idx].addBlock(b, src, false, resetIdx);
> it causes exceptions like this:
>   at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.addBlock(FSDataset.java:192)
>   at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.addBlock(FSDataset.java:192)
>   at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.addBlock(FSDataset.java:158)
> 
> Should it instead be something like this, skipping subdirectories that fail 
> to initialize?
>   if (children == null || children.length == 0) {
> List<FSDir> childrenList = new ArrayList<FSDir>();
> for (int idx = 0; idx < maxBlocksPerDir; idx++) {
>   try {
> childrenList.add(new FSDir(new File(dir, 
> DataStorage.BLOCK_SUBDIR_PREFIX+idx)));
>   } catch (Exception e) {
> // skip subdirectories that could not be created
>   }
> }
> children = childrenList.toArray(new FSDir[childrenList.size()]);
>   }
> 
> Bad consequence: in my cluster, this datanode's block count dropped to 0.



[jira] [Updated] (HDFS-4339) Persist inode id in fsimage and editlog

2013-01-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4339:
-

Attachment: HDFS-4339.patch

Rebased the patch.

> Persist inode id in fsimage and editlog
> ---
>
> Key: HDFS-4339
> URL: https://issues.apache.org/jira/browse/HDFS-4339
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: editsStored, HDFS-4339.patch, HDFS-4339.patch, 
> HDFS-4339.patch, HDFS-4339.patch
>
>
>  Persist inode id in fsimage and editlog and update offline viewers.



[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560190#comment-13560190
 ] 

Colin Patrick McCabe commented on HDFS-4417:


The new test works by setting up a scenario where we will have a lot of stale 
UNIX domain sockets in the PeerCache.  It does this by setting the socket 
keepalive to 1 millisecond, enlarging the cache size to 32, and setting the 
cache expiry time to several minutes.  Then it sets 
{{DFSInputStream#tcpReadsDisabledForTesting}}, which will cause an exception if 
we try to read over a TCP socket.

The idea is to catch the issue we saw before where UNIX domain sockets were 
getting stale and causing the socket path to get blacklisted.  This bad 
behavior caused us to fall back on TCP sockets in cases where we shouldn't have.
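The failure mode can be reduced to a small cache sketch. The class below is a 
hypothetical stand-in, not the actual PeerCache API: the point is that entries 
older than the DN keepalive window should be discarded on retrieval instead of 
being handed to a reader that will fail on them and blacklist the path.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StalePeerCacheSketch {
    // Stand-in for a cached socket/Peer plus the time it was cached.
    static class Entry {
        final String peer;
        final long cachedAtMs;
        Entry(String peer, long cachedAtMs) {
            this.peer = peer;
            this.cachedAtMs = cachedAtMs;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<Entry>();
    private final long staleAfterMs;

    StalePeerCacheSketch(long staleAfterMs) {
        this.staleAfterMs = staleAfterMs;
    }

    void put(String peer, long nowMs) {
        entries.addLast(new Entry(peer, nowMs));
    }

    // Return a fresh peer, silently dropping stale ones so the caller never
    // fails on a socket the DN has already disconnected.
    String get(long nowMs) {
        while (!entries.isEmpty()) {
            Entry e = entries.pollFirst();
            if (nowMs - e.cachedAtMs <= staleAfterMs) {
                return e.peer;
            }
            // stale: discard and keep looking
        }
        return null; // caller opens a new socket
    }

    public static void main(String[] args) {
        StalePeerCacheSketch cache = new StalePeerCacheSketch(1000);
        cache.put("sock-a", 0);
        cache.put("sock-b", 1500);
        // At t=2000, sock-a (age 2000ms) is stale; sock-b (age 500ms) is fresh.
        System.out.println(cache.get(2000)); // prints "sock-b"
        System.out.println(cache.get(2000)); // prints "null"
    }
}
```

This mirrors the test's setup: a tiny keepalive makes cached entries go stale 
quickly, and correct behavior is to drop them rather than report the path as 
broken.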

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.



[jira] [Commented] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560185#comment-13560185
 ] 

Todd Lipcon commented on HDFS-4417:
---

Can you explain how the new test works? Does it provide an effective regression 
test? (ie if you include the new test without this bug fix, does it actually 
fail?) I'm not 100% following. Otherwise the change looks good.

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.



[jira] [Updated] (HDFS-4340) Update addBlock() to inculde inode id as additional argument

2013-01-22 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4340:
-

Attachment: HDFS-4340.patch

Uploaded a new patch with some code cleanup.

> Update addBlock() to inculde inode id as additional argument
> 
>
> Key: HDFS-4340
> URL: https://issues.apache.org/jira/browse/HDFS-4340
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Brandon Li
>Assignee: Brandon Li
> Attachments: HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, HDFS-4340.patch, 
> HDFS-4340.patch
>
>




[jira] [Updated] (HDFS-4428) FsDatasetImpl should disclose what the error is when a rename fails

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4428:
---

 Target Version/s: 2.0.3-alpha
Affects Version/s: 2.0.3-alpha

> FsDatasetImpl should disclose what the error is when a rename fails
> ---
>
> Key: HDFS-4428
> URL: https://issues.apache.org/jira/browse/HDFS-4428
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4428.001.patch
>
>
> It would be nice if {{FsDatasetImpl}} would print out an error message when a 
> rename fails, describing what went wrong.  This would make it a lot easier to 
> investigate and resolve test failures like HDFS-4051. 



[jira] [Updated] (HDFS-4428) FsDatasetImpl should disclose what the error is when a rename fails

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4428:
---

Status: Patch Available  (was: Open)

> FsDatasetImpl should disclose what the error is when a rename fails
> ---
>
> Key: HDFS-4428
> URL: https://issues.apache.org/jira/browse/HDFS-4428
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4428.001.patch
>
>
> It would be nice if {{FsDatasetImpl}} would print out an error message when a 
> rename fails, describing what went wrong.  This would make it a lot easier to 
> investigate and resolve test failures like HDFS-4051. 



[jira] [Updated] (HDFS-4428) FsDatasetImpl should disclose what the error is when a rename fails

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4428:
---

Attachment: HDFS-4428.001.patch

> FsDatasetImpl should disclose what the error is when a rename fails
> ---
>
> Key: HDFS-4428
> URL: https://issues.apache.org/jira/browse/HDFS-4428
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-4428.001.patch
>
>
> It would be nice if {{FsDatasetImpl}} would print out an error message when a 
> rename fails, describing what went wrong.  This would make it a lot easier to 
> investigate and resolve test failures like HDFS-4051. 



[jira] [Created] (HDFS-4428) FsDatasetImpl should disclose what the error is when a rename fails

2013-01-22 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-4428:
--

 Summary: FsDatasetImpl should disclose what the error is when a 
rename fails
 Key: HDFS-4428
 URL: https://issues.apache.org/jira/browse/HDFS-4428
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-4428.001.patch

It would be nice if {{FsDatasetImpl}} would print out an error message when a 
rename fails, describing what went wrong.  This would make it a lot easier to 
investigate and resolve test failures like HDFS-4051. 
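One reason the error is silent today is that the legacy `File.renameTo()` API 
only returns a boolean. A minimal illustration of the fix direction (a sketch, 
not the actual FsDatasetImpl change): `java.nio.file.Files.move()` throws an 
IOException whose type and message name the actual cause.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class RenameErrorDemo {
    public static void main(String[] args) {
        // Deliberately broken rename: the source directory does not exist.
        File src = new File("/nonexistent-dir-hdfs4428/src-block");
        File dst = new File("/nonexistent-dir-hdfs4428/dst-block");

        // Legacy style: all we learn is "it failed".
        boolean ok = src.renameTo(dst);
        System.out.println("renameTo succeeded: " + ok); // prints "renameTo succeeded: false"

        // NIO style: the exception says what went wrong.
        try {
            Files.move(src.toPath(), dst.toPath());
        } catch (IOException e) {
            // typically NoSuchFileException here, pointing at the missing path
            System.out.println("move failed: " + e.getClass().getSimpleName());
        }
    }
}
```

Logging the exception (or at least checking why the source/destination is 
unusable) is exactly the detail that would make failures like HDFS-4051 
debuggable.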



[jira] [Commented] (HDFS-4423) Checkpoint exception causes fatal damage to fsimage.

2013-01-22 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560118#comment-13560118
 ] 

Chris Nauroth commented on HDFS-4423:
-

Thank you for the detailed write-up, [~chenfolin].  I have one additional 
question.  You mentioned an exception causing {{NameNode}} to shutdown during 
checkpoint after writing latest name checkpoint time, but before writing latest 
edits checkpoint time.  Do you have details on that exception?  Was that 
exception related to this bug, or was it something unrelated that just exposed 
this problem in the {{loadFSImage}} logic?

Your assessment about the call to {{FSDirectory#updateCountForINodeWithQuota}} 
looks correct.  I'm thinking that we should move that call out of 
{{FSImage#loadFSEdits}} and into {{FSImage#loadFSImage}}, so that the end of 
{{loadFSImage}} would look like this:

{code}
boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
...
  // Load latest edits
  if (latestNameCheckpointTime > latestEditsCheckpointTime)
// the image is already current, discard edits
needToSave |= true;
  else // latestNameCheckpointTime == latestEditsCheckpointTime
needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);

  // update the counts.
  FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
  return needToSave;
}
{code}

Moving the call there would help guarantee that it always happens.


> Checkpoint exception causes fatal damage to fsimage.
> 
>
> Key: HDFS-4423
> URL: https://issues.apache.org/jira/browse/HDFS-4423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 1.0.4, 1.1.1
> Environment: CentOS 6.2
>Reporter: ChenFolin
>Priority: Blocker
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The impact of class is org.apache.hadoop.hdfs.server.namenode.FSImage.java
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
> needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
> LOG.info("Image file of size " + imageSize + " loaded in " 
> + (FSNamesystem.now() - startTime)/1000 + " seconds.");
> 
> // Load latest edits
> if (latestNameCheckpointTime > latestEditsCheckpointTime)
>   // the image is already current, discard edits
>   needToSave |= true;
> else // latestNameCheckpointTime == latestEditsCheckpointTime
>   needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
> 
> return needToSave;
>   }
> {code}
> In the normal checkpoint flow, latestNameCheckpointTime is equal to 
> latestEditsCheckpointTime, and the "else" branch executes.
> The problem is the case latestNameCheckpointTime > latestEditsCheckpointTime:
> The SecondaryNameNode starts a checkpoint,
> ...
> and during rollFSImage the NameNode shuts down after writing 
> latestNameCheckpointTime but before writing latestEditsCheckpointTime.
> On the next NameNode start, because latestNameCheckpointTime > 
> latestEditsCheckpointTime, needToSave is true and loadFSEdits is skipped, so 
> "rootDir"'s nsCount (the cluster's file count, normally updated in 
> loadFSEdits via 
> "FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota()") is 
> never updated. "saveNamespace" will then write the file count to the fsimage 
> with the default value "1".
> The next time, loadFSImage will fail.
> Something like the following might work:
> {code}
> boolean loadFSImage(MetaRecoveryContext recovery) throws IOException {
> ...
> latestNameSD.read();
> needToSave |= loadFSImage(getImageFile(latestNameSD, NameNodeFile.IMAGE));
> LOG.info("Image file of size " + imageSize + " loaded in " 
> + (FSNamesystem.now() - startTime)/1000 + " seconds.");
> 
> // Load latest edits
> if (latestNameCheckpointTime > latestEditsCheckpointTime){
>   // the image is already current, discard edits
>   needToSave |= true;
>   FSNamesystem.getFSNamesystem().dir.updateCountForINodeWithQuota();
> }
> else // latestNameCheckpointTime == latestEditsCheckpointTime
>   needToSave |= (loadFSEdits(latestEditsSD, recovery) > 0);
> 
> return needToSave;
>   }
> {code}



[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560106#comment-13560106
 ] 

Daryn Sharp commented on HDFS-4426:
---

The 2NN creates a daemon thread of itself and implicitly relies on other 
threads to keep the process alive - is this a case of two wrongs making a 
right?  Or is there a technical reason why the 2NN shouldn't simply do its 
work in the main thread?
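The JVM rule behind this bug is easy to reproduce outside Hadoop. The sketch 
below is hypothetical (not 2NN code): a main() that only spawns daemon threads 
returns, and the process exits immediately because no non-daemon threads 
remain.

```java
public class DaemonExitDemo {
    public static void main(String[] args) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(60_000); // pretend to be a checkpoint loop
                } catch (InterruptedException ignored) {
                }
            }
        });
        worker.setDaemon(true); // with false, the JVM would wait the full 60s
        worker.start();
        // Nothing non-daemon keeps the process alive now, so the JVM exits
        // as soon as main() returns, killing the worker mid-sleep.
        // One fix is for main() to join() a non-daemon worker thread instead.
        System.out.println("main returning; daemon worker dies with the JVM");
    }
}
```

Run as-is, the process prints one line and exits at once; flip 
`setDaemon(true)` to `false` and it lingers for the full sleep, which is the 
implicit behavior the 2NN was depending on before HADOOP-9181.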

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> After HADOOP-9181 went in, the secondary namenode immediately shuts down 
> after it is started.  From the startup logs:
> {noformat}
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
> min)
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
> 2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
> (StringUtils.java:run(616)) - SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
> /
> {noformat}
> I looked into the issue, and it's shutting down because 
> SecondaryNameNode.main starts a bunch of daemon threads then returns.  With 
> nothing but daemon threads remaining, the JVM sees no reason to keep going 
> and proceeds to shutdown.  Apparently we were implicitly relying on the fact 
> that the HttpServer QueuedThreadPool threads were not daemon threads to keep 
> the secondary namenode process up.



[jira] [Commented] (HDFS-4427) start-dfs.sh generates malformed ssh command when not running with native libs

2013-01-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560091#comment-13560091
 ] 

Todd Lipcon commented on HDFS-4427:
---

Maybe we need to set up log4j for this command such that the log4j output goes 
to stderr instead of stdout?

Another more general solution might be to get rid of this WARN entirely (change 
to DEBUG) and instead add an equivalent WARN in the various daemon start 
messages, as well as an API so dependent projects like HBase and MR can easily 
issue warnings when native isn't available? I imagine that users will find the 
"WARN" on every "hadoop fs -ls" type command annoying as well.

> start-dfs.sh generates malformed ssh command when not running with native libs
> --
>
> Key: HDFS-4427
> URL: https://issues.apache.org/jira/browse/HDFS-4427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Reporter: Jason Lowe
>Assignee: Robert Parker
>
> After HADOOP-8712 the start-dfs.sh script is generating malformed ssh 
> commands when the native hadoop libraries are not present.  This is because 
> {{hdfs getconf}} is printing a warning, and that warning is accidentally 
> interpreted as one of the machines to target for ssh.
> Here's an example output of hdfs getconf:
> {noformat}
> $ hdfs getconf -namenodes 2>/dev/null
> 2013-01-22 21:03:59,543 WARN  util.NativeCodeLoader 
> (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> localhost
> {noformat}
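The underlying contract is that command substitution in the shell captures 
only stdout. A minimal illustration (hypothetical, not the getconf code): the 
configuration value must be the sole line on stdout, while diagnostics like 
the native-code warning go to stderr.

```java
public class StdoutVsStderrDemo {
    public static void main(String[] args) {
        // Diagnostic: stderr, invisible to $(...) command substitution.
        System.err.println(
            "WARN util.NativeCodeLoader: unable to load native-hadoop library");
        // Payload: stdout, the only line a calling script should capture.
        System.out.println("localhost");
    }
}
```

With this split, `NN=$(hdfs getconf -namenodes)` would capture only 
`localhost` even when the warning fires, and start-dfs.sh would build a 
well-formed ssh command.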



[jira] [Commented] (HDFS-4258) Rename of Being Written Files

2013-01-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560080#comment-13560080
 ] 

Daryn Sharp commented on HDFS-4258:
---

I misread it; I didn't realize it's another sanity check rather than another 
way to reference the file.

> Rename of Being Written Files
> -
>
> Key: HDFS-4258
> URL: https://issues.apache.org/jira/browse/HDFS-4258
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: HDFS-4258.patch, HDFS-4258.patch, HDFS-4258.patch, 
> HDFS-4258.patch
>
>
> When a being written file or it's ancestor directories is renamed, the path 
> in the file lease is also renamed.  Then the writer of the file usually will 
> fail since the file path in the writer is not updated.
> Moreover, I think there is a bug as follow:
> # Client writes 0's to F_0="/foo/file" and writes 1's to F_1="/bar/file" at 
> the same time.
> # Rename /bar to /baz
> # Rename /foo to /bar
> Then, writing to F_0 will fail since /foo/file does not exist anymore but 
> writing to F_1 may succeed since /bar/file exists as a different file.  In 
> such case, the content of /bar/file could be partly 0's and partly 1's.



[jira] [Commented] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-22 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560077#comment-13560077
 ] 

Jing Zhao commented on HDFS-4126:
-

bq. Snapshot in javadoc is missing snapshot name?
The snapshot name is actually stored as the local name of the Snapshot#root. 
Added explanation in the javadoc.

bq. Snapshot related methods should be moved to an inner class or separate 
class. This can be done in a separate jira.
Created a new class SnapshotFSImageFormat in the snapshot package, and moved 
snapshot-related fsimage read/write methods to the new class as static members.

> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch, HDFS-4126.003.patch
>
>
> After the changes proposed in HDFS-4125 is completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality



[jira] [Updated] (HDFS-4126) Add reading/writing snapshot information to FSImage

2013-01-22 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-4126:


Attachment: HDFS-4126.003.patch

Thanks for the comments, Suresh! Updated the patch to address your comments.

> Add reading/writing snapshot information to FSImage
> ---
>
> Key: HDFS-4126
> URL: https://issues.apache.org/jira/browse/HDFS-4126
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Affects Versions: Snapshot (HDFS-2802)
>Reporter: Suresh Srinivas
>Assignee: Jing Zhao
> Attachments: HDFS-4126.001.patch, HDFS-4126.002.patch, 
> HDFS-4126.002.patch, HDFS-4126.003.patch
>
>
> After the changes proposed in HDFS-4125 are completed, reading and writing 
> snapshot related information from FSImage can be implemented. This jira 
> tracks changes required for:
> # Loading snapshot information from FSImage
> # Loading snapshot related operations from editlog
> # Writing snapshot information in FSImage
> # Unit tests related to this functionality

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2013-01-22 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560068#comment-13560068
 ] 

Colin Patrick McCabe commented on HDFS-347:
---

Hi Brandon,

{{TestParallelLocalRead}} was renamed to {{TestParallelShortCircuitRead}}.  The 
original version is not going to work with this branch because it doesn't set 
the correct configuration keys.  It will fall back on the standard read path.

I think "test local read" was a very unclear name, because all of the 
{{TestParallel}} tests exercise local reads (we are on a 
{{MiniDFSCluster}}, after all). It's the fact that we are testing 
short-circuit reads that is important.

HDFS-347 also adds a few tests which have no equivalent in trunk, like 
{{TestParallelShortCircuitReadNoChecksum}} and {{TestParallelUnixDomainRead}}.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, performance
>Reporter: George Porter
>Assignee: Colin Patrick McCabe
> Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, 
> HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, 
> HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, 
> HDFS-347.020.patch, HDFS-347.021.patch, HDFS-347.022.patch, 
> HDFS-347.024.patch, HDFS-347.025.patch, HDFS-347.026.patch, 
> HDFS-347.027.patch, HDFS-347.029.patch, HDFS-347.030.patch, 
> HDFS-347.033.patch, HDFS-347.035.patch, HDFS-347-branch-20-append.txt, 
> hdfs-347-merge.txt, hdfs-347-merge.txt, hdfs-347.png, hdfs-347.txt, 
> local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.
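
The chunking overhead described above can be sketched with simple arithmetic 
(a standalone illustration; the class name and the 4-MB local-read buffer size 
are assumptions, not values from Hadoop):

```java
// Illustrative arithmetic for the streaming overhead described above:
// a single 64-MB block served in 64-KB chunks turns into ~1024 separate
// I/O requests, while a hypothetical direct local reader with a larger
// buffer would issue far fewer.
public class ChunkOverheadSketch {
    public static void main(String[] args) {
        long blockBytes = 64L * 1024 * 1024; // one HDFS block request
        long chunkBytes = 64L * 1024;        // per-chunk send in BlockSender
        long largeBuf   = 4L * 1024 * 1024;  // assumed local-read buffer size

        System.out.println("streaming requests:   " + blockBytes / chunkBytes); // 1024
        System.out.println("direct-read requests: " + blockBytes / largeBuf);   // 16
    }
}
```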

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4258) Rename of Being Written Files

2013-01-22 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560052#comment-13560052
 ] 

Daryn Sharp commented on HDFS-4258:
---

We're running into problems with parent directories being renamed and leases 
being held on the new pathnames.  As long as a long-running process/daemon 
continues to renew its lease

This is a big patch, but does the file id present any security issues?  Can I 
brute force access to files by guessing file ids?

> Rename of Being Written Files
> -
>
> Key: HDFS-4258
> URL: https://issues.apache.org/jira/browse/HDFS-4258
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, namenode
>Affects Versions: 3.0.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Brandon Li
> Attachments: HDFS-4258.patch, HDFS-4258.patch, HDFS-4258.patch, 
> HDFS-4258.patch
>
>
> When a being written file or it's ancestor directories is renamed, the path 
> in the file lease is also renamed.  Then the writer of the file usually will 
> fail since the file path in the writer is not updated.
> Moreover, I think there is a bug as follow:
> # Client writes 0's to F_0="/foo/file" and writes 1's to F_1="/bar/file" at 
> the same time.
> # Rename /bar to /baz
> # Rename /foo to /bar
> Then, writing to F_0 will fail since /foo/file does not exist anymore but 
> writing to F_1 may succeed since /bar/file exits as a different file.  In 
> such case, the content of /bar/file could be partly 0's and partly 1's.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560035#comment-13560035
 ] 

Suresh Srinivas commented on HDFS-4426:
---

Release managers for releases in which this is considered blocking could 
consider reverting HADOOP-9181.

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> After HADOOP-9181 went in, the secondary namenode immediately shuts down 
> after it is started.  From the startup logs:
> {noformat}
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
> min)
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
> 2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
> (StringUtils.java:run(616)) - SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
> /
> {noformat}
> I looked into the issue, and it's shutting down because 
> SecondaryNameNode.main starts a bunch of daemon threads then returns.  With 
> nothing but daemon threads remaining, the JVM sees no reason to keep going 
> and proceeds to shutdown.  Apparently we were implicitly relying on the fact 
> that the HttpServer QueuedThreadPool threads were not daemon threads to keep 
> the secondary namenode process up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

2013-01-22 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560029#comment-13560029
 ] 

Brandon Li commented on HDFS-347:
-

@Todd, I tried to run the benchmark on my local machine. For TestParallelRead, 
I didn't see any very noticeable regression between HDFS-347 and trunk, which 
is good.

How did you run TestParallelLocalRead? I simply kept the original 
TestParallelLocalRead.java (it's deleted in the merge patch), but it doesn't 
seem to give higher throughput than TestParallelRead with HDFS-347. I think I 
missed something here.

> DFS read performance suboptimal when client co-located on nodes with data
> -
>
> Key: HDFS-347
> URL: https://issues.apache.org/jira/browse/HDFS-347
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs-client, performance
>Reporter: George Porter
>Assignee: Colin Patrick McCabe
> Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, 
> HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, 
> HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, 
> HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, 
> HDFS-347.020.patch, HDFS-347.021.patch, HDFS-347.022.patch, 
> HDFS-347.024.patch, HDFS-347.025.patch, HDFS-347.026.patch, 
> HDFS-347.027.patch, HDFS-347.029.patch, HDFS-347.030.patch, 
> HDFS-347.033.patch, HDFS-347.035.patch, HDFS-347-branch-20-append.txt, 
> hdfs-347-merge.txt, hdfs-347-merge.txt, hdfs-347.png, hdfs-347.txt, 
> local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-4427) start-dfs.sh generates malformed ssh command when not running with native libs

2013-01-22 Thread Robert Parker (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Parker reassigned HDFS-4427:
---

Assignee: Robert Parker

> start-dfs.sh generates malformed ssh command when not running with native libs
> --
>
> Key: HDFS-4427
> URL: https://issues.apache.org/jira/browse/HDFS-4427
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: scripts
>Reporter: Jason Lowe
>Assignee: Robert Parker
>
> After HADOOP-8712 the start-dfs.sh script is generating malformed ssh 
> commands when the native hadoop libraries are not present.  This is because 
> {{hdfs getconf}} is printing a warning, and that warning is accidentally 
> interpreted as one of the machines to target for ssh.
> Here's an example output of hdfs getconf:
> {noformat}
> $ hdfs getconf -namenodes 2>/dev/null
> 2013-01-22 21:03:59,543 WARN  util.NativeCodeLoader 
> (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> localhost
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4427) start-dfs.sh generates malformed ssh command when not running with native libs

2013-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-4427:


 Summary: start-dfs.sh generates malformed ssh command when not 
running with native libs
 Key: HDFS-4427
 URL: https://issues.apache.org/jira/browse/HDFS-4427
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Reporter: Jason Lowe


After HADOOP-8712 the start-dfs.sh script is generating malformed ssh commands 
when the native hadoop libraries are not present.  This is because {{hdfs 
getconf}} is printing a warning, and that warning is accidentally interpreted 
as one of the machines to target for ssh.

Here's an example output of hdfs getconf:

{noformat}
$ hdfs getconf -namenodes 2>/dev/null
2013-01-22 21:03:59,543 WARN  util.NativeCodeLoader 
(NativeCodeLoader.java:(62)) - Unable to load native-hadoop library for 
your platform... using builtin-java classes where applicable
localhost
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4344) dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude includes port number

2013-01-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4344:


Attachment: hdfs4344.txt

Fix bug, add tests and some new asserts.

 - DatanodeManager.java: correctly parse int from "hostname:port"
 - DFSTestUtil: assert, rather than returning empty result, when a HTTP request 
returns a non-200 result.
 - TestHostsFiles: new test file, currently containing just one test verifying 
that this bug is fixed.

Passes tests, and testHostsExcludeDfshealthJsp fails without the parsing fix.

> dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude 
> includes port number
> 
>
> Key: HDFS-4344
> URL: https://issues.apache.org/jira/browse/HDFS-4344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: tamtam180
>Assignee: Andy Isaacson
> Attachments: hdfs4344.txt
>
>
> dfs.hosts and dfs.hosts.exclude files cannot contain a host's port number.
> If they do, accessing dfshealth.jsp on the web UI throws a 
> NumberFormatException.
> How to reproduce:
> {noformat}
> $ cat /tmp/include.txt
> salve-host1:
> $ cat /tmp/exclude.txt
> slave-host1:
> $ hdfs namenode -Ddfs.hosts=/tmp/include.txt 
> -Ddfs.hosts.exclude=/tmp/exclude.txt
> {noformat}
> Error:
> {noformat}
> Problem accessing /dfshealth.jsp. Reason:
> For input string: ":"
> Caused by:
> java.lang.NumberFormatException: For input string: ":"
>  at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>  at java.lang.Integer.parseInt(Integer.java:449)
>  at java.lang.Integer.valueOf(Integer.java:554)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.parseDNFromHostsEntry(DatanodeManager.java:970)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeListForReport(DatanodeManager.java:1039)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.fetchDatanodes(DatanodeManager.java:892)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NamenodeJspHelper$HealthJsp.generateHealthReport(NamenodeJspHelper.java:288)
>  at 
> org.apache.hadoop.hdfs.server.namenode.dfshealth_jsp._jspService(dfshealth_jsp.java:109)
>  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>  at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1071)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>  at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>  at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>  at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>  at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>  at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>  at org.mortbay.jetty.Server.handle(Server.java:326)
>  at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>  at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>  at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>  at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {noformat}
> It's probably because DatanodeManager.parseDNFromHostsEntry() doesn't parse 
> host:port string correctly.

[jira] [Updated] (HDFS-4344) dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude includes port number

2013-01-22 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-4344:


Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> dfshealth.jsp throws NumberFormatException when dfs.hosts/dfs.hosts.exclude 
> includes port number
> 
>
> Key: HDFS-4344
> URL: https://issues.apache.org/jira/browse/HDFS-4344
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha, 3.0.0
>Reporter: tamtam180
>Assignee: Andy Isaacson
> Attachments: hdfs4344.txt
>
>
> dfs.hosts and dfs.hosts.exclude files cannot contain a host's port number.
> If they do, accessing dfshealth.jsp on the web UI throws a 
> NumberFormatException.
> How to reproduce:
> {noformat}
> $ cat /tmp/include.txt
> salve-host1:
> $ cat /tmp/exclude.txt
> slave-host1:
> $ hdfs namenode -Ddfs.hosts=/tmp/include.txt 
> -Ddfs.hosts.exclude=/tmp/exclude.txt
> {noformat}
> Error:
> {noformat}
> Problem accessing /dfshealth.jsp. Reason:
> For input string: ":"
> Caused by:
> java.lang.NumberFormatException: For input string: ":"
>  at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>  at java.lang.Integer.parseInt(Integer.java:449)
>  at java.lang.Integer.valueOf(Integer.java:554)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.parseDNFromHostsEntry(DatanodeManager.java:970)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.getDatanodeListForReport(DatanodeManager.java:1039)
>  at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.fetchDatanodes(DatanodeManager.java:892)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NamenodeJspHelper$HealthJsp.generateHealthReport(NamenodeJspHelper.java:288)
>  at 
> org.apache.hadoop.hdfs.server.namenode.dfshealth_jsp._jspService(dfshealth_jsp.java:109)
>  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>  at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1071)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>  at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>  at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>  at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>  at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>  at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>  at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>  at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>  at org.mortbay.jetty.Server.handle(Server.java:326)
>  at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>  at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>  at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>  at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {noformat}
> It's probably because DatanodeManager.parseDNFromHostsEntry() doesn't parse 
> host:port string correctly.
> {noformat}
>   private DatanodeID parseDNFromHostsEntry(String hostLine) {
> DatanodeID dnId;
> String hostStr;
> int port;
> int idx = hostLine.indexOf(':');
> if (-1 == idx) {
>   hostStr = hostLine;
>   port = DFSConfigKeys.DFS_DATANODE_DEFAULT_PORT;
> } else {
>   hostStr =
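
A minimal sketch of defensive "host[:port]" parsing that would avoid the 
exception above (the class name, the default port value, and the fall-back 
behavior are assumptions for illustration, not the actual Hadoop fix):

```java
// Hypothetical sketch, not the actual Hadoop code: parse a "host[:port]"
// hosts-file entry so that a trailing colon or malformed port falls back
// to a default instead of throwing NumberFormatException.
public class HostsEntryParserSketch {
    static final int DEFAULT_PORT = 50010; // assumed default, for illustration

    static int parsePort(String hostLine) {
        int idx = hostLine.indexOf(':');
        if (idx == -1) {
            return DEFAULT_PORT;           // no port given at all
        }
        String portStr = hostLine.substring(idx + 1);
        if (portStr.isEmpty()) {
            return DEFAULT_PORT;           // e.g. "slave-host1:" from the repro
        }
        try {
            return Integer.parseInt(portStr);
        } catch (NumberFormatException e) {
            return DEFAULT_PORT;           // malformed port: don't crash the JSP
        }
    }
}
```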

[jira] [Updated] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-4426:


Assignee: Arpit Agarwal

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Arpit Agarwal
>Priority: Blocker
>
> After HADOOP-9181 went in, the secondary namenode immediately shuts down 
> after it is started.  From the startup logs:
> {noformat}
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
> min)
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
> 2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
> (StringUtils.java:run(616)) - SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
> /
> {noformat}
> I looked into the issue, and it's shutting down because 
> SecondaryNameNode.main starts a bunch of daemon threads then returns.  With 
> nothing but daemon threads remaining, the JVM sees no reason to keep going 
> and proceeds to shutdown.  Apparently we were implicitly relying on the fact 
> that the HttpServer QueuedThreadPool threads were not daemon threads to keep 
> the secondary namenode process up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560001#comment-13560001
 ] 

Suresh Srinivas commented on HDFS-4426:
---

Jason, I will follow up on this. Thanks for filing the bug.

> Secondary namenode shuts down immediately after startup
> ---
>
> Key: HDFS-4426
> URL: https://issues.apache.org/jira/browse/HDFS-4426
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Priority: Blocker
>
> After HADOOP-9181 went in, the secondary namenode immediately shuts down 
> after it is started.  From the startup logs:
> {noformat}
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
> min)
> 2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
> (SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
> 2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
> (StringUtils.java:run(616)) - SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
> /
> {noformat}
> I looked into the issue, and it's shutting down because 
> SecondaryNameNode.main starts a bunch of daemon threads then returns.  With 
> nothing but daemon threads remaining, the JVM sees no reason to keep going 
> and proceeds to shutdown.  Apparently we were implicitly relying on the fact 
> that the HttpServer QueuedThreadPool threads were not daemon threads to keep 
> the secondary namenode process up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4426) Secondary namenode shuts down immediately after startup

2013-01-22 Thread Jason Lowe (JIRA)
Jason Lowe created HDFS-4426:


 Summary: Secondary namenode shuts down immediately after startup
 Key: HDFS-4426
 URL: https://issues.apache.org/jira/browse/HDFS-4426
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Priority: Blocker


After HADOOP-9181 went in, the secondary namenode immediately shuts down after 
it is started.  From the startup logs:

{noformat}
2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
(SecondaryNameNode.java:initialize(299)) - Checkpoint Period   :3600 secs (60 
min)
2013-01-22 19:54:28,826 INFO  namenode.SecondaryNameNode 
(SecondaryNameNode.java:initialize(301)) - Log Size Trigger:4 txns
2013-01-22 19:54:28,845 INFO  namenode.SecondaryNameNode 
(StringUtils.java:run(616)) - SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at xx
/
{noformat}

I looked into the issue, and it's shutting down because SecondaryNameNode.main 
starts a bunch of daemon threads then returns.  With nothing but daemon threads 
remaining, the JVM sees no reason to keep going and proceeds to shutdown.  
Apparently we were implicitly relying on the fact that the HttpServer 
QueuedThreadPool threads were not daemon threads to keep the secondary namenode 
process up.
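
The daemon-thread behavior described above can be reproduced in a few lines of 
plain Java (a standalone sketch, unrelated to the actual SecondaryNameNode 
code):

```java
// Standalone sketch of the JVM rule behind this bug: once main() returns,
// the JVM exits as soon as only daemon threads remain, without waiting
// for them to finish.
public class DaemonExitSketch {
    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stand-in for a long-running checkpoint loop
            } catch (InterruptedException ignored) {
            }
        });
        worker.setDaemon(true); // the JVM will NOT stay up for this thread
        worker.start();
        // main() returns here; with only daemon threads left, the JVM shuts
        // down immediately, mirroring the SecondaryNameNode behavior above.
    }
}
```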

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559894#comment-13559894
 ] 

project commented on HDFS-4425:
---

Can you please give me the steps to change this value? I read HDFS-1594 but 
didn't understand how to fix it. Can anybody give me simple steps?

Thanks,


> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: project
>Priority: Minor
>
> Hi,
> Namenode switches into safemode when it has low disk space on the root fs /. 
> I have to manually run a command to leave safemode. Below are the log 
> messages for low space on the root fs. Is there any parameter so that I can 
> reduce the reserved amount?
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.



[jira] [Commented] (HDFS-4422) Upgrade servlet-api dependency from version 2.5 to 3.0.

2013-01-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559893#comment-13559893
 ] 

Konstantin Boudnik commented on HDFS-4422:
--

I am wondering whether this JIRA belongs in hadoop-common rather than here.

> Upgrade servlet-api dependency from version 2.5 to 3.0.
> ---
>
> Key: HDFS-4422
> URL: https://issues.apache.org/jira/browse/HDFS-4422
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-4422.patch
>
>
> Please update the servlet-api jar from 2.5 to javax.servlet 3.0 via Maven:
> {code:xml}
> <dependency>
>   <groupId>javax.servlet</groupId>
>   <artifactId>javax.servlet-api</artifactId>
>   <version>3.0.1</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> I am running a 2.0.3 dev-cluster and can confirm compatibility. I have 
> removed the servlet-api-2.5.jar file and replaced it with 
> javax.servlet-3.0.jar file. I am using javax.servlet-3.0 because it 
> implements methods that I use for a filter, namely the 
> HttpServletResponse.getStatus() method.
> I believe it is a gain to have this dependency as it allows more 
> functionality and has so far proven to be backwards compatible.



[jira] [Commented] (HDFS-4422) Upgrade servlet-api dependency from version 2.5 to 3.0.

2013-01-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559892#comment-13559892
 ] 

Konstantin Boudnik commented on HDFS-4422:
--

It actually seems to be an independent action. Looks like Jetty 6 works with 
version 3.0 of the servlet API.

> Upgrade servlet-api dependency from version 2.5 to 3.0.
> ---
>
> Key: HDFS-4422
> URL: https://issues.apache.org/jira/browse/HDFS-4422
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.3-alpha
>Reporter: Plamen Jeliazkov
>Assignee: Plamen Jeliazkov
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: HDFS-4422.patch
>
>
> Please update the servlet-api jar from 2.5 to javax.servlet 3.0 via Maven:
> {code:xml}
> <dependency>
>   <groupId>javax.servlet</groupId>
>   <artifactId>javax.servlet-api</artifactId>
>   <version>3.0.1</version>
>   <scope>provided</scope>
> </dependency>
> {code}
> I am running a 2.0.3 dev-cluster and can confirm compatibility. I have 
> removed the servlet-api-2.5.jar file and replaced it with 
> javax.servlet-3.0.jar file. I am using javax.servlet-3.0 because it 
> implements methods that I use for a filter, namely the 
> HttpServletResponse.getStatus() method.
> I believe it is a gain to have this dependency as it allows more 
> functionality and has so far proven to be backwards compatible.



[jira] [Updated] (HDFS-4417) HDFS-347: fix case where local reads get disabled incorrectly

2013-01-22 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4417:
---

Attachment: HDFS-4417.004.patch

> HDFS-347: fix case where local reads get disabled incorrectly
> -
>
> Key: HDFS-4417
> URL: https://issues.apache.org/jira/browse/HDFS-4417
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, hdfs-client, performance
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4417.002.patch, HDFS-4417.003.patch, 
> HDFS-4417.004.patch, hdfs-4417.txt
>
>
> In testing HDFS-347 against HBase (thanks [~jdcryans]) we ran into the 
> following case:
> - a workload is running which puts a bunch of local sockets in the PeerCache
> - the workload abates for a while, causing the sockets to go "stale" (ie the 
> DN side disconnects after the keepalive timeout)
> - the workload starts again
> In this case, the local socket retrieved from the cache failed the 
> newBlockReader call, and it incorrectly disabled local sockets on that host. 
> This is similar to an earlier bug HDFS-3376, but not quite the same.
> The next issue we ran into is that, once this happened, it never tried local 
> sockets again, because the cache held lots of TCP sockets. Since we always 
> managed to get a cached socket to the local node, it didn't bother trying 
> local read again.
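The retry behavior the description calls for can be stated compactly. The class and method names below are hypothetical illustrations, not the actual HDFS-4417 patch code:

```java
// Sketch of the desired policy: a failure on a *cached* (possibly stale)
// local socket should trigger a retry with a fresh socket, rather than
// immediately disabling local reads for the host. Only a failure on a
// freshly opened socket is real evidence that local reads are broken.
class LocalReadPolicy {
    static boolean shouldDisableLocalReads(boolean socketWasCached,
                                           boolean freshSocketAlsoFailed) {
        // Cached-socket failure alone is inconclusive: the DN side may
        // simply have closed it after the keepalive timeout.
        return !socketWasCached || freshSocketAlsoFailed;
    }
}
```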



[jira] [Commented] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559763#comment-13559763
 ] 

Steve Loughran commented on HDFS-4425:
--

The source of HDFS-1594 lists the parameters:
{code}
+  public static final String DFS_NAMENODE_DU_RESERVED_KEY = "dfs.namenode.resource.du.reserved";
+  public static final long   DFS_NAMENODE_DU_RESERVED_DEFAULT = 1024 * 1024 * 100; // 100 MB
{code}

What could be done is to add a wiki entry on the topic and have the log message 
point to it.
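For anyone hitting this, the reserved threshold can be tuned in hdfs-site.xml. The snippet below is an illustration only, shown with the default value of 100 MB; lower it with care, since the check exists to protect the NameNode's metadata volumes:

```xml
<property>
  <name>dfs.namenode.resource.du.reserved</name>
  <!-- Bytes of free space required on each NameNode storage volume
       before the NameNode enters safe mode. Default: 104857600 (100 MB). -->
  <value>104857600</value>
</property>
```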

> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: project
>Priority: Minor
>
> Hi,
> The NameNode switches into safe mode when it has low disk space on the root 
> fs /, and I have to manually run a command to leave it. Below are the log 
> messages for low space on the root / fs. Is there any parameter so that I 
> can reduce the reserved amount?
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.



[jira] [Updated] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-4425:
-

 Priority: Minor  (was: Major)
Affects Version/s: 2.0.2-alpha
   0.23.5

Downgrading to Minor and marking as affecting 0.23+ only; I'm assuming that is 
the case. If the original submitter of the issue could state their release 
version, that would help.

> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: project
>Priority: Minor
>
> Hi,
> The NameNode switches into safe mode when it has low disk space on the root 
> fs /, and I have to manually run a command to leave it. Below are the log 
> messages for low space on the root / fs. Is there any parameter so that I 
> can reduce the reserved amount?
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.



[jira] [Commented] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559761#comment-13559761
 ] 

Steve Loughran commented on HDFS-4425:
--

root cause looks like HDFS-1594

> NameNode low on available disk space
> 
>
> Key: HDFS-4425
> URL: https://issues.apache.org/jira/browse/HDFS-4425
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha, 0.23.5
>Reporter: project
>Priority: Minor
>
> Hi,
> The NameNode switches into safe mode when it has low disk space on the root 
> fs /, and I have to manually run a command to leave it. Below are the log 
> messages for low space on the root / fs. Is there any parameter so that I 
> can reduce the reserved amount?
> 2013-01-21 01:22:52,217 WARN 
> org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space 
> available on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the 
> configured reserved amount 104857600
> 2013-01-21 01:22:52,218 WARN 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on 
> available disk space. Entering safe mode.
> 2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
> mode is ON.



[jira] [Created] (HDFS-4425) NameNode low on available disk space

2013-01-22 Thread project (JIRA)
project created HDFS-4425:
-

 Summary: NameNode low on available disk space
 Key: HDFS-4425
 URL: https://issues.apache.org/jira/browse/HDFS-4425
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: project


Hi,

The NameNode switches into safe mode when it has low disk space on the root 
fs /, and I have to manually run a command to leave it. Below are the log 
messages for low space on the root / fs. Is there any parameter so that I can 
reduce the reserved amount?


2013-01-21 01:22:52,217 WARN 
org.apache.hadoop.hdfs.server.namenode.NameNodeResourceChecker: Space available 
on volume '/dev/mapper/vg_lv_root' is 10653696, which is below the configured 
reserved amount 104857600
2013-01-21 01:22:52,218 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNode low on available 
disk space. Entering safe mode.
2013-01-21 01:22:52,218 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe 
mode is ON.






[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559648#comment-13559648
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Mapreduce-trunk #1321 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1321/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)
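The inference described in the last paragraph reduces to a simple fallback decision. The sketch below uses assumed names (ChecksumType, infer), not the actual DFSClient API:

```java
// Sketch of the fallback: if the checksum response from an older cluster
// carries no explicit type, trust the type returned by reading the first
// byte of the block (OP_READ_BLOCK), which has always included it.
enum ChecksumType { CRC32, CRC32C }

class ChecksumInference {
    static ChecksumType infer(boolean responseHasType,
                              ChecksumType fromChecksumResponse,
                              ChecksumType fromFirstByteRead) {
        // Only trust the checksum response when the protobuf field was
        // explicitly set; otherwise fall back to the one-byte read.
        return responseHasType ? fromChecksumResponse : fromFirstByteRead;
    }
}
```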



[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559602#comment-13559602
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Hdfs-trunk #1293 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1293/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = FAILURE
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)



[jira] [Commented] (HDFS-4420) Provide a way to exclude subtree from balancing process

2013-01-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559554#comment-13559554
 ] 

Hadoop QA commented on HDFS-4420:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12565926/Balancer-exclude-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3863//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/3863//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3863//console

This message is automatically generated.

> Provide a way to exclude subtree from balancing process
> ---
>
> Key: HDFS-4420
> URL: https://issues.apache.org/jira/browse/HDFS-4420
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Max Lapan
>Priority: Minor
> Attachments: Balancer-exclude-subtree-0.90.2.patch, 
> Balancer-exclude-trunk.patch
>
>
> During balancer operation, it balances all blocks regardless of their 
> filesystem hierarchy. Sometimes it would be useful to exclude some subtree 
> from the balancing process.
> For example, regionserver data locality is crucial for HBase performance. A 
> region's data is tied to regionservers, which reside on specific machines in 
> the cluster. During operation, a regionserver reads and writes its regions' 
> data, and after some time all of this data resides on the local machine, so 
> all reads become local, which is great for performance. The balancer breaks 
> this locality during operation by moving blocks around.
> This patch adds a [-exclude <path>] switch; if a path is provided, the 
> balancer will not move blocks under this path during operation.
> The attached patch has been tested on 0.90.2.
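The exclusion the patch describes amounts to a path-prefix check on the file a block belongs to. This sketch uses assumed names, not the actual patch code:

```java
// Sketch of a subtree-exclusion check for the balancer: a block is skipped
// when the file path it belongs to falls under the excluded subtree.
class BalancerExclusion {
    static boolean isExcluded(String blockFilePath, String excludedSubtree) {
        if (excludedSubtree == null) {
            return false; // no -exclude argument given
        }
        // Normalize to a trailing slash so "/hbase" does not accidentally
        // match a sibling directory such as "/hbasebackup".
        String prefix = excludedSubtree.endsWith("/")
                ? excludedSubtree : excludedSubtree + "/";
        return blockFilePath.startsWith(prefix);
    }
}
```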



[jira] [Commented] (HDFS-4403) DFSClient can infer checksum type when not provided by reading first byte

2013-01-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559541#comment-13559541
 ] 

Hudson commented on HDFS-4403:
--

Integrated in Hadoop-Yarn-trunk #104 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/104/])
HDFS-4403. DFSClient can infer checksum type when not provided by reading 
first byte. Contributed by Todd Lipcon. (Revision 1436730)

 Result = SUCCESS
todd : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1436730
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileChecksumServlets.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/datatransfer.proto


> DFSClient can infer checksum type when not provided by reading first byte
> -
>
> Key: HDFS-4403
> URL: https://issues.apache.org/jira/browse/HDFS-4403
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 3.0.0, 2.0.3-alpha
>
> Attachments: hdfs-4403.txt, hdfs-4403.txt
>
>
> HDFS-3177 added the checksum type to OpBlockChecksumResponseProto, but the 
> new protobuf field is optional, with a default of CRC32. This means that this 
> API, when used against an older cluster (like earlier 0.23 releases) will 
> falsely return CRC32 even if that cluster has written files with CRC32C. This 
> can cause issues for distcp, for example.
> Instead of defaulting the protobuf field to CRC32, we can leave it with no 
> default, and if the OpBlockChecksumResponseProto has no checksum type set, 
> the client can send OP_READ_BLOCK to read the first byte of the block, then 
> grab the checksum type out of that response (which has always been present)



[jira] [Updated] (HDFS-4420) Provide a way to exclude subtree from balancing process

2013-01-22 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HDFS-4420:


Attachment: Balancer-exclude-trunk.patch

Trunk version

> Provide a way to exclude subtree from balancing process
> ---
>
> Key: HDFS-4420
> URL: https://issues.apache.org/jira/browse/HDFS-4420
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 0.20.2
>Reporter: Max Lapan
>Priority: Minor
> Attachments: Balancer-exclude-subtree-0.90.2.patch, 
> Balancer-exclude-trunk.patch
>
>
> During balancer operation, it balances all blocks regardless of their 
> filesystem hierarchy. Sometimes it would be useful to exclude some subtree 
> from the balancing process.
> For example, regionserver data locality is crucial for HBase performance. A 
> region's data is tied to regionservers, which reside on specific machines in 
> the cluster. During operation, a regionserver reads and writes its regions' 
> data, and after some time all of this data resides on the local machine, so 
> all reads become local, which is great for performance. The balancer breaks 
> this locality during operation by moving blocks around.
> This patch adds a [-exclude <path>] switch; if a path is provided, the 
> balancer will not move blocks under this path during operation.
> The attached patch has been tested on 0.90.2.



[jira] [Updated] (HDFS-4420) Provide a way to exclude subtree from balancing process

2013-01-22 Thread Max Lapan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Lapan updated HDFS-4420:


Affects Version/s: (was: 0.20.2)
   Status: Patch Available  (was: Open)

> Provide a way to exclude subtree from balancing process
> ---
>
> Key: HDFS-4420
> URL: https://issues.apache.org/jira/browse/HDFS-4420
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Reporter: Max Lapan
>Priority: Minor
> Attachments: Balancer-exclude-subtree-0.90.2.patch, 
> Balancer-exclude-trunk.patch
>
>
> During balancer operation, it balances all blocks regardless of their 
> filesystem hierarchy. Sometimes it would be useful to exclude some subtree 
> from the balancing process.
> For example, regionserver data locality is crucial for HBase performance. A 
> region's data is tied to regionservers, which reside on specific machines in 
> the cluster. During operation, a regionserver reads and writes its regions' 
> data, and after some time all of this data resides on the local machine, so 
> all reads become local, which is great for performance. The balancer breaks 
> this locality during operation by moving blocks around.
> This patch adds a [-exclude <path>] switch; if a path is provided, the 
> balancer will not move blocks under this path during operation.
> The attached patch has been tested on 0.90.2.
