[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6135:


Attachment: HDFS-6135.002.patch

Another option is to completely ignore the future layoutVersion check in the 
JournalNode. Since the JN no longer decodes the edits (it uses the length field 
to scan through them), the JN has some capability to handle edits with a future 
layoutVersion. 

However, this capability is limited: e.g., if in the future something more 
significant (e.g., the segment-based edits maintenance mechanism) gets changed, 
old software will not be able to process edits generated by new software. Maybe 
a better solution is to split the NameNode layoutVersion into a NameNode 
layoutVersion and a JournalNode layoutVersion, just like what we did for NN and 
DN.
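
To illustrate the length-field scanning mentioned above, here is a minimal 
sketch (not the actual JournalNode code; it assumes a simple 4-byte length 
prefix per op, which is an assumption for illustration only):

{code}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class LengthPrefixedScanner {
  /**
   * Counts the ops in a segment by reading only each op's length field and
   * skipping the payload, so ops with a future layoutVersion never need to
   * be decoded. Framing details here are assumed for illustration.
   */
  static int countOps(InputStream in) throws IOException {
    DataInputStream dis = new DataInputStream(in);
    int count = 0;
    while (true) {
      int length;
      try {
        length = dis.readInt();              // assumed 4-byte length prefix
      } catch (EOFException eof) {
        return count;                        // end of the segment
      }
      int skipped = dis.skipBytes(length);   // skip payload without decoding
      if (skipped < length) {
        throw new IOException("Truncated op: expected " + length
            + " bytes, got " + skipped);
      }
      count++;
    }
  }
}
{code}

The limitation above follows directly: any change that alters the framing 
itself, rather than the op payloads, would break this kind of scanning.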

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, 
> HDFS-6135.002.patch, HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6135:


Attachment: HDFS-6135.001.patch

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.000.patch, HDFS-6135.001.patch, 
> HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944706#comment-13944706
 ] 

Hadoop QA commented on HDFS-6135:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636271/HDFS-6135.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.shell.TestHdfsTextCommand
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength
  org.apache.hadoop.hdfs.server.namenode.TestNameNodeJspHelper
  org.apache.hadoop.hdfs.server.namenode.TestAclConfigFlag
  org.apache.hadoop.hdfs.TestClose
  org.apache.hadoop.hdfs.TestShortCircuitLocalRead
  org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
  org.apache.hadoop.hdfs.TestFSInputChecker
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
  org.apache.hadoop.hdfs.TestDataTransferProtocol
  org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
  org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport
  org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
  org.apache.hadoop.hdfs.server.namenode.TestFSNamesystemMBean
  org.apache.hadoop.hdfs.TestDFSClientFailover
  org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyIsHot
  org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters
  org.apache.hadoop.tools.TestJMXGet
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.server.namenode.TestFSImage
  org.apache.hadoop.hdfs.security.TestDelegationToken
  org.apache.hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade
  org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
  org.apache.hadoop.fs.permission.TestStickyBit
  org.apache.hadoop.hdfs.TestFileConcurrentReader
  org.apache.hadoop.hdfs.server.datanode.TestCachingStrategy
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestCheckpointsWithSnapshots
  org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork
  org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks
  org.apache.hadoop.fs.TestFcHdfsCreateMkdir
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestAppendDifferentChecksum
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration
  org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs
  org.apache.hadoop.hdfs.server.namenode.TestParallelImageWrite
  org.apache.hadoop.hdfs.security.TestDelegationTokenForProxyUser
  org.apache.hadoop.hdfs.server.namenode.TestSequentialBlockId
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
  org.apache.hadoop.hdfs.TestDFSPermission
  org.apache.hadoop.hdfs.server.namenode.snapshot.TestDisallowModifyROSnapshot
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestListFilesInFileContext
  org.apache.hadoop.hdfs.server.namenode.TestNameNodeResourceChecker
  org.apache.hadoop.fs.viewfs.TestViewFsDefaultValue
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics

[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5138:


Attachment: HDFS-5138.branch-2.001.patch

Rebased the patch for branch-2.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.branch-2.001.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, hdfs-5138-branch-2.txt
>
>
> With HA enabled, the NN won't start with "-upgrade". Since there has been a 
> layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode 
> was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only 
> way to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving the DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
> upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase the maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6135:


Affects Version/s: 3.0.0  (was: 2.4.0)

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.000.patch, HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6135:


Status: Patch Available  (was: Open)

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.000.patch, HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-6135:


Attachment: HDFS-6135.000.patch

Uploaded a simple fix which lets the JN understand the corresponding 
StartupOption, and bypasses the layoutVersion check when creating the Journal 
object if the StartupOption is ROLLBACK.
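
Roughly, such a check could take the following shape (a hypothetical sketch, 
not the actual HDFS-6135.000.patch; the constant and method names are 
assumptions):

{code}
import java.io.IOException;

class JNStorageSketch {
  enum StartupOption { REGULAR, ROLLBACK }

  static final int CURRENT_LAYOUT_VERSION = -55;  // assumed current value

  void checkLayoutVersion(int reported, StartupOption startOpt)
      throws IOException {
    if (startOpt == StartupOption.ROLLBACK) {
      // Rolling back restores the previous/ directory anyway, so tolerate
      // a current/ VERSION file written by newer software.
      return;
    }
    if (reported != CURRENT_LAYOUT_VERSION) {
      throw new IOException("Unexpected version of storage directory."
          + " Reported: " + reported + ". Expecting = "
          + CURRENT_LAYOUT_VERSION + ".");
    }
  }
}
{code}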

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.000.patch, HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-6135:
---

Assignee: Jing Zhao

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA

2014-03-23 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944591#comment-13944591
 ] 

Suresh Srinivas commented on HDFS-5138:
---

[~acmurthy], I have HDFS-6135 and HDFS-5840 as blockers.

> Support HDFS upgrade in HA
> --
>
> Key: HDFS-5138
> URL: https://issues.apache.org/jira/browse/HDFS-5138
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Kihwal Lee
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
> hdfs-5138-branch-2.txt
>
>
> With HA enabled, the NN won't start with "-upgrade". Since there has been a 
> layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode 
> was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only 
> way to get around this was to disable HA and upgrade. 
> The NN and the cluster cannot be flipped back to HA until the upgrade is 
> finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
> turned back on without involving the DNs, things will work, but finalizeUpgrade 
> won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
> upgrade snapshots won't get removed.
> We will need a different way of doing layout upgrades and upgrade snapshots.  
> I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
> there is a reasonable workaround that does not increase the maintenance window 
> greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-03-23 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-5840:
--

Priority: Blocker  (was: Major)

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> 
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6135) In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump when rolling back

2014-03-23 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-6135:
--

Priority: Blocker  (was: Major)

> In HDFS upgrade with HA setup, JournalNode cannot handle layout version bump 
> when rolling back
> --
>
> Key: HDFS-6135
> URL: https://issues.apache.org/jira/browse/HDFS-6135
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Jing Zhao
>Priority: Blocker
> Attachments: HDFS-6135.test.txt
>
>
> While doing an HDFS upgrade with an HA setup, if the layoutVersion gets changed 
> in the upgrade, the rollback may trigger the following exception in JournalNodes 
> (suppose the new software bumped the layoutVersion from -55 to -56):
> {code}
> 14/03/21 01:01:53 FATAL namenode.NameNode: Exception in namenode join
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not check if 
> roll back possible for one or more JournalNodes. 1 exceptions thrown:
> Unexpected version of storage directory /grid/1/tmp/journal/mycluster. 
> Reported: -56. Expecting = -55.
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setLayoutVersion(StorageInfo.java:203)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.setFieldsFromProperties(StorageInfo.java:156)
>   at 
> org.apache.hadoop.hdfs.server.common.StorageInfo.readProperties(StorageInfo.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:202)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:142)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:87)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNode.canRollBack(JournalNode.java:304)
>   at 
> org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.canRollBack(JournalNodeRpcServer.java:228)
> {code}
> Looks like, for rollback, a JN running old software cannot handle the future 
> layoutVersion written by new software.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures

2014-03-23 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944588#comment-13944588
 ] 

Suresh Srinivas commented on HDFS-5840:
---

[~atm], any updates on this? [~jingzhao] found some issues and posted comments 
on HDFS-5138 as well.

> Follow-up to HDFS-5138 to improve error handling during partial upgrade 
> failures
> 
>
> Key: HDFS-5840
> URL: https://issues.apache.org/jira/browse/HDFS-5840
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 3.0.0
>
> Attachments: HDFS-5840.patch
>
>
> Suresh posted some good comments in HDFS-5138 after that patch had already 
> been committed to trunk. This JIRA is to address those. See the first comment 
> of this JIRA for the full content of the review.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6143) HftpFileSystem open should throw FileNotFoundException for non-existing paths

2014-03-23 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-6143:
--

Summary: HftpFileSystem open should throw FileNotFoundException for 
non-existing paths  (was: HftpFileSystem open should through 
FileNotFoundException for non-existing paths)

> HftpFileSystem open should throw FileNotFoundException for non-existing paths
> -
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: HDFS-6143.v01.patch
>
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944532#comment-13944532
 ] 

Steve Loughran commented on HDFS-6143:
--

This is important: everything expects `open(path)` to fail if the path isn't 
there. (FWIW, it'd be a great optimisation for HTTP filesystems if there were an 
`open(offset)` operation; if you look at the logs, it's usually a pair of 
open & seek.)

# Could you change the code to use {{e.toString()}} and not {{e.getMessage()}}? 
I know some exceptions have null messages, and others may include more detail 
in their string representations.
# Tests that match on string error messages are brittle and increase 
maintenance costs. Could you use some String.format() operations to create both 
the error text in the exception and the strings to search for in the 
exceptions? That would guarantee that a change in the source is reflected in 
the tests.

Sorry to add more work; it's just that I think we shouldn't copy bad practices 
of the past when writing new code & tests
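
For point 2, a minimal sketch of the idea (hypothetical names, not code from 
the patch): define the message once as a format string and reuse it on both 
sides.

{code}
class HftpErrors {
  // Single source of truth for the error text (name is hypothetical).
  static final String NOT_FOUND_FMT = "File does not exist: %s";

  static String notFound(String path) {
    return String.format(NOT_FOUND_FMT, path);
  }
}

// Production code:
//   throw new FileNotFoundException(HftpErrors.notFound(path));
//
// Test code (note e.toString(), per point 1, since getMessage() can be null):
//   assertTrue(e.toString().contains(HftpErrors.notFound(TEST_PATH)));
{code}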



> HftpFileSystem open should through FileNotFoundException for non-existing 
> paths
> ---
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: HDFS-6143.v01.patch
>
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944517#comment-13944517
 ] 

Steve Loughran commented on HDFS-6134:
--

Alejandro, sorry, what I meant to say is that the PDF refers to other JIRAs; 
they should be added as links to this JIRA.

> Transparent data at rest encryption
> ---
>
> Key: HDFS-6134
> URL: https://issues.apache.org/jira/browse/HDFS-6134
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 2.3.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: HDFSDataAtRestEncryption.pdf
>
>
> Because of privacy and security regulations, for many industries, sensitive 
> data at rest must be in encrypted form. For example: the healthcare industry 
> (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
> US government (FISMA regulations).
> This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
> be used transparently by any application accessing HDFS via Hadoop Filesystem 
> Java API, Hadoop libhdfs C library, or WebHDFS REST API.
> The resulting implementation should be able to be used in compliance with 
> different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6146) retry if cannot report Bad Blocks to namenode

2014-03-23 Thread Ding Yuan (JIRA)
Ding Yuan created HDFS-6146:
---

 Summary: retry if cannot report Bad Blocks to namenode
 Key: HDFS-6146
 URL: https://issues.apache.org/jira/browse/HDFS-6146
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.2.0
Reporter: Ding Yuan


Line: 255, File: "org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java"
{noformat}
249: try {
250:   bpNamenode.reportBadBlocks(blocks);
251: } catch (IOException e){
252:   /* One common reason is that NameNode could be in safe mode.
253:* Should we keep on retrying in that case?
254:*/
255:   LOG.warn("Failed to report bad block " + block + " to namenode : "
256:   + " Exception", e);
257: }
{noformat}
 
The comment seems to suggest this should be retried. 
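
A bounded-retry sketch of what the report path could do instead (a hypothetical 
policy only; a real fix would need to consider safe mode and shutdown, and this 
fragment reuses the surrounding class's bpNamenode and LOG):

{code}
private void reportBadBlocksWithRetry(LocatedBlock[] blocks)
    throws InterruptedException {
  final int maxAttempts = 3;        // assumed retry policy
  final long backoffMs = 1000L;
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      bpNamenode.reportBadBlocks(blocks);
      return;                       // reported successfully
    } catch (IOException e) {
      LOG.warn("Failed to report bad blocks to namenode (attempt "
          + attempt + " of " + maxAttempts + ")", e);
      Thread.sleep(backoffMs);      // e.g. NN may be in safe mode
    }
  }
}
{code}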

A similar case is at:
   Line: 1430, File: 
"org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java"




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6145) Stopping unexpected exception from propagating to avoid serious consequences

2014-03-23 Thread Ding Yuan (JIRA)
Ding Yuan created HDFS-6145:
---

 Summary: Stopping unexpected exception from propagating to avoid 
serious consequences
 Key: HDFS-6145
 URL: https://issues.apache.org/jira/browse/HDFS-6145
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Ding Yuan


There are a few cases where an exception should never occur, but the code 
simply logs it and lets execution continue. Since these exceptions shouldn't 
occur, a safer approach may be to terminate execution and stop them from 
propagating into unexpected consequences.

==
Case 1:
Line: 336, File: 
"org/apache/hadoop/hdfs/server/namenode/snapshot/INodeDirectorySnapshottable.java"
{noformat}

325:   try {
326: Quota.Counts counts = cleanSubtree(snapshot, prior, 
collectedBlocks,
327: removedINodes, true);
328: INodeDirectory parent = getParent();
 .. ..
335:   } catch(QuotaExceededException e) {
336: LOG.error("BUG: removeSnapshot increases namespace usage.", e);
337:   }
{noformat}

Since this shouldn't occur unless there is an unexpected bug, should the NN 
simply stop execution to prevent the bad state from propagating?

Similar handling of QuotaExceededException can be found at:
  Line: 544, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
  Line: 657, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
  Line: 669, File: "org/apache/hadoop/hdfs/server/namenode/INodeReference.java"
==
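If failing fast is the right answer, one possible shape for Case 1 is below (a 
sketch only; whether to abort is exactly the question posed above). It assumes 
Hadoop's org.apache.hadoop.util.ExitUtil for test-friendly termination.

{code}
try {
  Quota.Counts counts = cleanSubtree(snapshot, prior, collectedBlocks,
      removedINodes, true);
  // ...
} catch (QuotaExceededException e) {
  // Treat an "impossible" exception as a fatal bug instead of logging on.
  org.apache.hadoop.util.ExitUtil.terminate(1,
      "BUG: removeSnapshot increases namespace usage: " + e);
}
{code}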
==
Case 2:
Line: 601, File: "org/apache/hadoop/hdfs/server/namenode/JournalSet.java"

{noformat}
591:  public synchronized RemoteEditLogManifest getEditLogManifest(long 
fromTxId,
..
595:for (JournalAndStream j : journals) {
..
598: try {
599:   allLogs.addAll(fjm.getRemoteEditLogs(fromTxId, forReading, 
false));
600: } catch (Throwable t) {
601:   LOG.warn("Cannot list edit logs in " + fjm, t);
602: }
{noformat}

An exception from addAll will result in some edit log files not being 
considered and not being included in the checkpoint, which may result in data 
loss.
==
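A sketch of the alternative for Case 2 (hypothetical; it narrows the catch and 
propagates, rather than silently shrinking the manifest):

{code}
for (JournalAndStream j : journals) {
  // ...
  try {
    allLogs.addAll(fjm.getRemoteEditLogs(fromTxId, forReading, false));
  } catch (IOException e) {
    // Propagate instead of dropping this journal's logs from the manifest.
    throw new IOException("Cannot list edit logs in " + fjm, e);
  }
}
{code}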
==
Case 3:
Line: 4029, File: "org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java"

{noformat}
4010:   try {
4011: while (fsRunning && shouldNNRmRun) {
4012:   checkAvailableResources();
4013:   if(!nameNodeHasResourcesAvailable()) {
4014: String lowResourcesMsg = "NameNode low on available disk 
space. ";
4015: if (!isInSafeMode()) {
4016:   FSNamesystem.LOG.warn(lowResourcesMsg + "Entering safe 
mode.");
4017: } else {
4018:   FSNamesystem.LOG.warn(lowResourcesMsg + "Already in safe 
mode.");
4019: }
4020: enterSafeMode(true);
4021:   }
.. ..
4027: }
4028:   } catch (Exception e) {
4029: FSNamesystem.LOG.error("Exception in NameNodeResourceMonitor: ", 
e);
4030:   }
{noformat}

enterSafeMode might throw an exception. If the NN is not able to enter safe 
mode, should the execution simply terminate?
==




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6144) Failing to delete balancer.id in close could prevent balancer from starting the next time

2014-03-23 Thread Ding Yuan (JIRA)
Ding Yuan created HDFS-6144:
---

 Summary: Failing to delete balancer.id in close could prevent 
balancer from starting the next time
 Key: HDFS-6144
 URL: https://issues.apache.org/jira/browse/HDFS-6144
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Ding Yuan


Line: 215, File: "org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java"

{noformat}
199: void close() {
212:   try {
213: fs.delete(BALANCER_ID_PATH, true);
214:   } catch(IOException ioe) {
215: LOG.warn("Failed to delete " + BALANCER_ID_PATH, ioe);
216:   }
{noformat}

If the FS cannot delete this file for some reason, it will prevent any 
balancers from running in the future. Should it at least retry in this case (or 
warn the users about this and suggest that they manually delete the file)?
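
A bounded-retry sketch for close() (a hypothetical retry policy, reusing the 
surrounding class's fs, LOG, and BALANCER_ID_PATH):

{code}
void close() {
  // ...
  final int maxAttempts = 3;             // assumed policy
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      fs.delete(BALANCER_ID_PATH, true);
      return;
    } catch (IOException ioe) {
      LOG.warn("Failed to delete " + BALANCER_ID_PATH + " (attempt "
          + attempt + " of " + maxAttempts + "); future balancer runs may"
          + " be blocked until this file is removed manually", ioe);
    }
  }
}
{code}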



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944397#comment-13944397
 ] 

Hadoop QA commented on HDFS-6143:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12636242/HDFS-6143.v01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6467//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6467//console

This message is automatically generated.

> HftpFileSystem open should through FileNotFoundException for non-existing 
> paths
> ---
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: HDFS-6143.v01.patch
>
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6110) adding more slow action log in critical write path

2014-03-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944387#comment-13944387
 ] 

Liang Xie commented on HDFS-6110:
-

Seems it's not HBase-only; I just saw HDFS-6139. MR applications should also 
benefit if this patch is in.

> adding more slow action log in critical write path
> --
>
> Key: HDFS-6110
> URL: https://issues.apache.org/jira/browse/HDFS-6110
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6110-v2.txt, HDFS-6110.txt
>
>
> After digging into an HBase write spike issue caused by slow buffer IO in our 
> cluster, we realized we'd better add more abnormal-latency warning logs in the 
> write flow, so that if other folks hit an HLog sync spike, they can get more 
> detailed info from the HDFS side at the same time.
> Patch will be uploaded soon.
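
The kind of slow-action warning described might look like this sketch 
(hypothetical threshold name and value; not the actual HDFS-6110 patch):

{code}
// Around a potentially slow IO call in the write path:
final long SLOW_IO_WARN_THRESHOLD_MS = 300;   // assumed threshold
long begin = System.nanoTime();
out.flush();                                  // the potentially slow call
long durationMs = (System.nanoTime() - begin) / 1000000L;
if (durationMs > SLOW_IO_WARN_THRESHOLD_MS) {
  LOG.warn("Slow flush took " + durationMs + "ms (threshold="
      + SLOW_IO_WARN_THRESHOLD_MS + "ms)");
}
{code}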



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6139) JobTracker blocked on TIMED_WAITING DFSOutputStream.waitForAckedSeqno() running

2014-03-23 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944386#comment-13944386
 ] 

Liang Xie commented on HDFS-6139:
-

If HDFS-6110 were in, it would be very helpful for diagnosis.

> JobTracker blocked on TIMED_WAITING DFSOutputStream.waitForAckedSeqno() 
> running
> ---
>
> Key: HDFS-6139
> URL: https://issues.apache.org/jira/browse/HDFS-6139
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 2.0.0-alpha
>Reporter: Muga Nishizawa
>
> We're using CDH 4.2.1.  The following is part of a thread dump of our 
> JobTracker.
> {code}
> "IPC Server handler 249 on 8021" daemon prio=10 tid=0x7fce80e2c000 
> nid=0x718e in Object.wait() [0x7fc92afe6000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x7fc95f5ffba8> (a java.util.LinkedList)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1708)
> - locked <0x7fc95f5ffba8> (a java.util.LinkedList)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:1694)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1778)
> - locked <0x7fc95f5ff898> (a 
> org.apache.hadoop.hdfs.DFSOutputStream)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:66)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:99)
> at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3562)
> - locked <0x7fc9652787e0> (a org.apache.hadoop.mapred.JobTracker)
> at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3475)
> at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:474)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
> ... ...
> "Thread-9489990" daemon prio=10 tid=0x7fce6c01b000 nid=0x14e1 runnable 
> [0x7fc8f38f7000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0x7fc9631201d0> (a sun.nio.ch.Util$1)
> - locked <0x7fc9631201e8> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x7fc963120158> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:158)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:117)
> at java.io.FilterInputStream.read(FilterInputStream.java:66)
> at java.io.FilterInputStream.read(FilterInputStream.java:66)
> at 
> org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:169)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1105)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1039)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-6143:


Attachment: HDFS-6143.v01.patch

v01 of the patch for review

> HftpFileSystem open should through FileNotFoundException for non-existing 
> paths
> ---
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: HDFS-6143.v01.patch
>
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-6143:


Status: Patch Available  (was: Open)

> HftpFileSystem open should through FileNotFoundException for non-existing 
> paths
> ---
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
> Attachments: HDFS-6143.v01.patch
>
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HDFS-6143:


Priority: Blocker  (was: Major)
Target Version/s: 2.4.0

> HftpFileSystem open should through FileNotFoundException for non-existing 
> paths
> ---
>
> Key: HDFS-6143
> URL: https://issues.apache.org/jira/browse/HDFS-6143
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.3.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Blocker
>
> HftpFileSystem.open incorrectly handles non-existing paths. 
> - 'open' does not really open anything, i.e., it does not contact the 
> server, and therefore cannot discover FileNotFound; that is deferred until the 
> next read. This is counterintuitive and not how the local FS or HDFS work: in 
> POSIX you get ENOENT on open. 
> [LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
>  is an example of code that's broken because of this.
> - On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST 
> instead of SC_NOT_FOUND for non-existing paths



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6143) HftpFileSystem open should through FileNotFoundException for non-existing paths

2014-03-23 Thread Gera Shegalov (JIRA)
Gera Shegalov created HDFS-6143:
---

 Summary: HftpFileSystem open should through FileNotFoundException 
for non-existing paths
 Key: HDFS-6143
 URL: https://issues.apache.org/jira/browse/HDFS-6143
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov


HftpFileSystem.open incorrectly handles non-existing paths. 
- 'open' does not really open anything, i.e., it does not contact the server, 
and therefore cannot discover FileNotFound; that is deferred until the next 
read. This is counterintuitive and not how the local FS or HDFS work: in POSIX 
you get ENOENT on open. 
[LzoInputFormat.getSplits|https://github.com/kevinweil/elephant-bird/blob/master/core/src/main/java/com/twitter/elephantbird/mapreduce/input/LzoInputFormat.java]
 is an example of code that's broken because of this.

- On the server side, FileDataServlet incorrectly sends SC_BAD_REQUEST instead 
of SC_NOT_FOUND for non-existing paths
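
One possible shape of the client-side fix (a sketch, not the attached patch; 
createLazyStream is a hypothetical stand-in for the existing lazy-stream 
construction): probe the path up front so open() fails like the local FS and 
HDFS do.

{code}
@Override
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
  // Contact the server up front: getFileStatus throws
  // FileNotFoundException for a missing path, matching POSIX open().
  getFileStatus(f);
  return createLazyStream(f);   // hypothetical helper: previous behavior
}
{code}

The cost is one extra server round trip per open; the server-side half of the 
issue (returning SC_NOT_FOUND rather than SC_BAD_REQUEST) would still need its 
own fix.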





--
This message was sent by Atlassian JIRA
(v6.2#6252)