[jira] [Updated] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5901:


Attachment: HDFS-5901.patch

Attaching the patch.
Please review.

 NameNode new UI doesn't support IE8 and IE9 on windows 7
 

 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5901.patch


 The new NameNode web UI does not support IE8 and IE9 on Windows 7
 and does not render the page correctly.
 js error:
 {noformat}Line: 74
 Error: Unable to get value of the property 'url': object is null or 
 undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5901:


Status: Patch Available  (was: Open)

 NameNode new UI doesn't support IE8 and IE9 on windows 7
 

 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5901.patch


 The new NameNode web UI does not support IE8 and IE9 on Windows 7
 and does not render the page correctly.
 js error:
 {noformat}Line: 74
 Error: Unable to get value of the property 'url': object is null or 
 undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894325#comment-13894325
 ] 

Vinay commented on HDFS-5901:
-

Yes [~wheat9],
that works. Moving the {{DOCTYPE}} before the Apache license header fixes it: 
IE8/IE9 drop into quirks mode when anything, even a comment, precedes the 
{{DOCTYPE}}, so it must be the very first thing in the page.

 NameNode new UI doesn't support IE8 and IE9 on windows 7
 

 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5901.patch


 The new NameNode web UI does not support IE8 and IE9 on Windows 7
 and does not render the page correctly.
 js error:
 {noformat}Line: 74
 Error: Unable to get value of the property 'url': object is null or 
 undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5901:


Attachment: HDFS-5901.patch

Updated the patch as per [~wheat9]'s suggestion.

 NameNode new UI doesn't support IE8 and IE9 on windows 7
 

 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5901.patch, HDFS-5901.patch


 The new NameNode web UI does not support IE8 and IE9 on Windows 7
 and does not render the page correctly.
 js error:
 {noformat}Line: 74
 Error: Unable to get value of the property 'url': object is null or 
 undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5185:


Attachment: HDFS-5185.patch

Adding a patch to check for disk failures before adding block pools to storage.

 DN fails to startup if one of the data dir is full
 --

 Key: HDFS-5185
 URL: https://issues.apache.org/jira/browse/HDFS-5185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5185.patch


 DataNode fails to start up if one of the configured data dirs is out of 
 space. It fails with the following exception:
 {noformat}2013-09-11 17:48:43,680 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (storage id 
 DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
 java.io.IOException: Mkdirs failed to create 
 /opt/nish/data/current/BP-123456-1234567/tmp
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.init(BlockPoolSlice.java:105)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It should continue to start up with the other available data dirs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5185:


Status: Patch Available  (was: Open)

 DN fails to startup if one of the data dir is full
 --

 Key: HDFS-5185
 URL: https://issues.apache.org/jira/browse/HDFS-5185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5185.patch


 DataNode fails to start up if one of the configured data dirs is out of 
 space. It fails with the following exception:
 {noformat}2013-09-11 17:48:43,680 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (storage id 
 DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
 java.io.IOException: Mkdirs failed to create 
 /opt/nish/data/current/BP-123456-1234567/tmp
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.init(BlockPoolSlice.java:105)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It should continue to start up with the other available data dirs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-02-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894453#comment-13894453
 ] 

Vinay commented on HDFS-5185:
-

Even after the patch, the behaviour will be the same by default unless 
dfs.datanode.failed.volumes.tolerated is changed.
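
(For reference, a minimal sketch of the idea, with assumed helper names; see 
the attached patch for the real change: count failed volumes up front and 
abort startup only when the configured tolerance is exceeded.)
{code}
// Sketch only: startup gated by dfs.datanode.failed.volumes.tolerated.
// countFailedVolumes() is an assumed helper, not actual patch code.
int volsFailed = countFailedVolumes(dataDirs);
int volFailuresTolerated =
    conf.getInt("dfs.datanode.failed.volumes.tolerated", 0);
if (volsFailed > volFailuresTolerated) {
  // with the default of 0, a single failed volume still aborts startup
  throw new DiskErrorException("Too many failed volumes: " + volsFailed);
}
{code}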

 DN fails to startup if one of the data dir is full
 --

 Key: HDFS-5185
 URL: https://issues.apache.org/jira/browse/HDFS-5185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Blocker
 Attachments: HDFS-5185.patch


 DataNode fails to start up if one of the configured data dirs is out of 
 space. It fails with the following exception:
 {noformat}2013-09-11 17:48:43,680 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (storage id 
 DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
 java.io.IOException: Mkdirs failed to create 
 /opt/nish/data/current/BP-123456-1234567/tmp
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.init(BlockPoolSlice.java:105)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It should continue to start up with the other available data dirs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5903) FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed

2014-02-07 Thread Vinay (JIRA)
Vinay created HDFS-5903:
---

 Summary: FSVolumeList#initializeReplicaMaps(..) not doing 
anything, it can be removed
 Key: HDFS-5903
 URL: https://issues.apache.org/jira/browse/HDFS-5903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Minor


{code}
void initializeReplicaMaps(ReplicaMap globalReplicaMap) throws IOException {
  for (FsVolumeImpl v : volumes) {
    v.getVolumeMap(globalReplicaMap);
  }
}
{code}

This method is called at initialization time, even before the block pools are 
added, so the call does nothing useful.

The replica map will anyway be updated for each block pool during 
{{addBlockPool(..)}}, which calls {{FsVolumeList#getAllVolumesMap(..)}}.
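
(For context, a simplified sketch, with abbreviated signatures, of why the 
call is redundant: {{addBlockPool(..)}} itself populates the global replica 
map once the block pool actually exists.)
{code}
// Sketch of the effective path; not the verbatim FsDatasetImpl code.
void addBlockPool(String bpid, Configuration conf) throws IOException {
  volumes.addBlockPool(bpid, conf);           // create per-pool storage dirs
  volumes.getAllVolumesMap(bpid, volumeMap);  // fill the replica map for bpid
}
{code}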



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5903) FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5903:


Attachment: HDFS-5903.patch

Attaching a patch.
Please review.

 FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed
 

 Key: HDFS-5903
 URL: https://issues.apache.org/jira/browse/HDFS-5903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Minor
 Attachments: HDFS-5903.patch


 {code}
 void initializeReplicaMaps(ReplicaMap globalReplicaMap) throws IOException {
   for (FsVolumeImpl v : volumes) {
     v.getVolumeMap(globalReplicaMap);
   }
 }
 {code}
 This method is called at initialization time, even before the block pools 
 are added, so the call does nothing useful.
 The replica map will anyway be updated for each block pool during 
 {{addBlockPool(..)}}, which calls {{FsVolumeList#getAllVolumesMap(..)}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5903) FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5903:


Status: Patch Available  (was: Open)

 FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed
 

 Key: HDFS-5903
 URL: https://issues.apache.org/jira/browse/HDFS-5903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Minor
 Attachments: HDFS-5903.patch


 {code}
 void initializeReplicaMaps(ReplicaMap globalReplicaMap) throws IOException {
   for (FsVolumeImpl v : volumes) {
     v.getVolumeMap(globalReplicaMap);
   }
 }
 {code}
 This method is called at initialization time, even before the block pools 
 are added, so the call does nothing useful.
 The replica map will anyway be updated for each block pool during 
 {{addBlockPool(..)}}, which calls {{FsVolumeList#getAllVolumesMap(..)}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5903) FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5903:


Attachment: HDFS-5903.patch

Attaching the same patch to trigger Jenkins again.

 FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed
 

 Key: HDFS-5903
 URL: https://issues.apache.org/jira/browse/HDFS-5903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Minor
 Attachments: HDFS-5903.patch, HDFS-5903.patch


 {code}
 void initializeReplicaMaps(ReplicaMap globalReplicaMap) throws IOException {
   for (FsVolumeImpl v : volumes) {
     v.getVolumeMap(globalReplicaMap);
   }
 }
 {code}
 This method is called at initialization time, even before the block pools 
 are added, so the call does nothing useful.
 The replica map will anyway be updated for each block pool during 
 {{addBlockPool(..)}}, which calls {{FsVolumeList#getAllVolumesMap(..)}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5903) FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed

2014-02-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894559#comment-13894559
 ] 

Vinay commented on HDFS-5903:
-

{noformat}#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32776 bytes for 
Chunk::new
# An error report file with more information is saved as:
# 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hs_err_pid21355.log{noformat}

This is not a compilation problem; Jenkins hit an OOME while running javadoc.

 FSVolumeList#initializeReplicaMaps(..) not doing anything, it can be removed
 

 Key: HDFS-5903
 URL: https://issues.apache.org/jira/browse/HDFS-5903
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Minor
 Attachments: HDFS-5903.patch


 {code}
 void initializeReplicaMaps(ReplicaMap globalReplicaMap) throws IOException {
   for (FsVolumeImpl v : volumes) {
     v.getVolumeMap(globalReplicaMap);
   }
 }
 {code}
 This method is called at initialization time, even before the block pools 
 are added, so the call does nothing useful.
 The replica map will anyway be updated for each block pool during 
 {{addBlockPool(..)}}, which calls {{FsVolumeList#getAllVolumesMap(..)}}.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895377#comment-13895377
 ] 

Vinay commented on HDFS-3405:
-

bq. Popping up one level, I wonder, since we're uploading one fsimage at a 
time, why the code should handle multipart requests?
Upload requests are handled as multipart requests, right? Do you know of any 
other way? I am not an expert in this area.
bq. Personally I'm conservative on bringing new dependency on hadoop-core 
projects.
The new dependency is only on the server side. It doesn't affect downstream 
projects, since the patch excludes the dependency in hadoop-client, so the 
downstream projects' classpath remains the same.
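
(For illustration, a minimal server-side sketch of parsing such an upload 
with commons-fileupload; the servlet context and field handling here are 
assumptions, not the patch itself.)
{code}
import java.io.InputStream;
import java.util.List;
import org.apache.commons.fileupload.FileItem;
import org.apache.commons.fileupload.disk.DiskFileItemFactory;
import org.apache.commons.fileupload.servlet.ServletFileUpload;

// Parse the multipart request and pull out the uploaded file part.
ServletFileUpload upload = new ServletFileUpload(new DiskFileItemFactory());
List<FileItem> items = upload.parseRequest(request);
for (FileItem item : items) {
  if (!item.isFormField()) {
    InputStream in = item.getInputStream(); // the uploaded fsimage bytes
    // ... persist the image ...
  }
}
{code}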

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5901:


Attachment: HDFS-5901.patch

Updated the patch.
Removed the redundant meta tag.

 NameNode new UI doesn't support IE8 and IE9 on windows 7
 

 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5901.patch, HDFS-5901.patch, HDFS-5901.patch


 The new NameNode web UI does not support IE8 and IE9 on Windows 7
 and does not render the page correctly.
 js error:
 {noformat}Line: 74
 Error: Unable to get value of the property 'url': object is null or 
 undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895398#comment-13895398
 ] 

Vinay commented on HDFS-3405:
-

I found this RFC [link|http://tools.ietf.org/html/rfc1867], which describes 
uploading files using multipart form data.
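
(For reference, the general shape of such a request per that RFC; the 
endpoint, boundary, and field names below are only illustrative.)
{noformat}
POST /imagetransfer HTTP/1.1
Content-Type: multipart/form-data; boundary=AaB03x

--AaB03x
Content-Disposition: form-data; name="fsimage"; filename="fsimage_0000000042"
Content-Type: application/octet-stream

...binary fsimage bytes...
--AaB03x--
{noformat}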

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-06 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-3405:


Attachment: HDFS-3405.patch

Attaching a patch fixing test failures. Missed one line during rebase ;)

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893635#comment-13893635
 ] 

Vinay commented on HDFS-3405:
-

The test failed with the exception below:
{noformat}Exception in thread "Edit log tailer" 
org.apache.hadoop.util.ExitUtil$ExitException: java.lang.NullPointerException
at com.google.common.base.Joiner.join(Joiner.java:226)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:695)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296){noformat}
Looking at the code, it seems impossible to get an NPE there; I am not sure 
how this happened. In any case, it is not related to the current patch.

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893799#comment-13893799
 ] 

Vinay commented on HDFS-3405:
-

If commons-fileupload is not used, then we need to handle the parsing and 
encoding of multipart requests ourselves, which could be difficult and 
error prone.

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894183#comment-13894183
 ] 

Vinay commented on HDFS-3405:
-

Thanks [~wheat9] for explaining your concern.
Actually, the commons-fileupload dependency is used only to parse the 
uploaded multipart data on the server side. 

Client-side connection establishment happens through {{URLConnectionFactory}} 
itself, and there the data is encoded by our own code; commons-fileupload is 
not used. Please check {{TransferImage#uploadImage(..)}} in the patch.

Since this library is not involved in establishing the connection, I don't 
think the problem you mentioned will arise.

Do you see any other problem with the commons-fileupload dependency?

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5901) NameNode new UI doesn't support IE8 and IE9 on windows 7

2014-02-06 Thread Vinay (JIRA)
Vinay created HDFS-5901:
---

 Summary: NameNode new UI doesn't support IE8 and IE9 on windows 7
 Key: HDFS-5901
 URL: https://issues.apache.org/jira/browse/HDFS-5901
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Vinay
Assignee: Vinay


The new NameNode web UI does not support IE8 and IE9 on Windows 7
and does not render the page correctly.

js error:

{noformat}Line: 74
Error: Unable to get value of the property 'url': object is null or 
undefined{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5890) Avoid NPE in Datanode heartbeat

2014-02-05 Thread Vinay (JIRA)
Vinay created HDFS-5890:
---

 Summary: Avoid NPE in Datanode heartbeat
 Key: HDFS-5890
 URL: https://issues.apache.org/jira/browse/HDFS-5890
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Vinay
Assignee: Vinay


Avoid NPE in datanode heartbeat messages.

{noformat}2014-02-05 17:24:11,347 WARN org.apache.hadoop.ipc.Server: IPC Server 
handler 5 on 9000, call 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
10.18.40.32:13299 Call#1721 Retry#0: error: java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.protocolPB.PBHelper.convertRollingUpgradeStatus(PBHelper.java:1460)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:124)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27704)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5890) Avoid NPE in Datanode heartbeat

2014-02-05 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5890:


Attachment: HDFS-5890.patch

Attaching the patch:
1. Avoids the NPE.
2. Checks for a valid RollingUpgradeStatus in the response. 
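
(A minimal sketch of the guard, assuming protobuf-style accessors; not the 
verbatim patch code:)
{code}
// Convert the status only when the heartbeat response actually carries one,
// instead of unconditionally dereferencing a possibly-absent field.
RollingUpgradeStatus status = null;
if (resp.hasRollingUpgradeStatus()) {
  status = PBHelper.convertRollingUpgradeStatus(resp.getRollingUpgradeStatus());
}
{code}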

 Avoid NPE in Datanode heartbeat
 ---

 Key: HDFS-5890
 URL: https://issues.apache.org/jira/browse/HDFS-5890
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5890.patch


 Avoid NPE in datanode heartbeat messages.
 {noformat}2014-02-05 17:24:11,347 WARN org.apache.hadoop.ipc.Server: IPC 
 Server handler 5 on 9000, call 
 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
 10.18.40.32:13299 Call#1721 Retry#0: error: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.protocolPB.PBHelper.convertRollingUpgradeStatus(PBHelper.java:1460)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:124)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27704)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
   at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5890) Avoid NPE in Datanode heartbeat

2014-02-05 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13892929#comment-13892929
 ] 

Vinay commented on HDFS-5890:
-

Thank you, Brandon.

 Avoid NPE in Datanode heartbeat
 ---

 Key: HDFS-5890
 URL: https://issues.apache.org/jira/browse/HDFS-5890
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5890.patch


 Avoid NPE in datanode heartbeat messages.
 {noformat}2014-02-05 17:24:11,347 WARN org.apache.hadoop.ipc.Server: IPC 
 Server handler 5 on 9000, call 
 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
 10.18.40.32:13299 Call#1721 Retry#0: error: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.protocolPB.PBHelper.convertRollingUpgradeStatus(PBHelper.java:1460)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:124)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27704)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
   at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5878) Trunk windows build broken after HDFS-5746

2014-02-05 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893025#comment-13893025
 ] 

Vinay commented on HDFS-5878:
-

bq. Vinay, thanks for picking this up. HDFS-5746 added 
TestSharedFileDescriptorFactory, TestDomainSocketWatcher and 
TestShortCircuitSharedMemorySegment. Do you know if any of those tests are 
failing on Windows? It looks like they're all set up to skip running on 
Windows, but I haven't confirmed yet. I'll confirm when I review this patch.
Yes, these tests will be skipped on Windows. I have verified this in Eclipse 
on Windows 7.
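
(The usual pattern for such skips, sketched here with a hypothetical test 
class: a JUnit assumption that short-circuits the test on Windows.)
{code}
import static org.junit.Assume.assumeTrue;
import org.apache.hadoop.util.Shell;
import org.junit.Before;

public class TestExampleOnUnixOnly {  // hypothetical test class
  @Before
  public void setUp() {
    // JUnit marks the test as skipped, not failed, when this is false
    assumeTrue(!Shell.WINDOWS);
  }
}
{code}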

 Trunk windows build broken after HDFS-5746
 --

 Key: HDFS-5878
 URL: https://issues.apache.org/jira/browse/HDFS-5878
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Priority: Blocker
 Attachments: HDFS-5878.patch


 The Hadoop build is broken with native code errors on Windows.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.

2014-02-05 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893033#comment-13893033
 ] 

Vinay commented on HDFS-5889:
-

Hi Nicholas,
how are you planning to keep these checkpointed images? In some special 
directory, so that they can be ignored during rollback and included during 
downgrade?

 When rolling upgrade is in progress, standby NN should create checkpoint for 
 downgrade.
 ---

 Key: HDFS-5889
 URL: https://issues.apache.org/jira/browse/HDFS-5889
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

 After rolling upgrade is started and checkpoint is disabled, the edit log may 
 grow to a huge size.  It is not a problem if rolling upgrade is finalized 
 normally since NN keeps the current state in memory and it writes a new 
 checkpoint during finalize.  However, it is a problem if admin decides to 
 downgrade.  It could take a long time to apply edit log.  Rollback does not 
 have such problem.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-05 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893036#comment-13893036
 ] 

Vinay commented on HDFS-5869:
-

bq. Once we have HDFS-5889, we do not need to disable the checkpointing in SBN 
anymore. 
Yes, we need not disable it, but checkpoints taken during a rolling upgrade 
should be kept separately, purely for downgrade purposes.
bq. My understand is that: if we have successfully done the checkpoint for the 
startRollingUpgrade call, we will have an fsimage and in case that this is the 
latest fsimage, we do not need to do the checkpoint again while restarting. But 
if the fsimage is lost or this is the SBN/SNN, this extra checkpoint can be 
triggered. Do I understand correctly here?
Yes, you are right. 
bq.  And I guess the SecondaryNN actually does not need to do this extra 
checkpoint?
Yes, it's not required. How do we avoid it only for the SNN?

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch, 
 h5869_20140205.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5890) Avoid NPE in Datanode heartbeat

2014-02-05 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HDFS-5890.
-

   Resolution: Fixed
Fix Version/s: HDFS-5535 (Rolling upgrades)
 Hadoop Flags: Reviewed

 Avoid NPE in Datanode heartbeat
 ---

 Key: HDFS-5890
 URL: https://issues.apache.org/jira/browse/HDFS-5890
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Reporter: Vinay
Assignee: Vinay
 Fix For: HDFS-5535 (Rolling upgrades)

 Attachments: HDFS-5890.patch


 Avoid NPE in datanode heartbeat messages.
 {noformat}2014-02-05 17:24:11,347 WARN org.apache.hadoop.ipc.Server: IPC 
 Server handler 5 on 9000, call 
 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
 10.18.40.32:13299 Call#1721 Retry#0: error: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.protocolPB.PBHelper.convertRollingUpgradeStatus(PBHelper.java:1460)
   at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.sendHeartbeat(DatanodeProtocolServerSideTranslatorPB.java:124)
   at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:27704)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
   at 
 org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

2014-02-05 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-3405:


Attachment: HDFS-3405.patch

Attaching the rebased patch.

Please review.

 Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged 
 fsimages
 

 Key: HDFS-3405
 URL: https://issues.apache.org/jira/browse/HDFS-3405
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinay
 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, 
 HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch


 As Todd points out in [this 
 comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
  the current scheme for a checkpointing daemon to upload a merged fsimage 
 file to an NN is to issue an HTTP get request to tell the target NN to issue 
 another GET request back to the checkpointing daemon to retrieve the merged 
 fsimage file. There's no fundamental reason the checkpointing daemon can't 
 just use an HTTP POST or PUT to send back the merged fsimage file, rather 
 than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (HDFS-5185) DN fails to startup if one of the data dir is full

2014-02-04 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HDFS-5185:
---

Assignee: Vinay

 DN fails to startup if one of the data dir is full
 --

 Key: HDFS-5185
 URL: https://issues.apache.org/jira/browse/HDFS-5185
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay
Priority: Blocker

 DataNode fails to startup if one of the data dirs configured is out of space. 
 fails with following exception
 {noformat}2013-09-11 17:48:43,680 FATAL 
 org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
 block pool Block pool registering (storage id 
 DS-308316523-xx.xx.xx.xx-64015-1378896293604) service to /nn1:65110
 java.io.IOException: Mkdirs failed to create 
 /opt/nish/data/current/BP-123456-1234567/tmp
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.init(BlockPoolSlice.java:105)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.addBlockPool(FsVolumeImpl.java:216)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.addBlockPool(FsVolumeList.java:155)
 at 
 org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.addBlockPool(FsDatasetImpl.java:1593)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:834)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:217)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}
 It should continue to start-up with other data dirs available.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5177) blocksScheduled count should be decremented for abandoned blocks

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890869#comment-13890869
 ] 

Vinay commented on HDFS-5177:
-

Hi all,
Can someone please take a look at the patch?
Thanks in advance.

 blocksScheduled count should be decremented for abandoned blocks
 -

 Key: HDFS-5177
 URL: https://issues.apache.org/jira/browse/HDFS-5177
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5177.patch, HDFS-5177.patch, HDFS-5177.patch


 DatanodeDescriptor#incBlocksScheduled() is called for all datanodes of the 
 block on each allocation, but the count should also be decremented for 
 abandoned blocks.
 When one of the datanodes is down and is allocated for the block along with 
 other live datanodes, the block will be abandoned, yet the scheduled count 
 still makes the other live datanodes look loaded when in reality they may 
 not be.
 This scheduled count is rolled every 20 minutes in any case.
 The problem shows up when the rate of file creation is high: due to the 
 inflated scheduled count, the local datanode may be missed for writes, and 
 in small clusters writes can sometimes even fail.
 So we need to decrement the unnecessary count on the abandonBlock call.
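
 (A minimal sketch of the proposed decrement, with assumed method names; see 
 the attached patches for the real change:)
 {code}
 // On abandonBlock, undo the scheduled-count increment that was made for
 // every expected target datanode of the abandoned block.
 void decrementBlocksScheduled(DatanodeDescriptor... targets) {
   for (DatanodeDescriptor node : targets) {
     node.decBlocksScheduled();  // counterpart of incBlocksScheduled()
   }
 }
 {code}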



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5669) Storage#tryLock() should check for null before logging successful message

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890875#comment-13890875
 ] 

Vinay commented on HDFS-5669:
-

Could someone please take a look at this simple patch? Thanks.

 Storage#tryLock() should check for null before logging successful message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5669.patch


 In the following code in Storage#tryLock(), there is a possibility that 
 {{file.getChannel().tryLock()}} returns null if the lock is held by some 
 other process. In that case, even though the return value is null, a 
 success message is logged, which is confusing.
 {code}try {
   res = file.getChannel().tryLock();
   file.write(jvmName.getBytes(Charsets.UTF_8));
   LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
 } catch(OverlappingFileLockException oe) {{code}
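
 (One possible fix, sketched below: treat a null return like a lock conflict 
 so the success message is logged only when the lock was really acquired.)
 {code}try {
   res = file.getChannel().tryLock();
   if (res == null) {
     // another process holds the lock; reuse the conflict handling path
     throw new OverlappingFileLockException();
   }
   file.write(jvmName.getBytes(Charsets.UTF_8));
   LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
 } catch(OverlappingFileLockException oe) {{code}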



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5503) Datanode#checkDiskError also should check for ClosedChannelException

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890879#comment-13890879
 ] 

Vinay commented on HDFS-5503:
-

Hi all,
could someone please take a look at this simple patch? Thanks in advance.

 Datanode#checkDiskError also should check for ClosedChannelException
 

 Key: HDFS-5503
 URL: https://issues.apache.org/jira/browse/HDFS-5503
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5503.patch


 out file
 ==
 {noformat}
 Exception in thread "PacketResponder: 
 BP-52063768-x-1383447451718:blk_1073755206_1099511661730, 
 type=LAST_IN_PIPELINE, downstreams=0:[]" java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:1363)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1233)
 at java.lang.Thread.run(Thread.java:662){noformat}
 log file
 ===
 {noformat}2013-11-08 21:23:36,622 WARN 
 org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in 
 BlockReceiver.run():
 java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at 
 org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1212)
 at java.lang.Thread.run(Thread.java:662)
 2013-11-08 21:23:36,622 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 checkDiskError: exception:
 java.nio.channels.ClosedChannelException
 at 
 sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
 at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
 at 
 org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
 at 
 org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
 at 
 org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
 at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 at java.io.DataOutputStream.flush(DataOutputStream.java:106)
 at 
 org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:1212)
 at java.lang.Thread.run(Thread.java:662){noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891808#comment-13891808
 ] 

Vinay commented on HDFS-5869:
-

Patch looks good, Nicholas.
I think it would be better to roll edits after saveNamespace() during 
{{startRollingUpgrade()}}; that also gives a clean separation of the edits.

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891821#comment-13891821
 ] 

Vinay commented on HDFS-5869:
-

Oops, I had forgotten that. Thanks for the update. 

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891824#comment-13891824
 ] 

Vinay commented on HDFS-5869:
-

One more thing..
{code}} else if (rollingUpgradeOpt == RollingUpgradeStartupOption.STARTED) {
  if (totalEdits > 1) {
    // save namespace if this is not the second edit transaction
    // (the first must be OP_START_LOG_SEGMENT)
    fsNamesys.getFSImage().saveNamespace(fsNamesys);
  }{code}
When the standby NN is restarted twice with the 
RollingUpgradeStartupOption.STARTED option, we will lose the 
OP_UPGRADE_MARKER, and hence the rollingUpgradeInfo will also be lost.
Am I missing something here?

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891837#comment-13891837
 ] 

Vinay commented on HDFS-5869:
-

Got it.
saveNamespace() while loading OP_UPGRADE_MARKER will not include the current 
editlog segment, because {{lastAppliedTxId}} of {{FSImage}} will still be 
pointing to the previous segment/checkpoint's last txn. Also, {{totalEdits}} 
will be 1, so it won't try to save again.

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5869) When rolling upgrade is in progress, NN should only create checkpoint right before the upgrade marker

2014-02-04 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891854#comment-13891854
 ] 

Vinay commented on HDFS-5869:
-

startCheckpoint() is used only with the BackupNode.

As Jing pointed out, we should disable checkpointing from the 
StandbyCheckpointer in the standby NN when a rolling upgrade is in progress.

 When rolling upgrade is in progress, NN should only create checkpoint right 
 before the upgrade marker
 -

 Key: HDFS-5869
 URL: https://issues.apache.org/jira/browse/HDFS-5869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5869_20140204.patch, h5869_20140204b.patch, 
 h5869_20140205.patch


 - When starting rolling upgrade, NN should create a checkpoint before it 
 writes the upgrade marker edit log transaction.
 - When rolling upgrade is in progress, NN should reject saveNamespace rpc 
 calls. Further, if NN restarts, it should create a checkpoint only right 
 before the upgrade marker.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5874) Should not compare DataNode current layout version with that of NameNode in DataStorage

2014-02-03 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890376#comment-13890376
 ] 

Vinay commented on HDFS-5874:
-

Thanks for filing this JIRA, Brandon.

 Should not compare DataNode current layout version with that of NameNode in 
 DataStorage
 

 Key: HDFS-5874
 URL: https://issues.apache.org/jira/browse/HDFS-5874
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Brandon Li

 As [~vinayrpet] pointed out in HDFS-5754: in DataStorage, 
 DATANODE_LAYOUT_VERSION should no longer be compared with the NameNode's 
 layout version. 
 {noformat}
 if (DataNodeLayoutVersion.supports(
     LayoutVersion.Feature.FEDERATION,
     HdfsConstants.DATANODE_LAYOUT_VERSION) &&
     HdfsConstants.DATANODE_LAYOUT_VERSION == nsInfo.getLayoutVersion()) {
   readProperties(sd, nsInfo.getLayoutVersion());
 {noformat}
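
 (A sketch of the suggested change: keep the feature check on the DataNode's 
 own layout version and drop the equality comparison against the NameNode's.)
 {code}
 if (DataNodeLayoutVersion.supports(
     LayoutVersion.Feature.FEDERATION,
     HdfsConstants.DATANODE_LAYOUT_VERSION)) {
   readProperties(sd, nsInfo.getLayoutVersion());
 }
 {code}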



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5878) Trunk windows build broken after HDFS-5746

2014-02-03 Thread Vinay (JIRA)
Vinay created HDFS-5878:
---

 Summary: Trunk windows build broken after HDFS-5746
 Key: HDFS-5878
 URL: https://issues.apache.org/jira/browse/HDFS-5878
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Priority: Blocker


Hadoop build is broken with native code errors on Windows.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5878) Trunk windows build broken after HDFS-5746

2014-02-03 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5878:


Attachment: HDFS-5878.patch

This seems to fix the issue. Please check.

 Trunk windows build broken after HDFS-5746
 --

 Key: HDFS-5878
 URL: https://issues.apache.org/jira/browse/HDFS-5878
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Priority: Blocker
 Attachments: HDFS-5878.patch


 Hadoop build is broken with native code errors on Windows.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5878) Trunk windows build broken after HDFS-5746

2014-02-03 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890460#comment-13890460
 ] 

Vinay commented on HDFS-5878:
-

Build fails with the following error:
{noformat}
D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.sln (default target) (1) ->
D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj (default target) (2) ->
(ClCompile target) ->
  src\org\apache\hadoop\io\nativeio\NativeIO.c(677): error C2065: 'PROT_READ' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]
  src\org\apache\hadoop\io\nativeio\NativeIO.c(678): error C2065: 'PROT_WRITE' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]
  src\org\apache\hadoop\io\nativeio\NativeIO.c(679): error C2065: 'PROT_EXEC' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]
  src\org\apache\hadoop\io\nativeio\NativeIO.c(680): error C2065: 'MAP_SHARED' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]
  src\org\apache\hadoop\io\nativeio\NativeIO.c(680): error C2065: 'MAP_PRIVATE' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]
  src\org\apache\hadoop\io\nativeio\NativeIO.c(683): error C2065: 'MAP_FAILED' : undeclared identifier [D:\SRC\OpenSource\hadoop\trunk\hadoop-common-project\hadoop-common\src\main\native\native.vcxproj]

8 Warning(s)
6 Error(s)

Time Elapsed 00:00:04.76
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42.557s
[INFO] Finished at: Tue Feb 04 12:51:28 GMT+05:30 2014
[INFO] Final Memory: 20M/145M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (compile-ms-native-dll) on project hadoop-common: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}

 Trunk windows build broken after HDFS-5746
 --

 Key: HDFS-5878
 URL: https://issues.apache.org/jira/browse/HDFS-5878
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Priority: Blocker
 Attachments: HDFS-5878.patch


 Hadoop build is broken with native code errors on Windows.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5848) Add rolling upgrade information to heartbeat response

2014-01-31 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887562#comment-13887562
 ] 

Vinay commented on HDFS-5848:
-

Changes look good. +1

 Add rolling upgrade information to heartbeat response
 

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch, h5848_20130130b.patch, 
 h5848_20140131.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 only change the heartbeat response here.  The datanode change will be done in a 
 separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5861) Add CLI test for Ls output for extended ACL marker

2014-01-31 Thread Vinay (JIRA)
Vinay created HDFS-5861:
---

 Summary: Add CLI test for Ls output for extended ACL marker
 Key: HDFS-5861
 URL: https://issues.apache.org/jira/browse/HDFS-5861
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Vinay
Assignee: Vinay


Add an XML test to aclTestCLI.xml to test the output of the LS command for an 
extended ACL file/dir.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5861) Add CLI test for Ls output for extended ACL marker

2014-01-31 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5861:


Attachment: HDFS-5861.patch

 Add CLI test for Ls output for extended ACL marker
 --

 Key: HDFS-5861
 URL: https://issues.apache.org/jira/browse/HDFS-5861
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5861.patch


 Add an XML test to aclTestCLI.xml to test the output of the LS command for an 
 extended ACL file/dir.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5861) Add CLI test for Ls output for extended ACL marker

2014-01-31 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888461#comment-13888461
 ] 

Vinay commented on HDFS-5861:
-

Hi [~cnauroth], could you take a look at the added test? Thanks.

 Add CLI test for Ls output for extended ACL marker
 --

 Key: HDFS-5861
 URL: https://issues.apache.org/jira/browse/HDFS-5861
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5861.patch


 Add an XML test to aclTestCLI.xml to test the output of the LS command for an 
 extended ACL file/dir.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5855) Reading edits should not stop at UpgradeMarker for normal restart of the namenode

2014-01-30 Thread Vinay (JIRA)
Vinay created HDFS-5855:
---

 Summary: Reading edits should not stop at UpgradeMarker for normal 
restart of the namenode
 Key: HDFS-5855
 URL: https://issues.apache.org/jira/browse/HDFS-5855
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Vinay


As mentioned 
[here|https://issues.apache.org/jira/browse/HDFS-5645?focusedCommentId=13867530&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13867530]
 in HDFS-5645, NameNode restart will stop reading edits at the UpgradeMarker. But 
this should happen only if the namenode is rolling back from an upgrade. 
There is also a possibility that the namenode could get restarted after a rolling 
upgrade has started. This should not roll back everything.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5855) Reading edits should not stop at UpgradeMarker for normal restart of the namenode

2014-01-30 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886704#comment-13886704
 ] 

Vinay commented on HDFS-5855:
-

Thanks Nicholas. That could actually solve the issue. My bad, I didn't see all the 
jiras. Will close this as a duplicate.

 Reading edits should not stop at UpgradeMarker for normal restart of the 
 namenode
 -

 Key: HDFS-5855
 URL: https://issues.apache.org/jira/browse/HDFS-5855
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Vinay

 As mentioned 
 [here|https://issues.apache.org/jira/browse/HDFS-5645?focusedCommentId=13867530&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13867530]
  in HDFS-5645, NameNode restart will stop reading edits at the UpgradeMarker. But 
 this should happen only if the namenode is rolling back from an upgrade. 
 There is also a possibility that the namenode could get restarted after a rolling 
 upgrade has started. This should not roll back everything.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5855) Reading edits should not stop at UpgradeMarker for normal restart of the namenode

2014-01-30 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HDFS-5855.
-

Resolution: Duplicate
  Assignee: Vinay

 Reading edits should not stop at UpgradeMarker for normal restart of the 
 namenode
 -

 Key: HDFS-5855
 URL: https://issues.apache.org/jira/browse/HDFS-5855
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Vinay
Assignee: Vinay

 As mentioned 
 [here|https://issues.apache.org/jira/browse/HDFS-5645?focusedCommentId=13867530&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13867530]
  in HDFS-5645, NameNode restart will stop reading edits at the UpgradeMarker. But 
 this should happen only if the namenode is rolling back from an upgrade. 
 There is also a possibility that the namenode could get restarted after a rolling 
 upgrade has started. This should not roll back everything.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-30 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887540#comment-13887540
 ] 

Vinay commented on HDFS-5754:
-

First of all, sorry for coming in late (that too, after commit)...

All changes look good, except the following one.

{noformat}+  if (DataNodeLayoutVersion.supports(
+  LayoutVersion.Feature.FEDERATION,
+  HdfsConstants.DATANODE_LAYOUT_VERSION) && 
+  HdfsConstants.DATANODE_LAYOUT_VERSION == nsInfo.getLayoutVersion()) {
 readProperties(sd, nsInfo.getLayoutVersion());
 writeProperties(sd);
 LOG.info("Layout version rolled back to " +{noformat}

Here, while rolling back when the previous directory does not exist, 
{{DATANODE_LAYOUT_VERSION}} is directly compared with 
{{NAMENODE_LAYOUT_VERSION}} to update the VERSION file.
But LayoutVersion was split precisely because these two could differ in the same 
version of hadoop, right?
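
For illustration, one way the rollback path could avoid referring to the NN's 
version at all might look like this (a sketch under that assumption, not a 
committed patch):
{code}
// Sketch only: validate and rewrite the VERSION file against the DataNode's
// own layout version instead of nsInfo.getLayoutVersion().
if (DataNodeLayoutVersion.supports(
    LayoutVersion.Feature.FEDERATION,
    HdfsConstants.DATANODE_LAYOUT_VERSION)) {
  readProperties(sd, HdfsConstants.DATANODE_LAYOUT_VERSION);
  writeProperties(sd);
  LOG.info("Layout version rolled back to "
      + HdfsConstants.DATANODE_LAYOUT_VERSION);
}
{code}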

{noformat}+if (HdfsConstants.DATANODE_LAYOUT_VERSION != 
nsInfo.getLayoutVersion()) {
+  LOG.info("DataNode and NameNode layout versions are different:" +
+ " DataNode version:" + HdfsConstants.DATANODE_LAYOUT_VERSION +
+ " NameNode version:" + nsInfo.getLayoutVersion());{noformat}
Maybe this log is not required. The log below alone is enough, with the condition removed.
{noformat}+if( HdfsConstants.DATANODE_LAYOUT_VERSION == 
nsInfo.getLayoutVersion())
+  LOG.info("Data-node version: " + HdfsConstants.DATANODE_LAYOUT_VERSION + 
+   " and name-node layout version: " + nsInfo.getLayoutVersion());{noformat}
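
With the condition removed, that would read roughly as follows (a sketch only):
{code}
// Sketch only: one unconditional log line covering both versions.
LOG.info("Data-node layout version: " + HdfsConstants.DATANODE_LAYOUT_VERSION
    + " and name-node layout version: " + nsInfo.getLayoutVersion());
{code}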

If any changes are required, I am OK doing them in a separate jira instead of 
re-opening this one.

 Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Fix For: HDFS-5535 (Rolling upgrades)

 Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
 HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
 HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
 HDFS-5754.009.patch, HDFS-5754.010.patch, HDFS-5754.012.patch, 
 HDFS-5754.013.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-29 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5702 started by Vinay.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch, HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-29 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5702:


Attachment: HDFS-5702.patch

Added the 4 mentioned tests. 
Please review.
I couldn't avoid the long line in the expected message, as it's necessary to compare 
the exact output.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch, HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885590#comment-13885590
 ] 

Vinay commented on HDFS-5585:
-

Changes look good, kihwal.

Some minor suggestions:
you might want to add \n at the end of these lines for better-looking output,
bq. +String shutdownDatanode = "-shutdownDatanode <datanode_host:ipc_port> 
\[upgrade\] " +

bq. +String pingDatanode = "-pingDatanode <datanode_host:ipc_port> " +
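
i.e., roughly the following (a sketch; the placeholder text in angle brackets is 
reconstructed, since the mail archive stripped it from the quoted patch lines):
{code}
// Sketch only: usage strings with a trailing \n for better-looking output.
String shutdownDatanode = "-shutdownDatanode <datanode_host:ipc_port> [upgrade]\n";
String pingDatanode = "-pingDatanode <datanode_host:ipc_port>\n";
{code}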

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. The primary use case is rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reason.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5586) Add quick-restart option for datanode

2014-01-29 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885593#comment-13885593
 ] 

Vinay commented on HDFS-5586:
-

I think this is already covered in HDFS-5585. 

Can we mark it as a duplicate?

 Add quick-restart option for datanode
 -

 Key: HDFS-5586
 URL: https://issues.apache.org/jira/browse/HDFS-5586
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee

 This feature, combined with the graceful shutdown feature, will enable data 
 nodes to come back up and start serving quickly.  This is likely a command 
 line option for data node, which triggers it to look for saved state 
 information in its local storage.  If the information is present and 
 reasonably up-to-date, data node would skip some of the startup steps.
 Ideally it should be able to do quick registration without requiring removal 
 of all blocks from the data node descriptor on the name node and 
 reconstructing it with the initial full block report. This implies that all 
 RBW blocks are recorded during shutdown and on start-up they are not turned 
 into RWR. Other than the quick registration, name node should treat the 
 restart as if few heart beats were lost from the node. There should be no 
 unexpected replica state changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5827) Children are not inheriting parent's default ACLs

2014-01-25 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HDFS-5827.
-

Resolution: Duplicate

This will be implemented in HDFS-5616, 
and tests are already added as part of HDFS-5702,
so closing this as a duplicate.

Thanks, Chris, for the update.

 Children are not inheriting parent's default ACLs
 -

 Key: HDFS-5827
 URL: https://issues.apache.org/jira/browse/HDFS-5827
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay
Assignee: Chris Nauroth

 Children are not inheriting the parent's default ACLs on creation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5832) Deadlock found in NN between SafeMode#canLeave and DatanodeManager#handleHeartbeat

2014-01-25 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5832:


Summary: Deadlock found in NN between SafeMode#canLeave and 
DatanodeManager#handleHeartbeat  (was: Deadeadlock found in NN between 
SafeMode#canLeave and DatanodeManager#handleHeartbeat)

 Deadlock found in NN between SafeMode#canLeave and 
 DatanodeManager#handleHeartbeat
 --

 Key: HDFS-5832
 URL: https://issues.apache.org/jira/browse/HDFS-5832
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Blocker
 Attachments: HDFS-5832.patch, jcarder_nn_deadlock.gif


 Found the deadlock during the Namenode startup. Attached the jcarder report, which 
 shows the cycles of the deadlock situation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5832) Deadlock found in NN between SafeMode#canLeave and DatanodeManager#handleHeartbeat

2014-01-25 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882189#comment-13882189
 ] 

Vinay commented on HDFS-5832:
-

As mentioned in HDFS-5132, 
moving the SafeModeMonitor#run() checks under the fsn write lock will solve the issue. 

1. handleHeartbeat() is always done under the fsn read lock.
2. incrementSafeBlockCount() and getNumLiveDataNodes() will always be 
called under the writeLock().

Looking directly at the synchronization order, it appears to be a deadlock, but it is 
avoided by the fsn lock.
I think jcarder does not recognize the read-write lock mechanism.

For this very reason I had made HDFS-5368 a duplicate of HDFS-5132.
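
A minimal sketch of what moving those checks under the write lock could look 
like; the loop shape and member names here are assumptions for illustration, not 
the actual patch:
{code}
// Sketch only: evaluate canLeave() under the fsn write lock, the same lock
// that guards incrementSafeBlockCount() and the live-datanode count.
public void run() {
  while (fsRunning) {
    namesystem.writeLock();
    try {
      if (safeMode.canLeave()) {
        safeMode.leave();
        break;
      }
    } finally {
      namesystem.writeUnlock();
    }
    try {
      Thread.sleep(recheckInterval);
    } catch (InterruptedException ie) {
      return;
    }
  }
}
{code}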

 Deadlock found in NN between SafeMode#canLeave and 
 DatanodeManager#handleHeartbeat
 --

 Key: HDFS-5832
 URL: https://issues.apache.org/jira/browse/HDFS-5832
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Rakesh R
Assignee: Rakesh R
Priority: Blocker
 Attachments: HDFS-5832.patch, jcarder_nn_deadlock.gif


 Found the deadlock during the Namenode startup. Attached the jcarder report, which 
 shows the cycles of the deadlock situation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5827) Children are not inheriting parent's default ACLs

2014-01-24 Thread Vinay (JIRA)
Vinay created HDFS-5827:
---

 Summary: Children are not inheriting parent's default ACLs
 Key: HDFS-5827
 URL: https://issues.apache.org/jira/browse/HDFS-5827
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay


Children are not inheriting the parent's default ACLs on creation.

Following are the ACLs on the parent and child in HDFS:
{noformat}# file: /dir1
# owner: vinay
# group: supergroup
user::rwx
mask::r-x
other::r-x
default:user::rwx
default:user:charlie:r-x
default:group::r-x
default:group:admin:rwx
default:mask::rwx
default:other::r-x

# file: /dir1/dir2
# owner: vinay
# group: supergroup
user::rwx
group::r-x
other::r-x

# file: /dir1/file
# owner: vinay
# group: supergroup
user::rw-
group::r--
other::r--{noformat}


Following is the output of the linux ACL:
{noformat}
# file: testAcl
# owner: vinay
# group: users
user::rwx
user:vinay:r--
group::r-x
group:users:r-x
mask::r-x
other::r-x
default:user::rwx
default:user:vinay:r-x
default:group::r-x
default:group:users:rwx
default:mask::rwx
default:other::r-x

# file: testAcl/hello
# owner: vinay
# group: users
user::rw-
user:vinay:r-x  #effective:r--
group::r-x  #effective:r--
group:users:rwx #effective:rw-
mask::rw-
other::r--
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5827) Children are not inheriting parent's default ACLs

2014-01-24 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880940#comment-13880940
 ] 

Vinay commented on HDFS-5827:
-

Following are the ACLs on the parent and child in HDFS:
{noformat}# file: /dir1
# owner: vinay
# group: supergroup
user::rwx
mask::r-x
other::r-x
default:user::rwx
default:user:charlie:r-x
default:group::r-x
default:group:admin:rwx
default:mask::rwx
default:other::r-x

# file: /dir1/dir2
# owner: vinay
# group: supergroup
user::rwx
group::r-x
other::r-x

# file: /dir1/file
# owner: vinay
# group: supergroup
user::rw-
group::r--
other::r--{noformat}


Following is the output of the linux ACL:
{noformat}
# file: testAcl
# owner: vinay
# group: users
user::rwx
user:vinay:r--
group::r-x
group:users:r-x
mask::r-x
other::r-x
default:user::rwx
default:user:vinay:r-x
default:group::r-x
default:group:users:rwx
default:mask::rwx
default:other::r-x

# file: testAcl/hello
# owner: vinay
# group: users
user::rw-
user:vinay:r-x  #effective:r--
group::r-x  #effective:r--
group:users:rwx #effective:rw-
mask::rw-
other::r--
{noformat}

 Children are not inheriting parent's default ACLs
 -

 Key: HDFS-5827
 URL: https://issues.apache.org/jira/browse/HDFS-5827
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay

 Children are not inheriting the parent's default ACLs on creation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5827) Children are not inheriting parent's default ACLs

2014-01-24 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5827:


Description: 
Children are not inheriting the parent's default ACLs on creation.


  was:
Children are not inheriting the parent's default ACLs on creation.

Following is the ACLs on parent and child in HDFS
{noformat}# file: /dir1
# owner: vinay
# group: supergroup
user::rwx
mask::r-x
other::r-x
default:user::rwx
default:user:charlie:r-x
default:group::r-x
default:group:admin:rwx
default:mask::rwx
default:other::r-x

# file: /dir1/dir2
# owner: vinay
# group: supergroup
user::rwx
group::r-x
other::r-x

# file: /dir1/file
# owner: vinay
# group: supergroup
user::rw-
group::r--
other::r--{noformat}


Following is the output in linux ACL
# file: testAcl
# owner: vinay
# group: users
user::rwx
user:vinay:r--
group::r-x
group:users:r-x
mask::r-x
other::r-x
default:user::rwx
default:user:vinay:r-x
default:group::r-x
default:group:users:rwx
default:mask::rwx
default:other::r-x

# file: testAcl/hello
# owner: vinay
# group: users
user::rw-
user:vinay:r-x  #effective:r--
group::r-x  #effective:r--
group:users:rwx #effective:rw-
mask::rw-
other::r--
{noformat}


 Children are not inheriting parent's default ACLs
 -

 Key: HDFS-5827
 URL: https://issues.apache.org/jira/browse/HDFS-5827
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay

 Children are not inheriting the parent's default ACLs on creation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5827) Children are not inheriting parent's default ACLs

2014-01-24 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880942#comment-13880942
 ] 

Vinay commented on HDFS-5827:
-

Hi [~cnauroth], could you take a look at this?
I found this while writing the XML tests. 
Please correct me if I am wrong.

 Children are not inheriting parent's default ACLs
 -

 Key: HDFS-5827
 URL: https://issues.apache.org/jira/browse/HDFS-5827
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Vinay

 Children are not inheriting the parent's default ACLs on creation.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-24 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5702:


Attachment: HDFS-5702.patch

Attaching the patch with some of the initial tests. Please review.
Among the added tests, the last 2 will fail due to HDFS-5827.

Please suggest if some more tests need to be added.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-23 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880721#comment-13880721
 ] 

Vinay commented on HDFS-5723:
-

bq. Hi Vinay, one question about the patch: so this inconsistent generation 
stamp can also be caused by a delayed block-received report? I.e., after the 
first close(), the DN's report gets delayed and is received by NN when the 
append starts. In that case, will we have any issue by wrongly putting the 
(block, DN) into the corruptBlockMap?
It can happen. The earlier block with the previous genstamp will be marked corrupt if 
the append pipeline setup with the genstamp update happened before the previous report 
arrives.
But the append pipeline creation will update the genstamp on that datanode also, so 
one more block-received report is expected with the correct genstamp. This time it 
will remove the old block from the corrupt replica map.
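
A hedged sketch of the idea in the block-report path (the names here are assumed 
for illustration, not the exact patch):
{code}
// Sketch only: a FINALIZED replica reported with an older genstamp for a
// block that is still under construction is treated as corrupt; a later
// report carrying the bumped genstamp removes it from the corrupt map again.
if (storedBlock.isUnderConstruction()
    && reportedState == ReplicaState.FINALIZED
    && reportedGenStamp < storedBlock.getGenerationStamp()) {
  markBlockAsCorrupt(storedBlock, dn);  // assumed helper
}
{code}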

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5723.patch


 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanode DN1 is down.
 4. File was opened with append and some more data is added to the file and 
 synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
 5. Now  DN1 restarted
 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, which should 
 be marked corrupted,
 but since the NN has the appended block state as UnderConstruction, at this time 
 it does not detect this block as corrupt and adds it to the valid block locations.
 As long as the namenode is alive, this datanode will also be considered a 
 valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-22 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5728:


Attachment: HDFS-5728.patch

Thanks, Kihwal, for taking a look.
Attaching a patch with the unnecessary lines removed, as you suggested.
Please review.

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 0.23.10, 2.2.0
Reporter: Vinay
Assignee: Vinay
Priority: Critical
 Attachments: HDFS-5728.patch, HDFS-5728.patch, HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-19 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876178#comment-13876178
 ] 

Vinay commented on HDFS-5728:
-

bq. Logically we already truncated in memory by having the integrity check. 
There is no use of considering data more than crc bytes covered. And this 
truncation will not make recovery of block. This is just making crc and 
blockFile having same length (as data integrity expects). Recovery will make 
actual block file truncation upto where new length proposed for block recovery.
I agree, Uma. I will try to post a new patch based on your input.

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-19 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5728:


Attachment: HDFS-5728.patch

Attaching the updated patch as per Uma's comment.
Please review.

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5728.patch, HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5745) Unnecessary disk check triggered when socket operation has problem.

2014-01-09 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866443#comment-13866443
 ] 

Vinay commented on HDFS-5745:
-

HDFS-5503 also looks similar, 
but with ClosedChannelException.

 Unnecessary disk check triggered when socket operation has problem.
 ---

 Key: HDFS-5745
 URL: https://issues.apache.org/jira/browse/HDFS-5745
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 1.2.1
Reporter: MaoYuan Xian

 When a BlockReceiver data transfer fails, it can be seen that SocketOutputStream 
 translates the exception into an IOException with the message "The stream is 
 closed":
 2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
 IOException in BlockReceiver.run():
 java.io.IOException: The stream is closed
 at org.apache.hadoop.net.SocketOutputStream.write
 at java.io.BufferedOutputStream.flushBuffer
 at java.io.BufferedOutputStream.flush
 at java.io.DataOutputStream.flush
 at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run
 at java.lang.Thread.run
 This causes the checkDiskError method of DataNode to be called and triggers the 
 disk scan.
 Can we make modifications like the ones below in checkDiskError to avoid these 
 unnecessary disk scan operations?
 {code}
 --- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
 +++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java
 @@ -938,7 +938,8 @@ public class DataNode extends Configured
   || e.getMessage().startsWith("An established connection was aborted")
   || e.getMessage().startsWith("Broken pipe")
   || e.getMessage().startsWith("Connection reset")
 - || e.getMessage().contains("java.nio.channels.SocketChannel")) {
 + || e.getMessage().contains("java.nio.channels.SocketChannel")
 + || e.getMessage().startsWith("The stream is closed")) {
 LOG.info("Not checking disk as checkDiskError was called on a network " +
   "related exception"); 
 return;
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-09 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867487#comment-13867487
 ] 

Vinay commented on HDFS-5728:
-

bq. Is this case happened only if we restart DN where crc has less data?
Yes
bq. as we convert all RBW replica states to RWR and here length will be 
calculated based on crc chunks. If that is the case, how about just setting the 
file length also to same after creating RWR state?
I too thought of the same thing. That would be an implicit truncation without 
recovery being called. But I felt it is better to come through the recovery flow itself 
and do the truncation only on demand.
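
For reference, a hedged sketch of how the crc-covered length can be derived from 
the meta file when an RBW replica is converted to RWR on restart (the names here 
are assumed for illustration):
{code}
// Sketch only: each checksum entry covers bytesPerChecksum bytes of data, so
// the crc-covered data length is numChunks * bytesPerChecksum.
long metaLen = metaFile.length() - BlockMetadataHeader.getHeaderSize();
int checksumSize = checksum.getChecksumSize();
int bytesPerChecksum = checksum.getBytesPerChecksum();
long numChunks = metaLen / checksumSize;
long crcCoveredLength = numChunks * bytesPerChecksum;
// e.g. 62455808 crc-covered bytes vs. 62484480 bytes in the block file above
{code}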

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-08 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5723:


Assignee: Vinay
  Status: Patch Available  (was: Open)

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5723.patch


 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanode DN1 is down.
 4. File was opened with append and some more data is added to the file and 
 synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
 5. Now  DN1 restarted
 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, which should 
 be marked corrupted,
 but since the NN has the appended block state as UnderConstruction, at this time 
 it does not detect this block as corrupt and adds it to the valid block locations.
 As long as the namenode is alive, this datanode will also be considered a 
 valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-08 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5723:


Attachment: HDFS-5723.patch

Attached the patch. Please review.

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay
 Attachments: HDFS-5723.patch


 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanode DN1 is down.
 4. File was opened with append and some more data is added to the file and 
 synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
 5. Now  DN1 restarted
 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should 
 be marked corrupted.
 but since NN having appended block state as UnderConstruction, at this time 
 its not detecting this block as corrupt and adding to valid block locations.
 As long as the namenode is alive, this datanode also will be considered as 
 valid replica and read/append will fail in that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades

2014-01-08 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866350#comment-13866350
 ] 

Vinay commented on HDFS-5535:
-

Very good work. 

From the doc, what I can understand is:
 Rolling upgrades are only possible from versions which contain this 
feature to future versions, *but not from the already released versions.*

Some doubts:
bq. The minor releases are for introducing features. Both these MUST NOT 
introduce incompatible changes.
This means even changing the internal protocols breaks the compatibility. 
In that case, do we have to wait for a major release, or add overloaded APIs?

bq. This will allow an 8600 node cluster to complete in 24 hours.
I didn't understand this calculation. :-(


 Umbrella jira for improved HDFS rolling upgrades
 

 Key: HDFS-5535
 URL: https://issues.apache.org/jira/browse/HDFS-5535
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, ha, hdfs-client, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Nathan Roberts
 Attachments: HDFSRollingUpgradesHighLevelDesign.pdf


 In order to roll a new HDFS release through a large cluster quickly and 
 safely, a few enhancements are needed in HDFS. An initial High level design 
 document will be attached to this jira, and sub-jiras will itemize the 
 individual tasks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-07 Thread Vinay (JIRA)
Vinay created HDFS-5728:
---

 Summary: [Diskfull] Block recovery will fail if the metafile not 
having crc for all chunks of the block
 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay


1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is not 
a one-time upload; data will be written slowly.
2. One of the DataNodes got diskfull (due to some other data filling up the disks).
3. Unfortunately the block was being written to only this datanode in the cluster, so 
the client write has also failed.

4. After some time the disk is made free and all processes are restarted.
5. Now HMaster tries to recover the file by calling recoverLease. 
At this time recovery was failing, saying file length mismatch.

When checked,
 actual block file length: 62484480
 Calculated block length: 62455808

This was because the metafile was having crc for only 62455808 bytes, and it 
considered 62455808 as the block size.

No matter how many times it was retried, recovery was continuously failing.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-07 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865059#comment-13865059
 ] 

Vinay commented on HDFS-5728:
-

{noformat}2013-12-28 13:22:30,467 WARN 
org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to 
updateBlock 
(newblock=BP-720706819-x-1389113739092:blk_5575900364052391670_517444, 
datanode=tmm-e8:11242)
java.io.IOException: File length mismatched.  The length of 
/usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670
 is 62484480 but r=ReplicaUnderRecovery, blk_5575900364052391670_320295, RUR
  getNumBytes() = 62455808
  getBytesOnDisk()  = 62455808
  getVisibleLength()= -1
  getVolume()   = 
/usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current
  getBlockFile()= 
/usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670
  recoveryId=517444
  original=ReplicaWaitingToBeRecovered, blk_5575900364052391670_320295, RWR
  getNumBytes() = 62455808
  getBytesOnDisk()  = 62455808
  getVisibleLength()= -1
  getVolume()   = 
/usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current
  getBlockFile()= 
/usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670
  unlinked=false
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkReplicaFiles(FsDatasetImpl.java:1063)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.updateReplicaUnderRecovery(FsDatasetImpl.java:1541)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.updateReplicaUnderRecovery(DataNode.java:1907)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$BlockRecord.updateReplicaUnderRecovery(DataNode.java:1938)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2090)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1988)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:225)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode$2.run(DataNode.java:1869){noformat}

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay

 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5728:


Status: Patch Available  (was: Open)

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block

2014-01-07 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5728:


Attachment: HDFS-5728.patch

Attached the patch. Please review.

 [Diskfull] Block recovery will fail if the metafile not having crc for all 
 chunks of the block
 --

 Key: HDFS-5728
 URL: https://issues.apache.org/jira/browse/HDFS-5728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5728.patch


 1. Client (regionserver) has opened a stream to write its WAL to HDFS. This is 
 not a one-time upload; data will be written slowly.
 2. One of the DataNodes got diskfull (due to some other data filling up the disks).
 3. Unfortunately the block was being written to only this datanode in the cluster, so 
 the client write has also failed.
 4. After some time the disk is made free and all processes are restarted.
 5. Now HMaster tries to recover the file by calling recoverLease. 
 At this time recovery was failing, saying file length mismatch.
 When checked,
  actual block file length: 62484480
  Calculated block length: 62455808
 This was because the metafile was having crc for only 62455808 bytes, and it 
 considered 62455808 as the block size.
 No matter how many times it was retried, recovery was continuously failing.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-06 Thread Vinay (JIRA)
Vinay created HDFS-5723:
---

 Summary: Append failed FINALIZED replica should not be accepted as 
valid when that block is underconstruction
 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay


Scenario:
1. 3 node cluster with 
dfs.client.block.write.replace-datanode-on-failure.enable set to false.
2. One file is written with 3 replicas, blk_id_gs1
3. One of the datanode DN1 is down.
4. File was opened with append and some more data is added to the file and 
synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
5. Now  DN1 restarted
6. In this block report, DN1 reported FINALIZED block blk_id_gs1, which should 
be marked corrupted,
but since the NN has the appended block state as UnderConstruction, at this time it does 
not detect this block as corrupt and adds it to the valid block locations.

As long as the namenode is alive, this datanode will also be considered a 
valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-06 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863898#comment-13863898
 ] 

Vinay commented on HDFS-5723:
-

{noformat}2014-01-07 09:47:22,878 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: ip:25012:DataXceiver error 
processing WRITE_BLOCK operation  src: /ip:56873 dest: /ip2:25012
org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Cannot append 
to a non-existent replica BP-1746676845-ip-1388725564463:blk_1073742062_1268
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getReplicaInfo(FsDatasetImpl.java:372)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:507)
at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.append(FsDatasetImpl.java:93)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:200)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:457)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:115)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662){noformat}


{noformat}2014-01-07 09:47:46,773 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: ip:25012:DataXceiver error 
processing READ_BLOCK operation  src: /ip:35125 dest: /ip:25012
java.io.IOException: Replica gen stamp < block genstamp, 
block=BP-1746676845-ip-1388725564463:blk_1073742062_1270, 
replica=FinalizedReplica, blk_1073742062_1266, FINALIZED
  getNumBytes() = 6
  getBytesOnDisk()  = 6
  getVisibleLength()= 6
  getVolume()   = /home/vinay/hadoop/dfs/data/current
  getBlockFile()= 
/home/vinay/hadoop/dfs/data/current/BP-1746676845-ip-1388725564463/current/finalized/blk_1073742062
  unlinked  =false
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:247)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:328)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662){noformat}

 Append failed FINALIZED replica should not be accepted as valid when that 
 block is underconstruction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay

 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanode DN1 is down.
 4. File was opened with append and some more data is added to the file and 
 synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
 5. Now  DN1 restarted
 6. In this block report, DN1 reported FINALIZED block blk_id_gs1, which should 
 be marked corrupted,
 but since the NN has the appended block state as UnderConstruction, at this time 
 it does not detect this block as corrupt and adds it to the valid block locations.
 As long as the namenode is alive, this datanode will also be considered a 
 valid replica, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5723) Append failed FINALIZED replica should not be accepted as valid when that block is underconstruction

2014-01-06 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5723:


Description: 
Scenario:
1. 3 node cluster with 
dfs.client.block.write.replace-datanode-on-failure.enable set to false.
2. One file is written with 3 replicas, blk_id_gs1
3. One of the datanodes, DN1, is down.
4. The file was opened with append, some more data was added to the file and 
synced (to only the 2 live nodes DN2 and DN3) -- blk_id_gs2
5. Now DN1 is restarted.
6. In its block report, DN1 reported the FINALIZED block blk_id_gs1; this 
should be marked as corrupt.
But since the NN has the appended block's state as UnderConstruction, it does 
not detect this block as corrupt at this time and adds the replica to the 
valid block locations.

As long as the namenode is alive, the replica on this datanode will be 
considered valid, and read/append will fail on that datanode.

  was:
Scenario:
1. 3 node cluster with 
dfs.client.block.write.replace-datanode-on-failure.enable set to false.
2. One file is written with 3 replicas, blk_id_gs1
3. One of the datanode DN1 is down.
4. File was opened with append and some more data is added to the file and 
synced. (to only 2 live nodes DN2 and DN3)-- blk_id_gs2
5. Now  DN1 restarted
6. In this block report, DN1 reported FINALIZED block blk_id_gs1, this should 
be made marked corrupted.
but since NN having appended block state as UnderConstruction, at this time its 
not detecting this block as corrupt and adding to valid block locations.

As long as the namenode is alive, this datanode also will be considered as 
valid replica and read/append will fail in that datanode.


 Append failed FINALIZED replica should not be accepted as valid when that 
 block is under construction
 

 Key: HDFS-5723
 URL: https://issues.apache.org/jira/browse/HDFS-5723
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: Vinay

 Scenario:
 1. 3 node cluster with 
 dfs.client.block.write.replace-datanode-on-failure.enable set to false.
 2. One file is written with 3 replicas, blk_id_gs1
 3. One of the datanodes, DN1, is down.
 4. The file was opened with append, some more data was added to the file and 
 synced (to only the 2 live nodes DN2 and DN3) -- blk_id_gs2
 5. Now DN1 is restarted.
 6. In its block report, DN1 reported the FINALIZED block blk_id_gs1; this 
 should be marked as corrupt.
 But since the NN has the appended block's state as UnderConstruction, it does 
 not detect this block as corrupt at this time and adds the replica to the 
 valid block locations.
 As long as the namenode is alive, the replica on this datanode will be 
 considered valid, and read/append will fail on that datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5702) FsShell CLI: Add XML-based End-to-End test for getfacl and setfacl commands

2013-12-26 Thread Vinay (JIRA)
Vinay created HDFS-5702:
---

 Summary: FsShell CLI: Add XML-based End-to-End test for getfacl 
and setfacl commands
 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Vinay
Assignee: Vinay


FsShell CLI: Add XML-based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage

2013-12-24 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856308#comment-13856308
 ] 

Vinay commented on HDFS-5698:
-

Thanks for filing this Jira [~wheat9]. Looks like it's going to make 
maintenance of the fsimage very easy.

 Use protobuf to serialize / deserialize FSImage
 ---

 Key: HDFS-5698
 URL: https://issues.apache.org/jira/browse/HDFS-5698
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai

 Currently, the code serializes FSImage using in-house serialization 
 mechanisms. There are a couple of disadvantages to the current approach:
 # Mixing the responsibility of reconstruction and serialization / 
 deserialization. The current code paths of serialization / deserialization 
 have spent a lot of effort on maintaining compatibility. What is worse is 
 that they are mixed with the complex logic of reconstructing the namespace, 
 making the code difficult to follow.
 # Poor documentation of the current FSImage format. The format of the FSImage 
 is practically defined by the implementation. A bug in the implementation 
 means a bug in the specification. Furthermore, it also makes writing 
 third-party tools quite difficult.
 # Changing schemas is non-trivial. Adding a field to the FSImage requires 
 bumping the layout version every time. Bumping the layout version requires 
 (1) the users to explicitly upgrade the clusters, and (2) putting in new code 
 to maintain backward compatibility.
 This jira proposes to use protobuf to serialize the FSImage. Protobuf has 
 been used to serialize / deserialize the RPC messages in Hadoop.
 Protobuf addresses all the above problems. It clearly separates the 
 responsibility of serialization from reconstructing the namespace. The 
 protobuf files document the current format of the FSImage. Developers can now 
 add optional fields with ease, since the old code can always read the new 
 FSImage.
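
As an illustration of that compatibility point, here is a minimal sketch of a 
protobuf round trip. The {{INode}} message and the {{FsImageSketchProtos}} 
class are invented for this example (they stand in for classes protoc would 
generate from a hand-written .proto file) and are not the actual HDFS-5698 
schema.
{code}import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical message, as it might appear in a .proto file:
//   message INode { required uint64 id = 1; optional string name = 2; }
// A reader built against an older schema simply skips unknown optional
// fields, which is why adding a field no longer forces a layout version bump.
public class FsImageSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    FsImageSketchProtos.INode.newBuilder()
        .setId(16385L)
        .setName("dir1")
        .build()
        .writeDelimitedTo(out);   // length-prefixed record

    InputStream in = new ByteArrayInputStream(out.toByteArray());
    FsImageSketchProtos.INode inode =
        FsImageSketchProtos.INode.parseDelimitedFrom(in);
    System.out.println(inode.getId() + " " + inode.getName());
  }
}{code}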



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5600) FsShell CLI: add getfacl and setfacl with minimal support for getting and setting ACLs.

2013-12-24 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5600:


Attachment: HDFS-5600.patch

Attaching the updated patch.
Addressed Uma's comment.
Updated sticky-bit printing based on OTHER's permission (see the sketch 
below).
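
For context, here is a minimal sketch of the ls-style convention for printing 
the sticky bit based on OTHER's execute permission; it illustrates the 
convention only and is not the actual FsShell patch.
{code}// ls-style convention: the OTHER-execute position shows 't' when the
// sticky bit is set and OTHER has execute, 'T' when it is set without
// execute. Illustration only, not the actual FsShell code.
class StickyBitPrintSketch {
  static char otherExecChar(boolean sticky, boolean otherExec) {
    if (sticky) {
      return otherExec ? 't' : 'T';
    }
    return otherExec ? 'x' : '-';
  }

  public static void main(String[] args) {
    System.out.println(otherExecChar(true, true));   // t
    System.out.println(otherExecChar(true, false));  // T
    System.out.println(otherExecChar(false, true));  // x
  }
}{code}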

 FsShell CLI: add getfacl and setfacl with minimal support for getting and 
 setting ACLs.
 ---

 Key: HDFS-5600
 URL: https://issues.apache.org/jira/browse/HDFS-5600
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: tools
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Vinay
 Attachments: HDFS-5600.patch, HDFS-5600.patch, HDFS-5600.patch


 Implement and test FsShell CLI commands for getfacl and setfacl.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5600) FsShell CLI: add getfacl and setfacl with minimal support for getting and setting ACLs.

2013-12-21 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5600:


Attachment: HDFS-5600.patch

Thanks Chris for the detailed review and comments.

I have tried to address all of your comments.

Regarding the XML-based tests, I have started working on them and will update 
in a day or two, as I am travelling at the moment.

Please have a look at the updated patch. Thanks again.

 FsShell CLI: add getfacl and setfacl with minimal support for getting and 
 setting ACLs.
 ---

 Key: HDFS-5600
 URL: https://issues.apache.org/jira/browse/HDFS-5600
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: tools
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Vinay
 Attachments: HDFS-5600.patch, HDFS-5600.patch


 Implement and test FsShell CLI commands for getfacl and setfacl.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848939#comment-13848939
 ] 

Vinay commented on HDFS-5496:
-

{quote}How about changing this to
{code}if (!isInSafeMode() || ((isInSafeMode() && 
safeMode.isPopulatingReplQueues()) && haEnabled)){code}
I.e., in a non-HA setup, maybe we do not need to restart the processing since 
the NN already loads all of the editlog before entering safemode?
And in checkMode(), can we change
{code}  if (canInitializeReplQueues() && !isPopulatingReplQueues()) {
initializeReplQueues();
  }{code}
to
  {code}if (canInitializeReplQueues() && !isPopulatingReplQueues() && 
!haEnabled) {
initializeReplQueues();
  }{code}
because in an HA setup we will run processMisReplicateBlocks in 
startActiveService.{quote}

I was thinking about this again. I still have some doubts.
The above change will avoid reprocessing in non-HA.
But in HA, if startActiveServices() is called before safemode reaches the 
threshold, then the following check will fail and skip the call to initialize 
the queues.
{code}if (!isInSafeMode() || ((isInSafeMode() && 
safeMode.isPopulatingReplQueues()) && haEnabled)){code} 
And in safemode#checkMode(), initialization will also be skipped because of 
the haEnabled check.
{code}if (!isInSafeMode() || ((isInSafeMode() && 
safeMode.isPopulatingReplQueues()) && haEnabled)){code}
So it can completely miss initializing the replication queues altogether.

Am I missing something?


 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.
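
To make the "work per iteration should be limited" point concrete, here is a 
minimal sketch of the chunked-locking pattern the description argues for. The 
names ({{BLOCKS_PER_ITERATION}}, {{processBlock}}, the stand-in lock and 
block list) are invented for this example; this is not the actual 
FSNamesystem/BlockManager code.
{code}import java.util.Iterator;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Bounded work per iteration: hold the (stand-in) namesystem lock for one
// chunk of blocks at a time, releasing it between chunks so block reports
// and heartbeats can get in.
class AsyncReplQueueInitSketch implements Runnable {
  private static final int BLOCKS_PER_ITERATION = 10000;
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final List<Long> blocks;     // stand-in for the blocks map
  private volatile long processed = 0; // progress, e.g. for showing in a UI

  AsyncReplQueueInitSketch(List<Long> blocks) { this.blocks = blocks; }

  @Override
  public void run() {
    Iterator<Long> it = blocks.iterator();
    while (it.hasNext()) {
      fsLock.writeLock().lock(); // hold the lock for one bounded chunk only
      try {
        for (int i = 0; i < BLOCKS_PER_ITERATION && it.hasNext(); i++) {
          processBlock(it.next());
          processed++;
        }
      } finally {
        fsLock.writeLock().unlock(); // let other requests proceed
      }
    }
  }

  private void processBlock(long blockId) {
    // classify into under-replicated / over-replicated / invalidate queues
  }

  long getProgress() { return processed; }
}{code}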



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5496:


Attachment: HDFS-5496.patch

Attaching the patch with the below changes:
1. Included reprocessing from startActiveServices() in the HA case.
2. Moved initializeReplQueues outside of SafeModeInfo.
3. Reverted the last change about POSTPONE blocks, as reprocessing may move 
all blocks to the postponed list. Now it will move only over-replicated 
blocks.


 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848962#comment-13848962
 ] 

Vinay commented on HDFS-5496:
-

Also added a method to get the progress of the initialization, which can 
later be used to show it in the UI.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5496:


Attachment: HDFS-5496.patch

Added the missed change.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
 HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5592) DIR* completeFile: /file is closed by DFSClient_ should be logged only for successful closure of the file.

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848986#comment-13848986
 ] 

Vinay commented on HDFS-5592:
-

Thanks Uma

 DIR* completeFile: /file is closed by DFSClient_ should be logged only for 
 successful closure of the file.
 

 Key: HDFS-5592
 URL: https://issues.apache.org/jira/browse/HDFS-5592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.3.0
Reporter: Vinay
Assignee: Vinay
 Fix For: 3.0.0, 2.3.0

 Attachments: HDFS-5592.patch


 The following log message in {{FSNamesystem#completeFile(..)}} should be 
 logged only if the file is closed.
 {code}getEditLog().logSync();
 NameNode.stateChangeLog.info("DIR* completeFile: " + src + " is closed by "
     + holder);
 return success;{code}
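
A minimal, self-contained sketch of the fix being described: log the message 
only when the file was actually closed. {{Logger}} is a stand-in for 
NameNode.stateChangeLog, and the surrounding FSNamesystem context (edit log 
sync, lease checks, etc.) is elided.
{code}import java.util.logging.Logger;

// Sketch only: the success flag comes from the (elided) completeFileInternal
// logic; the message is logged only for a successful closure (HDFS-5592).
class CompleteFileLogSketch {
  private static final Logger LOG = Logger.getLogger("stateChangeLog");

  static boolean completeFile(String src, String holder, boolean success) {
    if (success) {
      LOG.info("DIR* completeFile: " + src + " is closed by " + holder);
    }
    return success;
  }

  public static void main(String[] args) {
    completeFile("/file", "DFSClient_1", true);  // logs
    completeFile("/file", "DFSClient_1", false); // stays quiet
  }
}{code}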



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849961#comment-13849961
 ] 

Vinay commented on HDFS-5496:
-

Thanks for the explanation, Jing. I knew I had missed something; it turns out 
I missed SafeMode#leave.
I will prepare one more patch.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
 HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-16 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5496:


Attachment: HDFS-5496.patch

Here is the updated patch.

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch, HDFS-5496.patch, 
 HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (HDFS-5638) FileContext API for ACLs.

2013-12-15 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay reassigned HDFS-5638:
---

Assignee: Vinay

 FileContext API for ACLs.
 -

 Key: HDFS-5638
 URL: https://issues.apache.org/jira/browse/HDFS-5638
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Vinay
 Attachments: HDFS-5638.patch


 Add new methods to {{AbstractFileSystem}} and {{FileContext}} for 
 manipulating ACLs.
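
For reference, an illustrative sketch of the shape such ACL methods could 
take, modeled on what the HDFS-4685 branch was adding; treat the exact names 
and signatures below as assumptions, not the committed API.
{code}import java.io.IOException;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclStatus;

// Sketch only: a plausible set of ACL operations for
// AbstractFileSystem/FileContext; not the actual committed interface.
interface AclOperationsSketch {
  void modifyAclEntries(Path path, List<AclEntry> aclSpec) throws IOException;
  void removeAclEntries(Path path, List<AclEntry> aclSpec) throws IOException;
  void removeDefaultAcl(Path path) throws IOException;
  void removeAcl(Path path) throws IOException;
  void setAcl(Path path, List<AclEntry> aclSpec) throws IOException;
  AclStatus getAclStatus(Path path) throws IOException;
}{code}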



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5669) Storage#tryLock() should check for null before logging successful message

2013-12-14 Thread Vinay (JIRA)
Vinay created HDFS-5669:
---

 Summary: Storage#tryLock() should check for null before logging 
successful message
 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinay
Assignee: Vinay


In the following code in Storage#tryLock(), there is a possibility that 
{{file.getChannel().tryLock()}} returns null if the lock is acquired by some 
other process. In that case, even though the return value is null, a success 
message is logged, which is confusing.
{code}try {
res = file.getChannel().tryLock();
file.write(jvmName.getBytes(Charsets.UTF_8));
LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
  } catch(OverlappingFileLockException oe) {{code}
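
Here is a minimal sketch of the suggested fix: check the result of tryLock() 
for null before logging success. The logging and error handling below are 
simplified stand-ins for the actual Storage code.
{code}import com.google.common.base.Charsets;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

// FileChannel#tryLock() returns null (it does not throw) when another
// *process* already holds the lock, so the success message must be logged
// only when a lock was really acquired. Sketch only, not the Storage patch.
class TryLockSketch {
  static FileLock tryLock(File lockF, String jvmName) throws IOException {
    RandomAccessFile file = new RandomAccessFile(lockF, "rws");
    FileLock res = null;
    try {
      res = file.getChannel().tryLock();
      if (res == null) {
        // lock held by another process: do not log success
        System.err.println("Unable to acquire lock on " + lockF);
        file.close();
        return null;
      }
      file.write(jvmName.getBytes(Charsets.UTF_8));
      System.out.println("Lock on " + lockF
          + " acquired by nodename " + jvmName);
    } catch (OverlappingFileLockException oe) {
      // lock held by another thread in this JVM
      file.close();
      throw new IOException("Lock already held by this JVM: " + lockF, oe);
    }
    return res;
  }
}{code}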



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5669) Storage#tryLock() should check for null before logging successful message

2013-12-14 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5669:


Affects Version/s: 2.2.0
   Status: Patch Available  (was: Open)

 Storage#tryLock() should check for null before logging successful message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5669.patch


 In the following code in Storage#tryLock(), there is a possibility that 
 {{file.getChannel().tryLock()}} returns null if the lock is acquired by some 
 other process. In that case, even though the return value is null, a success 
 message is logged, which is confusing.
 {code}try {
 res = file.getChannel().tryLock();
 file.write(jvmName.getBytes(Charsets.UTF_8));
 LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
   } catch(OverlappingFileLockException oe) {{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5669) Storage#tryLock() should check for null before logging successful message

2013-12-14 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5669:


Attachment: HDFS-5669.patch

 Storage#tryLock() should check for null before logging successful message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5669.patch


 In the following code in Storage#tryLock(), there is a possibility that 
 {{file.getChannel().tryLock()}} returns null if the lock is acquired by some 
 other process. In that case, even though the return value is null, a success 
 message is logged, which is confusing.
 {code}try {
 res = file.getChannel().tryLock();
 file.write(jvmName.getBytes(Charsets.UTF_8));
 LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
   } catch(OverlappingFileLockException oe) {{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5669) Storage#tryLock() should check for null before logging successful message

2013-12-14 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5669:


Description: 
In the following code in Storage#tryLock(), there is a possibility that 
{{file.getChannel().tryLock()}} returns null if the lock is acquired by some 
other process. In that case even though return value is null, a successfull 
message confuses.
{code}try {
res = file.getChannel().tryLock();
file.write(jvmName.getBytes(Charsets.UTF_8));
LOG.info(Lock on  + lockF +  acquired by nodename  + jvmName);
  } catch(OverlappingFileLockException oe) {{code}

  was:
In the following code in Storage#tryLock(), there is a possibility that {{ 
file.getChannel().tryLock();}} returns null if the lock is acquired by some 
other process. In that case even though return value is null, a successfull 
message confuses.
{code}try {
res = file.getChannel().tryLock();
file.write(jvmName.getBytes(Charsets.UTF_8));
LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
  } catch(OverlappingFileLockException oe) {{code}


 Storage#tryLock() should check for null before logging successful message
 --

 Key: HDFS-5669
 URL: https://issues.apache.org/jira/browse/HDFS-5669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5669.patch


 In the following code in Storage#tryLock(), there is a possibility that 
 {{file.getChannel().tryLock()}} returns null if the lock is acquired by some 
 other process. In that case, even though the return value is null, a success 
 message is logged, which is confusing.
 {code}try {
 res = file.getChannel().tryLock();
 file.write(jvmName.getBytes(Charsets.UTF_8));
 LOG.info("Lock on " + lockF + " acquired by nodename " + jvmName);
   } catch(OverlappingFileLockException oe) {{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846178#comment-13846178
 ] 

Vinay commented on HDFS-5496:
-

bq. The following change would have been fine if leaving safe mode and 
initializing replication queues were synchronized. It appears checkMode() can 
start a background initialization before leaving the safe mode. Since the 
queues are unconditionally cleared right before the following, an on-going 
initialization should be stopped and redone.
If I understand correctly, we need to restart initializing the queues, right?

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847026#comment-13847026
 ] 

Vinay commented on HDFS-5496:
-

Hi Jing,
I think that would work.
With that change, multiple initializations will not happen in non-HA mode.
In an HA setup, if safemode is in extension after {{initializeReplQueues()}} 
has been called by the time of {{startActiveServices()}}, re-initialization 
will be called.

Now my only question is: at that point, do we need to restart the 
initialization, or continue with any ongoing initialization?
I feel it's OK to continue. Any thoughts?

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847049#comment-13847049
 ] 

Vinay commented on HDFS-5496:
-

I think the block processing will be postponed, not put into the invalidate 
queue, because of the below change in the patch:
{code}+// postpone making any decision with stale replicas
+if (numCurrentReplica > expectedReplication
+    && num.replicasOnStaleNodes() > 0) {
+  // If any of the replicas of this block are on nodes that are
+  // considered stale, then these replicas may in fact have
+  // already been deleted. So, we cannot safely act on the
+  // over-replication until a later point in time, when
+  // the stale nodes have block reported.
+  return MisReplicationResult.POSTPONE;
+}{code}

Right?

In that case, is it better to re-initialize?

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-12 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13847063#comment-13847063
 ] 

Vinay commented on HDFS-5496:
-

bq. So for that part of the change, can we make sure that 
markAllDatanodesStale() is called before calling processMisReplicateBlocks? 
Currently in {{startActiveServices()}}, markAllDatanodesStale() is called 
before processMisReplicateBlocks(). Also, the queues are cleared 
unconditionally at this time, which destroys the result of any ongoing 
initialization.
{code}blockManager.setPostponeBlocksFromFuture(false);
blockManager.getDatanodeManager().markAllDatanodesStale();
blockManager.clearQueues();
blockManager.processAllPendingDNMessages();{code}
So that means if we continue with the ongoing initialization, all further 
blocks will be added to the postponed list only, right?
Or is there something we are missing here?

 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
Assignee: Vinay
 Attachments: HDFS-5496.patch, HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5600) FsShell CLI: add getfacl and setfacl with minimal support for getting and setting ACLs.

2013-12-12 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5600:


Attachment: HDFS-5600.patch

Attaching a patch for the FsShell CLI implementations.
Tests need to be updated after the namenode implementation is in place.

 FsShell CLI: add getfacl and setfacl with minimal support for getting and 
 setting ACLs.
 ---

 Key: HDFS-5600
 URL: https://issues.apache.org/jira/browse/HDFS-5600
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: tools
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Vinay
 Attachments: HDFS-5600.patch


 Implement and test FsShell CLI commands for getfacl and setfacl.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (HDFS-5638) FileContext API for ACLs.

2013-12-12 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5638:


Attachment: HDFS-5638.patch

Attaching a patch to add ACL APIs to FileContext and AbstractFileSystem.

 FileContext API for ACLs.
 -

 Key: HDFS-5638
 URL: https://issues.apache.org/jira/browse/HDFS-5638
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
 Attachments: HDFS-5638.patch


 Add new methods to {{AbstractFileSystem}} and {{FileContext}} for 
 manipulating ACLs.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous

2013-12-10 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845046#comment-13845046
 ] 

Vinay commented on HDFS-5496:
-

bq. For (4), looks like currently we only retrieve metrics information from 
postponedMisreplicatedBlocks and we always check if the corresponding DNs are 
still stale before we make INVALIDATE decision. Thus it should be safe if we 
delay its initialization. 
For this I am trying to make some changes in the patch. I hope the next patch 
will include this.
bq. For (2), currently we add under-replicated blocks into neededReplications 
when 1) initially populating the replication queue, 2) checking replication 
when finalizing an under-construction file, 3) checking replication progress 
for decommissioning DN, and 4) pending replicas timeout. Delaying 1) and making 
it happen in parallel with 2)~4) should also be safe.
I guess this is already in place, i.e. incomplete (under-construction) blocks 
are not added to neededReplications in {{processMisReplicatedBlock(..)}}:
{code}if (!block.isComplete()) {
  // Incomplete blocks are never considered mis-replicated --
  // they'll be reached when they are completed or recovered.
  return MisReplicationResult.UNDER_CONSTRUCTION;
}{code}
bq. For the current patch, I understand we need a new iterator that can iterate 
the blocksMap and not throw exception when concurrent modifications happen. 
However, I guess we may only need to define a new iterator and do not need to 
define the new BlocksMapGSet here. Also, since the new iterator shares most of 
the code with the existing LightWeightGSet#SetIterator, maybe we can simply 
extend SetIterator here?
Yes. Sure. 
bq. So for case 3, in non-HA setup, I think maybe we do not need to restart the 
processing since there should not be any pending editlog for NN to process in 
startActiveService? In HA setup, since we can always run 
processMisReplicateBlocks in startActiveService, we actually do not need to 
populate replication queue while still in safemode? If we're able to make these 
two changes, for the current patch, we do not need to worry about some 
already-running replication initializing thread.
This can be done. Does "do not need to worry about some already-running 
replication initializing thread" mean we should just return from the call if 
initialization is already in progress?


 Make replication queue initialization asynchronous
 --

 Key: HDFS-5496
 URL: https://issues.apache.org/jira/browse/HDFS-5496
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Kihwal Lee
 Attachments: HDFS-5496.patch


 Today, initialization of replication queues blocks safe mode exit and certain 
 HA state transitions. For a big namespace, this can take hundreds of seconds 
 with the FSNamesystem write lock held. During this time, important requests 
 (e.g. initial block reports, heartbeats, etc.) are blocked.
 The effect of delaying the initialization would be not starting replication 
 right away, but I think the benefit outweighs the cost. If we make it 
 asynchronous, the work per iteration should be limited, so that the lock 
 duration is capped.
 If full/incremental block reports and any other requests that modify block 
 state properly perform replication checks while the blocks are scanned and 
 the queues are populated in the background, every block will be processed. 
 (Some may be done twice.) The replication monitor should run even before all 
 blocks are processed.
 This will allow the namenode to exit safe mode and start serving immediately 
 even with a big namespace. It will also reduce the HA failover latency.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

