[jira] [Resolved] (HDFS-6495) In some cases, a hedged read can lead to an infinite client wait.

2014-06-06 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-6495.
---

Resolution: Duplicate

Duplicate of HDFS-6494

 In some cases, a hedged read can lead to an infinite client wait.
 --

 Key: HDFS-6495
 URL: https://issues.apache.org/jira/browse/HDFS-6495
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: LiuLei

 When I use hedged reads and there is only one live datanode, if the read 
 from that datanode throws a TimeoutException or ChecksumException, the client 
 will wait forever.
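
For context, hedged reads are enabled purely client-side; a minimal sketch of a 
pread with hedging turned on (the file path is illustrative; the two config keys 
are the ones introduced for this feature):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HedgedReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A pool size > 0 enables hedged reads (the default 0 disables them).
    conf.setInt("dfs.client.hedged.read.threadpool.size", 5);
    // How long to wait before speculatively starting a second read.
    conf.setLong("dfs.client.hedged.read.threshold.millis", 500);

    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path("/test/data.bin"))) {
      byte[] buf = new byte[4096];
      // Positional reads are the code path covered by hedging.
      int n = in.read(0L, buf, 0, buf.length);
      System.out.println("read " + n + " bytes");
    }
  }
}
{code}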



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6496) WebHDFS cannot open file

2014-06-06 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6496:
-

 Summary: WebHDFS cannot open file
 Key: HDFS-6496
 URL: https://issues.apache.org/jira/browse/HDFS-6496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu


WebHDFS cannot open the file from the NameNode web UI. I attached a screenshot.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6338) Add a RPC to allow administrator to delete file lease.

2014-05-04 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6338:
-

 Summary: Add a RPC to allow administrator to delete file lease.
 Key: HDFS-6338
 URL: https://issues.apache.org/jira/browse/HDFS-6338
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor


We have to wait for the file lease to expire after an unexpected interrupt 
during an HDFS write, so I want to add an RPC method that allows an 
administrator to delete the file lease.

Please leave comments here; I am working on the patch now.
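
For comparison, the closest existing client-side hook is 
DistributedFileSystem#recoverLease, which asks the NameNode to start lease 
recovery for a path; a minimal sketch (the path is illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class RecoverLeaseExample {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    Path stuck = new Path("/data/stuck-file");
    // recoverLease() returns true once the file has been closed;
    // poll until lease recovery completes.
    while (!dfs.recoverLease(stuck)) {
      Thread.sleep(1000);
    }
  }
}
{code}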



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu reopened HDFS-6299:
---


I reopened this issue because I found several problems during my review.

 Protobuf for XAttr and client-side implementation 
 --

 Key: HDFS-6299
 URL: https://issues.apache.org/jira/browse/HDFS-6299
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6299.patch


 This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
 interfaces in DistributedFileSystem and DFSClient. 
 With this JIRA we may just keep a dummy implementation of the XAttr API of 
 ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-6299) Protobuf for XAttr and client-side implementation

2014-04-30 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu reopened HDFS-6299:
---


This cannot be closed yet, so I am reopening it; the commit should be reverted, 
and the review comments here should be addressed.

 Protobuf for XAttr and client-side implementation 
 --

 Key: HDFS-6299
 URL: https://issues.apache.org/jira/browse/HDFS-6299
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode
Affects Versions: HDFS XAttrs (HDFS-2006)
Reporter: Uma Maheswara Rao G
Assignee: Yi Liu
 Fix For: HDFS XAttrs (HDFS-2006)

 Attachments: HDFS-6299.patch


 This JIRA tracks the protobuf for XAttr and the implementation of the XAttr 
 interfaces in DistributedFileSystem and DFSClient. 
 With this JIRA we may just keep a dummy implementation of the XAttr API of 
 ClientProtocol in NameNodeRpcServer.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6318) refreshServiceAcl cannot affect both active NN and standby NN

2014-04-30 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6318:
-

 Summary: refreshServiceAcl cannot affect both active NN and 
standby NN
 Key: HDFS-6318
 URL: https://issues.apache.org/jira/browse/HDFS-6318
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu


refreshServiceAcl does not affect both the active NN and the standby NN; it 
only selects one NN to reload the ACL configuration, but the ACLs should be 
reloaded on both the active NN and the standby NN.
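
As a workaround sketch until this is fixed, the refresh can be pointed at each 
NameNode explicitly through the generic -fs option (the NN host names below are 
illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.tools.DFSAdmin;
import org.apache.hadoop.util.ToolRunner;

public class RefreshAclOnBothNNs {
  public static void main(String[] args) throws Exception {
    // Equivalent to running "hdfs dfsadmin -fs <nn> -refreshServiceAcl"
    // once per NameNode instead of once against the HA nameservice.
    String[] nameNodes = {"hdfs://nn1.example.com:8020",
                          "hdfs://nn2.example.com:8020"};
    for (String nn : nameNodes) {
      int rc = ToolRunner.run(new Configuration(), new DFSAdmin(),
          new String[] {"-fs", nn, "-refreshServiceAcl"});
      System.out.println(nn + " -> exit code " + rc);
    }
  }
}
{code}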



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6267) Upgrade Jetty6 to Jetty9

2014-04-21 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6267:
-

 Summary: Upgrade Jetty6 to Jetty9
 Key: HDFS-6267
 URL: https://issues.apache.org/jira/browse/HDFS-6267
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 3.0.0
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor


The stable Jetty version is 9.x, but it requires Java 7, so I want to target 
3.0 for this upgrade.

Jetty 9 is incompatible with Jetty 6, so the patch will be large.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6252) Namenode old webUI should be deprecated

2014-04-16 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6252:
-

 Summary: Namenode old webUI should be deprecated
 Key: HDFS-6252
 URL: https://issues.apache.org/jira/browse/HDFS-6252
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.5.0
Reporter: Fengdong Yu
Priority: Minor


We deprecated hftp and hsftp in HDFS-5570, so if we download a file via the 
download link on browseDirectory.jsp, it throws an error:

Problem accessing /streamFile/***

because the streamFile servlet was deleted in HDFS-5570.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6137) Datanode cannot rollback because LayoutVersion incorrect

2014-03-21 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6137:
-

 Summary: Datanode cannot rollback because LayoutVersion incorrect
 Key: HDFS-6137
 URL: https://issues.apache.org/jira/browse/HDFS-6137
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Fengdong Yu






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6130) NPE during upgrade using trunk after RU merged

2014-03-20 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6130:
-

 Summary: NPE during upgrade using trunk after RU merged
 Key: HDFS-6130
 URL: https://issues.apache.org/jira/browse/HDFS-6130
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.0
Reporter: Fengdong Yu


I want to upgrade an old cluster (0.20.2-cdh3u1) to a trunk build.

I can upgrade successfully if I don't configure HA, but with HA enabled 
there is an NPE when I run 'hdfs namenode -initializeSharedEdits':

{code}
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
14/03/20 15:06:41 INFO namenode.FSNamesystem: Retry cache will use 0.03 of 
total heap and retry cache entry expiry time is 60 millis
14/03/20 15:06:41 INFO util.GSet: Computing capacity for map NameNodeRetryCache
14/03/20 15:06:41 INFO util.GSet: VM type   = 64-bit
14/03/20 15:06:41 INFO util.GSet: 0.02999329447746% max memory 896 MB = 
275.3 KB
14/03/20 15:06:41 INFO util.GSet: capacity  = 2^15 = 32768 entries
14/03/20 15:06:41 INFO namenode.AclConfigFlag: ACLs enabled? false
14/03/20 15:06:41 INFO common.Storage: Lock on 
/data/hadoop/data1/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO common.Storage: Lock on 
/data/hadoop/data2/dfs/name/in_use.lock acquired by nodename 7326@10-150-170-176
14/03/20 15:06:42 INFO namenode.FSImage: No edit log streams selected.
14/03/20 15:06:42 INFO namenode.FSImageFormatPBINode: Loading 1 INodes.
14/03/20 15:06:42 FATAL namenode.NameNode: Exception in namenode join
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.isReservedName(FSDirectory.java:2984)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.addToParent(FSImageFormatPBINode.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode$Loader.loadINodeDirectorySection(FSImageFormatPBINode.java:162)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.loadInternal(FSImageFormatProtobuf.java:243)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf$Loader.load(FSImageFormatProtobuf.java:168)
at 
org.apache.hadoop.hdfs.server.namenode.FSImageFormat$LoaderDelegator.load(FSImageFormat.java:120)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:895)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:881)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:704)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:271)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initializeSharedEdits(NameNode.java:912)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1276)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
14/03/20 15:06:42 INFO util.ExitUtil: Exiting with status 1
14/03/20 15:06:42 INFO namenode.NameNode: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down NameNode at 10-150-170-176/10.150.170.176
/
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6113) Rolling upgrade exception

2014-03-17 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-6113:
-

 Summary: Rolling upgrade exception
 Key: HDFS-6113
 URL: https://issues.apache.org/jira/browse/HDFS-6113
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Fengdong Yu


I have hadoop-2.3 running without security on the cluster; then I built a 
trunk instance, also without security.

NN1 - active
NN2 - standby
DN1 - datanode 
DN2 - datanode
JN1,JN2,JN3 - Journal and ZK

then on the NN2:
{code}
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop zkfc
{code}

Then I changed the environment variables to point to the new Hadoop (trunk 
version), and ran:

{code}
hadoop-daemon.sh start namenode
{code}

NN2 throws an exception:
{code}
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Could not journal CTime 
for one or more JournalNodes. 1 exceptions thrown:
10.100.91.33:8485: Failed on local exception: java.io.EOFException; Host 
Details : local host is: 10-204-8-136/10.204.8.136; destination host is: 
jn33.com:8485;
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getJournalCTime(QuorumJournalManager.java:631)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.getSharedLogCTime(FSEditLog.java:1383)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:738)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:600)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:360)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:258)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:653)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:444)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:500)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:656)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:641)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1294)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1360)
{code}


The JN throws an exception:
{code}
2014-03-18 12:19:01,960 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 8485: readAndProcess threw exception java.io.IOException: Unable to read 
authentication method from client 10.204.8.136. Count of bytes read: 0
java.io.IOException: Unable to read authentication method
at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1344)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:761)
at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:560)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:535)
2014-03-18 12:19:01,960 DEBUG org.apache.hadoop.ipc.Server: IPC Server listener 
on 8485: disconnecting client 10.204.8.136:39063. Number of active connections: 
1
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-5763) Service ACL not refreshed on both ANN and SNN

2014-01-13 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-5763:
-

 Summary: Service ACL not refreshed on both ANN and SNN
 Key: HDFS-5763
 URL: https://issues.apache.org/jira/browse/HDFS-5763
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Fengdong Yu


I configured hadoop-policy.xml on the active NN, then ran:
hdfs dfsadmin -refreshServiceAcl

but the service ACL is refreshed on only one of the active NN or the standby 
NN, not both.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5670) FSPermission check is incorrect

2013-12-15 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-5670.
---

Resolution: Not A Problem

Sorry for my mistake.

 FSPermission check is incorrect
 ---

 Key: HDFS-5670
 URL: https://issues.apache.org/jira/browse/HDFS-5670
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Fengdong Yu
 Fix For: 3.0.0, 2.3.0


 The FSPermission check is incorrect after a recent update in trunk.
 I submitted an MR job as root, but the whole output directory must be owned 
 by root; otherwise it throws an exception:
 {code}
 [root@10 ~]# hadoop fs -ls /
 Found 1 items
 drwxr-xr-x   - hadoop supergroup  0 2013-12-15 10:04 /user
 [root@10 ~]# 
 [root@10 ~]# hadoop fs -ls /user
 Found 1 items
 drwxr-xr-x   - root root  0 2013-12-15 10:04 /user/root
 {code}
 {code}
 [root@10 ~]# hadoop jar airui.jar  /input /user/root/
 Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
 Permission denied: user=root, access=WRITE, 
 inode=/user:hadoop:supergroup:drwxr-xr-x
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:161)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5410)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3236)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3190)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3174)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:708)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:514)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (HDFS-5670) FSPermission check is incorrect

2013-12-14 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-5670:
-

 Summary: FSPermission check is incorrect
 Key: HDFS-5670
 URL: https://issues.apache.org/jira/browse/HDFS-5670
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.2.0, 3.0.0
Reporter: Fengdong Yu
 Fix For: 3.0.0, 2.3.0


The FSPermission check is incorrect after a recent update in trunk.
I submitted an MR job as root, but the whole output directory must be owned by 
root; otherwise it throws an exception:
{code}
[root@10 ~]# hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup  0 2013-12-15 10:04 /user
[root@10 ~]# 
[root@10 ~]# hadoop fs -ls /user
Found 1 items
drwxr-xr-x   - root root  0 2013-12-15 10:04 /user/root
{code}

{code}
[root@10 ~]# hadoop jar airui.jar  /input /user/root/
Exception in thread "main" org.apache.hadoop.security.AccessControlException: 
Permission denied: user=root, access=WRITE, 
inode=/user:hadoop:supergroup:drwxr-xr-x
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:161)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5410)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:3236)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:3190)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3174)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:708)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:514)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
{code}
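
For reference, the check turned out to be correct behavior (hence the "Not A 
Problem" resolution above): deleting the existing output directory /user/root 
requires WRITE permission on its parent /user, which is owned by 
hadoop:supergroup with mode drwxr-xr-x, so root is denied. A minimal sketch of 
the parent-directory rule, assuming the layout from the report:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParentWriteCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Deleting a path needs WRITE on its *parent* directory:
    //   /user/root        -> parent /user (hadoop:supergroup) -> denied to root
    //   /user/root/output -> parent /user/root (root:root)    -> allowed
    fs.delete(new Path("/user/root/output"), true); // recursive delete succeeds
  }
}
{code}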




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Resolved] (HDFS-5550) Journal Node is not upgrade during HDFS upgrade

2013-11-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-5550.
---

Resolution: Not A Problem

I closed this issue: during an upgrade, we need to run -initializeSharedEdits 
to upgrade the Journal nodes.

 Journal Node is not upgrade during HDFS upgrade
 ---

 Key: HDFS-5550
 URL: https://issues.apache.org/jira/browse/HDFS-5550
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 3.0.0, 2.2.0, 2.2.1
Reporter: Fengdong Yu
Priority: Blocker

 HDFS was upgraded from 2.0.3 to 2.2.0, but the Journal nodes don't upgrade; a 
 direct symptom is that the VERSION file is old.
 So the SNN replaying the edit log through http://hostname:8480/getJournal?*** 
 gets a 403, because the VERSION is mismatched.
 I marked this as a Blocker; is that OK?



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HDFS-5553) SNN crashed because edit log has gap after upgrade

2013-11-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-5553.
---

Resolution: Not A Problem

I closed this issue: during an upgrade, we need to empty the JNs' edits 
directory before running -initializeSharedEdits to upgrade the Journal nodes.

 SNN crashed because edit log has gap after upgrade
 --

 Key: HDFS-5553
 URL: https://issues.apache.org/jira/browse/HDFS-5553
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 3.0.0, 2.2.0
Reporter: Fengdong Yu
Priority: Blocker

 As HDFS-5550 described, the journal nodes don't upgrade, so I changed the 
 VERSION manually according to the NN's VERSION.
 Then I did the upgrade and got this exception. I also marked this as a Blocker.
 My steps were as follows (it was a fresh hadoop-2.0.1 cluster before 
 upgrading):
 0) install the hadoop-2.2.0 package on all nodes
 1) stop-dfs.sh on the active NN
 2) disable HA in core-site.xml and hdfs-site.xml on the active NN and SNN
 3) start-dfs.sh -upgrade -clusterId test-cluster on the active NN (only one 
 NN now)
 4) stop-dfs.sh after active NN started successfully.
 5) enable HA in the core-site.xml and hdfs-site.xml on active NN and SNN
 6) change all journal nodes' VERSION manually according to NN's VERSION
 7) rm -f 'dfs.journalnode.edits.dir'/test-cluster/current/* (just keep 
 VERSION here)
 8) delete all data under 'dfs.namenode.name.dir' on SNN
 9) scp -r 'dfs.namenode.name.dir' from the active NN to the SNN
 10) start-dfs.sh



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5573) CacheAdmin doesn't work

2013-11-26 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-5573:
-

 Summary: CacheAdmin doesn't work
 Key: HDFS-5573
 URL: https://issues.apache.org/jira/browse/HDFS-5573
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Fengdong Yu


The build is compiled from trunk, and I ran cacheadmin against the active NN.

The exceptions are as follows:
{code}
[hadoop@10 ~]$ hdfs cacheadmin -addPool test2
Successfully added cache pool test2.
[hadoop@10 ~]$ hdfs cacheadmin -addDirective -path /test/core-site.xml -pool 
test2 -replication 3
Added cache directive 3
[hadoop@10 ~]$ 
[hadoop@10 ~]$ 
[hadoop@10 ~]$ 
[hadoop@10 ~]$ hdfs cacheadmin -listDirectives
Exception in thread "main" 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): 
Operation category READ is not supported in state standby
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1562)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1128)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.listCacheDirectives(FSNamesystem.java:7168)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1267)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer$ServerSideCacheEntriesIterator.makeRequest(NameNodeRpcServer.java:1253)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.listCacheDirectives(ClientNamenodeProtocolServerSideTranslatorPB.java:1085)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1961)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1957)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1515)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1955)

at org.apache.hadoop.ipc.Client.call(Client.java:1405)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.listCacheDirectives(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1079)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB$CacheEntriesIterator.makeRequest(ClientNamenodeProtocolTranslatorPB.java:1064)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.makeRequest(BatchedRemoteIterator.java:77)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.makeRequestIfNeeded(BatchedRemoteIterator.java:85)
at 
org.apache.hadoop.fs.BatchedRemoteIterator.hasNext(BatchedRemoteIterator.java:99)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$32.hasNext(DistributedFileSystem.java:1656)
at 
org.apache.hadoop.hdfs.tools.CacheAdmin$ListCacheDirectiveInfoCommand.run(CacheAdmin.java:450)
at org.apache.hadoop.hdfs.tools.CacheAdmin.run(CacheAdmin.java:84)
at org.apache.hadoop.hdfs.tools.CacheAdmin.main(CacheAdmin.java:89)
{code}
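
A sketch of the failing step done programmatically, assuming trunk's 
DistributedFileSystem#listCacheDirectives API (the empty filter matches every 
directive; this is the iteration that ended up on the standby NN):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveEntry;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

public class ListDirectives {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // Same call path as "hdfs cacheadmin -listDirectives".
    CacheDirectiveInfo filter = new CacheDirectiveInfo.Builder().build();
    RemoteIterator<CacheDirectiveEntry> it = dfs.listCacheDirectives(filter);
    while (it.hasNext()) {
      System.out.println(it.next().getInfo());
    }
  }
}
{code}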



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5561) New Web UI cannot display correctly

2013-11-24 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-5561:
-

 Summary: New Web UI cannot display correctly
 Key: HDFS-5561
 URL: https://issues.apache.org/jira/browse/HDFS-5561
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.2.0, 3.0.0
Reporter: Fengdong Yu
Assignee: Haohui Mai
Priority: Minor


The new web UI does not display correctly; I attached a screenshot.

I've tried Chrome 31.0.1650 and Firefox 25.0.1.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-5553) SNN crashed because edit log has gap after upgrade

2013-11-22 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-5553:
-

 Summary: SNN crashed because edit log has gap after upgrade
 Key: HDFS-5553
 URL: https://issues.apache.org/jira/browse/HDFS-5553
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs-client
Affects Versions: 2.2.0, 3.0.0
Reporter: Fengdong Yu
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HDFS-4959) Decommission data nodes: there is no response

2013-07-05 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4959:
-

 Summary: Decommission data nodes: there is no response
 Key: HDFS-4959
 URL: https://issues.apache.org/jira/browse/HDFS-4959
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.0.5-alpha
Reporter: Fengdong Yu




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4939) Retain old edits log, don't retain all minimum required logs

2013-06-26 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4939:
-

 Summary: Retain old edits log, don't retain all minimum required 
logs
 Key: HDFS-4939
 URL: https://issues.apache.org/jira/browse/HDFS-4939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Fengdong Yu
Priority: Minor




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-4939) Retain old edits log, don't retain all minimum required logs

2013-06-26 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu resolved HDFS-4939.
---

Resolution: Won't Fix

 Retain old edits log, don't retain all minimum required logs
 

 Key: HDFS-4939
 URL: https://issues.apache.org/jira/browse/HDFS-4939
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor

 JNStorage.java
 {code}
 private static void purgeMatching(File dir, List<Pattern> patterns,
     long minTxIdToKeep) throws IOException {
   for (File f : FileUtil.listFiles(dir)) {
     if (!f.isFile()) continue;
     for (Pattern p : patterns) {
       Matcher matcher = p.matcher(f.getName());
       if (matcher.matches()) {
         // This parsing will always succeed since the group(1) is
         // /\d+/ in the regex itself.
         long txid = Long.valueOf(matcher.group(1));
         if (txid < minTxIdToKeep) {
           LOG.info("Purging no-longer needed file " + txid);
           if (!f.delete()) {
             LOG.warn("Unable to delete no-longer-needed data " + f);
           }
           // Note: this break only exits the inner pattern loop; the outer
           // loop still continues with the next file.
           break;
         }
       }
     }
   }
 }
 {code}
 Why break the for loop here? If we break, we only delete one file on each 
 retention pass, am I right?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4932) Avoid a long line on the name node webUI if we have many Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4932:
-

 Summary: Avoid a long line on the name node webUI if we have many 
Journal nodes
 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta


If we have many Journal nodes, the name node webUI shows one long line; this 
patch wraps the line, showing three journal nodes per line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4922) Improve the short-circuit document

2013-06-20 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4922:
-

 Summary: Improve the short-circuit document
 Key: HDFS-4922
 URL: https://issues.apache.org/jira/browse/HDFS-4922
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Fengdong Yu
Assignee: Fengdong Yu




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HDFS-4533) start-dfs.sh ignored additional parameters besides -upgrade

2013-06-18 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu reopened HDFS-4533:
---


There was a bug when adding additional NameNode options.

 start-dfs.sh ignored additional parameters besides -upgrade
 ---

 Key: HDFS-4533
 URL: https://issues.apache.org/jira/browse/HDFS-4533
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.0.3-alpha
Reporter: Fengdong Yu
Assignee: Fengdong Yu
  Labels: patch
 Fix For: 2.1.0-beta

 Attachments: HDFS-4533_2.patch, HDFS-4533.patch


 start-dfs.sh only takes the -upgrade option and ignores the others. 
 So if you run the following command, the clusterId option is ignored:
 start-dfs.sh -upgrade -clusterId 1234

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4917) Start-dfs.sh cannot pass the parameters correctly

2013-06-18 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4917:
-

 Summary: Start-dfs.sh cannot pass the parameters correctly
 Key: HDFS-4917
 URL: https://issues.apache.org/jira/browse/HDFS-4917
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Affects Versions: 2.0.4-alpha
Reporter: Fengdong Yu
Assignee: Fengdong Yu
 Fix For: 2.1.0-beta


There is a typo, which I found during my upgrade test. I've uploaded the patch 
here.
{code}
  nameStartOpt="$nameStartOpts $@"
{code}

it should be:
{code}
  nameStartOpt="$nameStartOpt $@"
{code}

otherwise, some parameters will be ignored, such as: -upgrade, -rollback


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4918) HDFS permission check is incorrect

2013-06-18 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4918:
-

 Summary: HDFS permission check is incorrect
 Key: HDFS-4918
 URL: https://issues.apache.org/jira/browse/HDFS-4918
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Affects Versions: 2.0.4-alpha
Reporter: Fengdong Yu
 Fix For: 2.1.0-beta




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4654) FileNotFoundException: ID mismatch

2013-03-31 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4654:
-

 Summary: FileNotFoundException: ID mismatch
 Key: HDFS-4654
 URL: https://issues.apache.org/jira/browse/HDFS-4654
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 3.0.0
Reporter: Fengdong Yu
 Fix For: 3.0.0


My cluster was built from trunk source (r1463074).

I got the following exception when I put a file into HDFS:

13/04/01 09:33:45 WARN retry.RetryInvocationHandler: Exception while invoking 
addBlock of class ClientNamenodeProtocolTranslatorPB. Trying to fail over 
immediately.
13/04/01 09:33:45 WARN hdfs.DFSClient: DataStreamer Exception
java.io.FileNotFoundException: ID mismatch. Request id and saved id: 1073 , 1050
at org.apache.hadoop.hdfs.server.namenode.INodeId.checkId(INodeId.java:51)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2501)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2298)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2212)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:498)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:356)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40979)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:526)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1818)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1814)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1489)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1812)


Please reproduce as follows:

hdfs dfs -put test.data /user/data/test.data

After this command starts running, kill the active NameNode process.


I have only three nodes (A, B, C) for testing:
A and B are name nodes.
B and C are data nodes.
ZK deployed on A, B and C.

A, B and C are all journal nodes.

Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-4631) Support customized callback methods during automatic failover

2013-03-25 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4631:
-

 Summary: Support customized callback methods during automatic 
failover
 Key: HDFS-4631
 URL: https://issues.apache.org/jira/browse/HDFS-4631
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 3.0.0
Reporter: Fengdong Yu
 Fix For: 3.0.0


ZKFC adds HealthCallbacks by default, which at least handles quitElection. But 
we often want to be alerted when a failover occurs (for example by email or 
text message), especially on a production cluster.

There is a configurable fence script, and we could put all of this logic in 
it, but reasonably a fence script should do only one thing: fence :)

So I added this patch: customized HM callback methods can be configured, and 
if there is no configuration, only HealthCallbacks is added. A sketch of such 
a callback is shown below.
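
A minimal sketch of what a configured callback might look like, assuming the 
HealthMonitor.Callback interface that ZKFC uses internally; the sendAlert() 
helper is purely illustrative:

{code}
import org.apache.hadoop.ha.HealthMonitor;

public class AlertingCallback implements HealthMonitor.Callback {
  @Override
  public void enteredState(HealthMonitor.State newState) {
    // Alert on anything other than a healthy service, e.g. during failover.
    if (newState != HealthMonitor.State.SERVICE_HEALTHY) {
      sendAlert("NameNode health changed to " + newState);
    }
  }

  private void sendAlert(String message) {
    // Illustrative placeholder: wire this to your mail/SMS system.
    System.err.println("ALERT: " + message);
  }
}
{code}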


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira