[jira] [Reopened] (HDFS-257) Does a big delete starve other clients?

2014-07-21 Thread Milind Bhandarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milind Bhandarkar reopened HDFS-257:



Yes, it does.

Reopening.

 Does a big delete starve other clients?
 ---

 Key: HDFS-257
 URL: https://issues.apache.org/jira/browse/HDFS-257
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Robert Chansler

 Or, more generally, is there _any_ operation that has the potential to 
 severely starve other clients?
 The speculation is that deleting a directory with 50,000 files might starve 
 other clients for several seconds. Is that true? Is that necessary?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion

2014-07-21 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068254#comment-14068254
 ] 

Vinayakumar B commented on HDFS-6710:
-

Patch looks good. 

Nit: I can see trailing spaces in some places.

+1 on addressing that.

 Archival Storage: Consider block storage policy in replica deletion
 ---

 Key: HDFS-6710
 URL: https://issues.apache.org/jira/browse/HDFS-6710
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6710_20140720.patch


 Replica deletion should be modified in a way that the deletion won't break 
 the block storage policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication

2014-07-21 Thread liyunzhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068314#comment-14068314
 ] 

liyunzhang commented on HDFS-6676:
--

You can enable Kerberos HTTP SPNEGO authentication for KMS with the following steps:
1. Configure the KDC server successfully.
2. Generate a Kerberos ticket on the client, e.g.:
kinit HTTP/liyunzhangcentos.sh.intel@sh.intel.com -kt /home/zly/http.keytab
3. Edit kms-site.xml and set the following properties:
<property>
  <name>hadoop.kms.authentication.type</name>
  <value>kerberos</value>
  <description>
    simple or kerberos
  </description>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.keytab</name>
  <value>/home/zly/hadoop-3.0.0-SNAPSHOT/etc/hadoop/kerberos/HTTP.keytab</value>
  <description>
  </description>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.principal</name>
  <value>HTTP/liyunzhangcentos.sh.intel@sh.intel.com</value>
  <description>
  </description>
</property>
4. Start the KMS server.
5. Use curl to test KMS functions such as creating a key, e.g.:
# curl -i --negotiate -u: -X POST -d @createkey.json http://liyunzhangcentos.sh.intel.com:16000/kms/v1/keys --header "Content-Type:application/json"
HTTP/1.1 401 Unauthorized
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Type: text/html;charset=utf-8
Content-Length: 997
Date: Mon, 21 Jul 2014 06:27:59 GMT
HTTP/1.1 201 Created
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=HTTP&p=HTTP/liyunzhangcentos.sh.intel@sh.intel.com&t=kerberos&e=1405960084208&s=UgeM6AwoHo46HDntyVXB/OLK6u8="; Expires=Mon, 21-Jul-2014 16:28:04 GMT; HttpOnly
Location: http://liyunzhangcentos.sh.intel.com:16000/kms/v1/keys/v1/key/k1
Content-Type: application/json
Content-Length: 55
Date: Mon, 21 Jul 2014 06:28:33 GMT
Response: {"versionName" : "k1@0",
  "material" : "12345w=="
}
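
For reference, the same key creation can also be done programmatically through the Hadoop KeyProvider API instead of raw curl. This is only a minimal sketch under that assumption; the kms:// endpoint, key name and cipher settings below are placeholders, not values from this issue:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class KmsCreateKeyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder KMS endpoint; replace with your own KMS host and port.
    KeyProvider provider = KeyProviderFactory.get(
        new URI("kms://http@kms.example.com:16000/kms"), conf);
    KeyProvider.Options options = new KeyProvider.Options(conf)
        .setCipher("AES/CTR/NoPadding")
        .setBitLength(128);
    // Creates key "k1" on the KMS; Kerberos credentials come from the
    // client's ticket cache (kinit), as in the curl example above.
    KeyProvider.KeyVersion kv = provider.createKey("k1", options);
    System.out.println("Created key version: " + kv.getVersionName());
    provider.flush();
  }
}
{code}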

 KMS throws AuthenticationException when enabling kerberos authentication 
 -

 Key: HDFS-6676
 URL: https://issues.apache.org/jira/browse/HDFS-6676
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.0
Reporter: liyunzhang
Priority: Minor

 When I made a request to http://server-1941.novalocal:16000/kms/v1/names in 
 Firefox (beforehand, I set up Firefox according to 
 https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html),
  the following info was found in logs/kms.log.
 2014-07-14 19:18:30,461 WARN  AuthenticationFilter - Authentication 
 exception: GSSException: Failure unspecified at GSS-API level (Mechanism 
 level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
 decryption key is of type NULL)
 org.apache.hadoop.security.authentication.client.AuthenticationException: 
 GSSException: Failure unspecified at GSS-API level (Mechanism levelis of type 
 NULL)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357)
   at 
 org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
   at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
 level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
 decryption key is of type NULL)
   at 
 sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
   at 
 

[jira] [Resolved] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication

2014-07-21 Thread liyunzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang resolved HDFS-6676.
--

Resolution: Not a Problem

 KMS throws AuthenticationException when enabling kerberos authentication 
 -

 Key: HDFS-6676
 URL: https://issues.apache.org/jira/browse/HDFS-6676
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.0
Reporter: liyunzhang
Priority: Minor

 When I made a request to http://server-1941.novalocal:16000/kms/v1/names in 
 Firefox (beforehand, I set up Firefox according to 
 https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html),
  the following info was found in logs/kms.log.
 2014-07-14 19:18:30,461 WARN  AuthenticationFilter - Authentication 
 exception: GSSException: Failure unspecified at GSS-API level (Mechanism 
 level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
 decryption key is of type NULL)
 org.apache.hadoop.security.authentication.client.AuthenticationException: 
 GSSException: Failure unspecified at GSS-API level (Mechanism levelis of type 
 NULL)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357)
   at 
 org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100)
   at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
   at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
   at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
   at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
   at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism 
 level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
 decryption key is of type NULL)
   at 
 sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
   at 
 sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875)
   at 
 sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
   at 
 sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:347)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:329)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:329)
   ... 14 more
 Caused by: KrbException: EncryptedData is encrypted using keytype DES CBC 
 mode with CRC-32 but decryption key is of type NULL
   at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:169)
   at sun.security.krb5.KrbCred.init(KrbCred.java:131)
   at 
 sun.security.jgss.krb5.InitialToken$OverloadedChecksum.init(InitialToken.java:282)
   at 
 sun.security.jgss.krb5.InitSecContextToken.init(InitSecContextToken.java:130)
   at 
 sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
   ... 25 more
   
 Kerberos is enabled successfully in my environment:
 klist
 Ticket cache: FILE:/tmp/krb5cc_0
 Default principal: HTTP/server-1941.novalocal@NOVALOCAL
 Valid starting       Expires              Service principal
 

[jira] [Commented] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer

2014-07-21 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068350#comment-14068350
 ] 

Vinayakumar B commented on HDFS-6637:
-

I think the configuration dfs.ha.standby.checkpoints was added to disable 
checkpoints for tests, since the HTTP ports could be specified as ephemeral. In 
that case the Standby NN will not know the Active NN's HTTP port to do the checkpoint.

See the comment in MiniDFSCluster:
{code}  // In an HA cluster, in order for the StandbyNode to perform checkpoints,
  // it needs to know the HTTP port of the Active. So, if ephemeral ports
  // are chosen, disable checkpoints for the test.
  if (!nnTopology.allHttpPortsSpecified() &&
      nnTopology.isHA()) {
    LOG.info("MiniDFSCluster disabling checkpointing in the Standby node " +
        "since no HTTP ports have been specified.");
    conf.setBoolean(DFS_HA_STANDBY_CHECKPOINTS_KEY, false);
  }{code}

Even though I agree that a user could also disable checkpointing, I don't 
think this is realistic in a real HA cluster. What would a user do with checkpoints 
disabled..? ;)

 Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
 -

 Key: HDFS-6637
 URL: https://issues.apache.org/jira/browse/HDFS-6637
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Dian Fu
 Attachments: HDFS-6637.patch, HDFS-6637.patch.1


 In an HA cluster, for rolling upgrade, the image file fsimage_rollback 
 is generated by the StandbyCheckpointer thread of the SBN. However, if the 
 configuration dfs.ha.standby.checkpoints is set to false, there will be no 
 StandbyCheckpointer thread in the SBN. This will lead to the rolling upgrade 
 never finishing. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6698:


Attachment: HDFS-6698.txt

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt


 HBase prefers to invoke read() serving scan requests, and to invoke pread() 
 serving get requests, because pread() holds almost no lock.
 Let's imagine there's a read() running. Because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently; this is known. But pread() 
 also could not run... because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 for getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6698:


Status: Patch Available  (was: Open)

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt


 HBase prefers to invoke read() serving scan requests, and to invoke pread() 
 serving get requests, because pread() holds almost no lock.
 Let's imagine there's a read() running. Because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently; this is known. But pread() 
 also could not run... because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 for getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer

2014-07-21 Thread Dian Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068353#comment-14068353
 ] 

Dian Fu commented on HDFS-6637:
---

Thanks very much for the comments, Vinay. The explanation makes sense. Thanks 
again.

 Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
 -

 Key: HDFS-6637
 URL: https://issues.apache.org/jira/browse/HDFS-6637
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Dian Fu
 Attachments: HDFS-6637.patch, HDFS-6637.patch.1


 In an HA cluster, for rolling upgrade, the image file fsimage_rollback 
 is generated by the StandbyCheckpointer thread of the SBN. However, if the 
 configuration dfs.ha.standby.checkpoints is set to false, there will be no 
 StandbyCheckpointer thread in the SBN. This will lead to the rolling upgrade 
 never finishing. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster

2014-07-21 Thread Vinayakumar B (JIRA)
Vinayakumar B created HDFS-6714:
---

 Summary: TestBlocksScheduledCounter#testBlocksScheduledCounter 
should shutdown cluster
 Key: HDFS-6714
 URL: https://issues.apache.org/jira/browse/HDFS-6714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor


TestBlocksScheduledCounter#testBlocksScheduledCounter() should shut down the 
cluster after the test. 

Otherwise, this could lead to errors on Windows while running non-forked tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster

2014-07-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6714:


Attachment: HDFS-6714.patch

Attached the patch. Please review

 TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
 -

 Key: HDFS-6714
 URL: https://issues.apache.org/jira/browse/HDFS-6714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor
 Attachments: HDFS-6714.patch


 TestBlocksScheduledCounter#testBlocksScheduledCounter() should shut down the 
 cluster after the test. 
 Otherwise, this could lead to errors on Windows while running non-forked tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster

2014-07-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6714:


Status: Patch Available  (was: Open)

 TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
 -

 Key: HDFS-6714
 URL: https://issues.apache.org/jira/browse/HDFS-6714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor
 Attachments: HDFS-6714.patch


 TestBlocksScheduledCounter#testBlocksScheduledCounter() should shut down the 
 cluster after the test. 
 Otherwise, this could lead to errors on Windows while running non-forked tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068361#comment-14068361
 ] 

Liang Xie commented on HDFS-6698:
-

In the normal situation, the HFiles that all HBase (p)reads go against should 
be immutable, so I assume the attached patch, per [~saint@gmail.com]'s 
suggestion, is enough to relieve the issue of preads being blocked by a read 
request in HBase. Let's see the QA result...
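
A minimal sketch of the general idea being discussed here (not the actual HDFS-6698 patch; the class and field names below are made up for illustration): once a file is known to be closed and immutable, its length can be resolved once and then served from a volatile field, so pread() never has to enter the stream's synchronized section.
{code}
// Hypothetical illustration only; DFSInputStream's real fields differ.
class CachedLengthStream {
  private final Object infoLock = new Object();
  private volatile long cachedFileLength = -1;  // -1 means "not resolved yet"

  long getFileLengthNoLock() {
    long len = cachedFileLength;
    if (len >= 0) {
      return len;                 // fast path: no lock, safe for immutable files
    }
    synchronized (infoLock) {     // slow path: first caller resolves the length
      if (cachedFileLength < 0) {
        cachedFileLength = fetchLengthFromNameNode();
      }
      return cachedFileLength;
    }
  }

  private long fetchLengthFromNameNode() {
    return 0L;  // placeholder for the real located-blocks lookup
  }
}
{code}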

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt


 HBase prefers to invoke read() serving scan requests, and to invoke pread() 
 serving get requests, because pread() holds almost no lock.
 Let's imagine there's a read() running. Because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently; this is known. But pread() 
 also could not run... because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 for getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer

2014-07-21 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-6637:


Resolution: Invalid
Status: Resolved  (was: Patch Available)

Thanks [~dian.fu]. Marking this as invalid due to an invalid (unreal) 
configuration.

Feel free to re-open if you feel this has to be fixed.

 Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
 -

 Key: HDFS-6637
 URL: https://issues.apache.org/jira/browse/HDFS-6637
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Dian Fu
 Attachments: HDFS-6637.patch, HDFS-6637.patch.1


 In an HA cluster, for rolling upgrade, the image file fsimage_rollback 
 is generated by the StandbyCheckpointer thread of the SBN. However, if the 
 configuration dfs.ha.standby.checkpoints is set to false, there will be no 
 StandbyCheckpointer thread in the SBN. This will lead to the rolling upgrade 
 never finishing. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-07-21 Thread Yu Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068447#comment-14068447
 ] 

Yu Li commented on HDFS-6441:
-

Hi [~szetszwo], [~aagarwal] and [~benoyantony],

Sorry for the late response; I really didn't expect a reply after a month or 
so :-P

Sure, I don't mind if we contribute the feature here; I'm glad as long as the 
feature gets added, no matter how we get it done. :-)

About the patch, I can see the advantage of using a file to pass the node list 
of included/excluded nodes, especially when the list is long. Meanwhile, I'd say 
it would be great if we also supported passing the servers through a parameter, 
which makes it much easier to invoke the tool from another program (so we 
could still complete the HDFS-6009 work :-))

 Add ability to exclude/include few datanodes while balancing
 

 Key: HDFS-6441
 URL: https://issues.apache.org/jira/browse/HDFS-6441
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch


 In some use cases, it is desirable to ignore a few data nodes  while 
 balancing. The administrator should be able to specify a list of data nodes 
 in a file similar to the hosts file and the balancer should ignore these data 
 nodes while balancing so that no blocks are added/removed on these nodes.
 Similarly it will be beneficial to specify that only a particular list of 
 datanodes should be considered for balancing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6702) DFSClient should create blocks using StorageType

2014-07-21 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068456#comment-14068456
 ] 

Vinayakumar B commented on HDFS-6702:
-

Patch looks good.

One nit:
{code}
+final List<StorageTypeProto> protos = new ArrayList<StorageTypeProto>(
+    types.length);
+for (int i = startIdx; i < types.length; ++i) {
+  protos.add(convertStorageType(types[i]));
+}
{code}
Here, initialCapacity can be given as {{types.length - startIdx}}; otherwise an 
extra allocation will happen for a non-zero startIdx.
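
For clarity, the quoted loop with the suggested initial capacity would look roughly like this (same identifiers as in the diff above, shown only as a sketch of the nit, not as the committed code):
{code}
final List<StorageTypeProto> protos =
    new ArrayList<StorageTypeProto>(types.length - startIdx);
for (int i = startIdx; i < types.length; ++i) {
  protos.add(convertStorageType(types[i]));
}
{code}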

+1 on addressing this

 DFSClient should create blocks using StorageType 
 -

 Key: HDFS-6702
 URL: https://issues.apache.org/jira/browse/HDFS-6702
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6702_20140719.patch, h6702_20140721.patch, 
 h6702_20140721b.patch


 When DFSClient asks NN for a new block (via addBlock), NN returns a 
 LocatedBlock with storage type information.  However, DFSClient does not use 
 StorageType to create blocks with DN.  As a result, the block replicas could 
 possibly be created with a different storage type.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068461#comment-14068461
 ] 

Hadoop QA commented on HDFS-6698:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656843/HDFS-6698.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.TestDatanodeConfig
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7408//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7408//console

This message is automatically generated.

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt


 HBase prefers to invoke read() serving scan requests, and to invoke pread() 
 serving get requests, because pread() holds almost no lock.
 Let's imagine there's a read() running. Because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently; this is known. But pread() 
 also could not run... because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 for getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068467#comment-14068467
 ] 

Liang Xie commented on HDFS-6698:
-

Those three failure cases are not related with current patch.

 try to optimize DFSInputStream.getFileLength()
 --

 Key: HDFS-6698
 URL: https://issues.apache.org/jira/browse/HDFS-6698
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-6698.txt


 HBase prefers to invoke read() serving scan requests, and to invoke pread() 
 serving get requests, because pread() holds almost no lock.
 Let's imagine there's a read() running. Because the definition is:
 {code}
 public synchronized int read
 {code}
 no other read() request can run concurrently; this is known. But pread() 
 also could not run... because:
 {code}
   public int read(long position, byte[] buffer, int offset, int length)
 throws IOException {
 // sanity checks
 dfsClient.checkOpen();
 if (closed) {
   throw new IOException("Stream closed");
 }
 failures = 0;
 long filelen = getFileLength();
 {code}
 getFileLength() also needs the lock, so we need to figure out a no-lock impl 
 for getFileLength() before the HBase multi-stream feature is done. 
 [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster

2014-07-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068470#comment-14068470
 ] 

Hadoop QA commented on HDFS-6714:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12656844/HDFS-6714.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager
  
org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7409//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7409//console

This message is automatically generated.

 TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
 -

 Key: HDFS-6714
 URL: https://issues.apache.org/jira/browse/HDFS-6714
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor
 Attachments: HDFS-6714.patch


 TestBlocksScheduledCounter#testBlocksScheduledCounter() should shut down the 
 cluster after the test. 
 Otherwise, this could lead to errors on Windows while running non-forked tests.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6702) DFSClient should create blocks using StorageType

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068496#comment-14068496
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6702:
---

Thanks Vinay.  I did have new ArrayList with types.length - startIdx in my 
previous patch (h6702_20140721.patch).  However, TestDiskError fails with that, so I 
changed it to types.length.

 DFSClient should create blocks using StorageType 
 -

 Key: HDFS-6702
 URL: https://issues.apache.org/jira/browse/HDFS-6702
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6702_20140719.patch, h6702_20140721.patch, 
 h6702_20140721b.patch


 When DFSClient asks NN for a new block (via addBlock), NN returns a 
 LocatedBlock with storage type information.  However, DFSClient does not use 
 StorageType to create blocks with DN.  As a result, the block replicas could 
 possibly be created with a different storage type.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6710:
--

Attachment: h6710_20140721.patch

h6710_20140721.patch: removes trailing spaces.

 Archival Storage: Consider block storage policy in replica deletion
 ---

 Key: HDFS-6710
 URL: https://issues.apache.org/jira/browse/HDFS-6710
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6710_20140720.patch, h6710_20140721.patch


 Replica deletion should be modified in a way that the deletion won't break 
 the block storage policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-6710.
---

  Resolution: Fixed
Hadoop Flags: Reviewed

Thanks Vinay for reviewing the patch.

I have committed this.

 Archival Storage: Consider block storage policy in replica deletion
 ---

 Key: HDFS-6710
 URL: https://issues.apache.org/jira/browse/HDFS-6710
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6710_20140720.patch, h6710_20140721.patch


 Replica deletion should be modified in a way that the deletion won't break 
 the block storage policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6679:
--

Hadoop Flags: Reviewed

+1 patch looks good.

 Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
 --

 Key: HDFS-6679
 URL: https://issues.apache.org/jira/browse/HDFS-6679
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Vinayakumar B
 Attachments: HDFS-6679.patch, HDFS-6679.patch, HDFS-6679.patch, 
 editsStored


 HDFS-6677 changed the fsimage format for storing storage policy IDs.  We should bump the 
 NameNodeLayoutVersion and fix the tests as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-6679.
---

   Resolution: Fixed
Fix Version/s: Archival Storage (HDFS-6584)

I have committed this.  Thanks, Vinay!

 Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
 --

 Key: HDFS-6679
 URL: https://issues.apache.org/jira/browse/HDFS-6679
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Vinayakumar B
 Fix For: Archival Storage (HDFS-6584)

 Attachments: HDFS-6679.patch, HDFS-6679.patch, HDFS-6679.patch, 
 editsStored


 HDFS-6677 changed the fsimage format for storing storage policy IDs.  We should bump the 
 NameNodeLayoutVersion and fix the tests as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6710:
--

Fix Version/s: Archival Storage (HDFS-6584)

 Archival Storage: Consider block storage policy in replica deletion
 ---

 Key: HDFS-6710
 URL: https://issues.apache.org/jira/browse/HDFS-6710
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Fix For: Archival Storage (HDFS-6584)

 Attachments: h6710_20140720.patch, h6710_20140721.patch


 Replica deletion should be modified in a way that the deletion won't break 
 the block storage policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6686) Archival Storage: Use fallback storage types

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6686:
--

Attachment: h6710_20140721.patch

Vinay, you are right that the patch depends on HDFS-6710 which is now 
committed.  Thanks for trying the patch.

h6710_20140721.patch: updates with the branch.

 Archival Storage: Use fallback storage types
 

 Key: HDFS-6686
 URL: https://issues.apache.org/jira/browse/HDFS-6686
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6686_20140721.patch, h6710_20140721.patch


 HDFS-6671 changes replication monitor to use block storage policy for 
 replication.  It should also use the fallback storage types when a particular 
 type of storage is full.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-49) MiniDFSCluster.stopDataNode will always shut down a node in the cluster if a matching name is not found

2014-07-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068635#comment-14068635
 ] 

Steve Loughran commented on HDFS-49:


nope, [Still 
There|https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java#L1798]

 MiniDFSCluster.stopDataNode will always shut down a node in the cluster if a 
 matching name is not found
 ---

 Key: HDFS-49
 URL: https://issues.apache.org/jira/browse/HDFS-49
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0, 0.20.205.0, 1.1.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
  Labels: codereview, newbie
 Attachments: hdfs-49.patch

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 The stopDataNode method will shut down the last node in the list of nodes, if 
 one matching a specific name is not found
 This is possibly not what was intended. Better to return false or fail in 
 some other manner if the named node was not located
  synchronized boolean stopDataNode(String name) {
    int i;
    for (i = 0; i < dataNodes.size(); i++) {
      DataNode dn = dataNodes.get(i).datanode;
      if (dn.dnRegistration.getName().equals(name)) {
        break;
      }
    }
    return stopDataNode(i);
  }
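
 A minimal sketch of the behaviour the description asks for (not the attached hdfs-49.patch; shown only to illustrate the suggestion of failing instead of stopping an arbitrary node):
 {code}
 synchronized boolean stopDataNode(String name) {
   int i;
   for (i = 0; i < dataNodes.size(); i++) {
     DataNode dn = dataNodes.get(i).datanode;
     if (dn.dnRegistration.getName().equals(name)) {
       break;
     }
   }
   if (i == dataNodes.size()) {
     return false;  // named node not found: report failure, stop nothing
   }
   return stopDataNode(i);
 }
 {code}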



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-169) dfshealth.jsp: sorting on remaining doesn't actually sort

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-169.
---

Resolution: Incomplete

The HDFS GUI has been rewritten. Closing as stale.

 dfshealth.jsp: sorting on remaining doesn't actually sort
 ---

 Key: HDFS-169
 URL: https://issues.apache.org/jira/browse/HDFS-169
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Michael Bieniosek

 When I try to sort by remaining in dfshealth.jsp, I get sent to a URL that 
 looks like: 
 http://example.com:50070/dfshealth.jsp?sorter/field=remaining&sorter/order=ASC
 But, the resultant table doesn't seem to be sorted at all (though the order 
 is different).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2251) Namenode does not recognize incorrectly sized blocks

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068749#comment-14068749
 ] 

Allen Wittenauer commented on HDFS-2251:


Despite Brian saying that we could close this out, it'd be good if we could 
verify that this is fixed. 

 Namenode does not recognize incorrectly sized blocks
 

 Key: HDFS-2251
 URL: https://issues.apache.org/jira/browse/HDFS-2251
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brian Bockelman

 We had a lot of file system corruption resulting in incorrectly sized blocks 
 (on disk, they're truncated to 192KB when they should be 64MB).
 However, I cannot make Hadoop realize that these blocks are incorrectly 
 sized.  When I try to drain off the node, I get the following messages:
 2008-10-29 18:46:51,293 WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent 
 size for block blk_-4403534125663454855_9937 reported from 172.16.1.150:50010 
 current size is 67108864 reported size is 196608
 Here 172.16.1.150 is not the node which has the problematic block, but the 
 destination of the file transfer.  I propose that Hadoop should either:
 a) Upon startup, make sure that all blocks are properly sized (pro: rather 
 cheap check; con: doesn't catch any truncations which happen while on disk)
 b) Upon detecting the incorrectly sized copy, Hadoop should ask the source of 
 the block to perform a block verification.
 Thanks,
 Brian



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-867) Add a PowerTopology class to aid replica placement and enhance availability of blocks

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-867.
---

Resolution: Fixed

I'm going to close this as fixed:  using either a multi-level topology or just 
within the network topology, one can build the power topology into the system. 
(And, in practice, this is what many of us do...)

 Add a PowerTopology class to aid replica placement and enhance availability 
 of blocks 
 --

 Key: HDFS-867
 URL: https://issues.apache.org/jira/browse/HDFS-867
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jeff Hammerbacher
Priority: Minor

 Power outages are a common reason for a DataNode to become unavailable. 
 Having a data structure to represent to the power topology of your data 
 center can be used to implement a power-aware replica placement policy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-333) A State Machine for name-node blocks.

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068760#comment-14068760
 ] 

Allen Wittenauer commented on HDFS-333:
---

Given the changes with HA, etc,  I wonder if this is still valid.  Ping!

 A State Machine for name-node blocks.
 -

 Key: HDFS-333
 URL: https://issues.apache.org/jira/browse/HDFS-333
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Konstantin Shvachko

 Blocks on the name-node can belong to different collections like the 
 blocksMap, under-replicated, over-replicated lists, etc.
 It is getting more and more complicated to keep the lists consistent.
 It would be good to formalize the movement of the blocks between the 
 collections using a state machine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-201) Spring and OSGi support

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-201.
---

Resolution: Duplicate

 Spring and OSGi support
 ---

 Key: HDFS-201
 URL: https://issues.apache.org/jira/browse/HDFS-201
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jon Brisbin
Assignee: Jean-Baptiste Onofré
 Attachments: HDFS-201.patch


 I was able to compile 0.18.2 in eclipse into a new OSGi bundle using eclipse 
 PDE. Using Spring to control the HDFS nodes, however, seems out of the 
 question for the time being because of inter-dependencies between packages 
 that should be separate OSGi bundles (for example, SecondaryNameNode includes 
 direct references to StatusHttpServer, which should be in a bundle with a 
 web personality that is separate from Hadoop Core). Looking through the 
 code that starts the daemons, it would seem code changes are necessary to 
 allow for components to be dependency-injected. Rather than instantiating a 
 StatusHttpServer inside the SecondaryNameNode, that reference should (at the 
 very least) be able to be dependency-injected (for example from an OSGi 
 service from another bundle). Adding setters for infoServer would allow that 
 reference to be injected by Spring. This is just an example of the changes 
 that would need to be made to get Hadoop to live happily inside an OSGi 
 container.
 As a starting point, it would be nice if Hadoop core was able to be split 
 into a client bundle that could be deployed into OSGi containers that would 
 provide client-only access to HDFS clusters.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-331) hadoop fsck should ignore lost+found

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-331:
--

Labels: newbie  (was: )

 hadoop fsck should ignore lost+found
 

 Key: HDFS-331
 URL: https://issues.apache.org/jira/browse/HDFS-331
 Project: Hadoop HDFS
  Issue Type: Improvement
 Environment: All
Reporter: Lohit Vijayarenu
Priority: Minor
  Labels: newbie

 hadoop fsck / would check the state of the entire filesystem. It would be good to 
 have an option to ignore the lost+found directory. Better yet, a way to specify an 
 ignore list.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6446) NFS: Different error messages for appending/writing data from read only mount

2014-07-21 Thread Yesha Vora (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068789#comment-14068789
 ] 

Yesha Vora commented on HDFS-6446:
--

[~abutala], The issue was not tested on trunk. It was tested with Hadoop 
2.2.0. [~brandonli], do you have any idea if this Jira was fixed recently?

 NFS: Different error messages for appending/writing data from read only mount
 -

 Key: HDFS-6446
 URL: https://issues.apache.org/jira/browse/HDFS-6446
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora

 steps:
 1) set dfs.nfs.exports.allowed.hosts = nfs_client ro
 2) Restart nfs server
 3) Append data on file present on hdfs from read only mount point
 Append data
 {noformat}
 bash$ cat /tmp/tmp_10MB.txt >> /tmp/tmp_mnt/expected_data_stream
 cat: write error: Input/output error
 {noformat}
 4) Write data from read only mount point
 Copy data
 {noformat}
 bash$ cp /tmp/tmp_10MB.txt /tmp/tmp_mnt/tmp/
 cp: cannot create regular file `/tmp/tmp_mnt/tmp/tmp_10MB.txt': Permission 
 denied
 {noformat}
 Both operations are treated differently. Copying data returns a valid error 
 message: 'Permission denied', but appending data does not return a valid error 
 message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-48) NN should check a block's length even if the block is not a new block when processing a blockreport

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-48.
--

Resolution: Duplicate

I'm going to close this as a dupe of HDFS-2251.

 NN should check a block's length even if the block is not a new block when 
 processing a blockreport
 ---

 Key: HDFS-48
 URL: https://issues.apache.org/jira/browse/HDFS-48
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hairong Kuang
Assignee: Hairong Kuang

 If the block length does not match the one in the blockMap, we should mark 
 the block as corrupted. This could help clear the polluted replicas caused 
 by HADOOP-4810 and also help detect when the on-disk block gets truncated/enlarged 
 manually by accident.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-425) fuse has major performance drop on slower machines

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-425:
--

Summary: fuse has major performance drop on slower machines  (was: Major 
performance drop on slower machines)

 fuse has major performance drop on slower machines
 --

 Key: HDFS-425
 URL: https://issues.apache.org/jira/browse/HDFS-425
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: fuse-dfs
Reporter: Marc-Olivier Fleury
Priority: Minor

 When running fuse_dfs on machines that have different CPU characteristics, I 
 noticed that the performance of fuse_dfs is very sensitive to the machine 
 power. 
 The command I used was simply a cat over a rather large amount of data stored 
 on HDFS. Here are the comparative times for the different types of machines:
 Intel(R) Pentium(R) 4 CPU 2.40GHz :2 min 40 s 
 Intel(R) Pentium(R) 4 CPU 3.06GHz: 1 min 50 s 
 2 x Intel(R) Pentium(R) 4 CPU 3.00GHz:   0 min 40 s 
 2 x Intel(R) Xeon(TM) MP CPU 3.33GHz:   0 min 28 s 
 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz:  0 min 15 s
 I tried to find other explanations for the drop in performance, such as 
 network configuration, or data locality, but the faster machines are the ones 
 that are further away from the others considering the network 
 configuration, and that don't run datanodes.
 top shows that the CPU usage of fuse_dfs is between 80-90% on the slower 
 machines, and about 40% on the fastest one.
 This leads me to the conclusion that fuse_dfs consumes a lot of CPU 
 resources, much more than expected.
 Any help or insight concerning this issue will be greatly appreciated, since 
 these differences actually result in days of computation for a given job.
 Thank you



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-221) Trigger block scans for datanode

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-221.
---

Resolution: Duplicate

I'm going to close this a dupe of HDFS-366.

 Trigger block scans for datanode
 

 Key: HDFS-221
 URL: https://issues.apache.org/jira/browse/HDFS-221
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Brian Bockelman
Assignee: Lei (Eddy) Xu
 Attachments: manual_block_scan.patch, manual_fsck_scan.patch


 Provide a mechanism to trigger block scans in a datanode upon request.  
 Support interfaces for commands sent by the namenode and through the HTTP 
 interface.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-366) Support manually fsck in DataNode

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068838#comment-14068838
 ] 

Allen Wittenauer commented on HDFS-366:
---

I wonder how this works in a post-security world.

 Support manually fsck in DataNode
 -

 Key: HDFS-366
 URL: https://issues.apache.org/jira/browse/HDFS-366
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lei (Eddy) Xu
 Attachments: HADOOP-4763.patch, fsck.patch, fsck.patch


 Now the DataNode only supports scanning all blocks periodically.  Our site needs a tool 
 to check some blocks and files manually. 
 My current design is to add a parameter to DFSck to indicate a deep, 
 manual fsck request, then let the NameNode collect the proper block identifiers 
 and send them to the associated DataNodes. 
 I'll let DataBlockScanner run in two ways: periodically (the original one) and 
 manually. 
 Any suggestions on this are welcome. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-166) NameNode#invalidateBlock's requirement on more than 1 valid replica exists before scheduling a replica to delete is too strict

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068861#comment-14068861
 ] 

Allen Wittenauer commented on HDFS-166:
---

Ping!

I bet this has been fixed.

 NameNode#invalidateBlock's requirement on more than 1 valid replica exists 
 before scheduling a replica to delete is too strict
 --

 Key: HDFS-166
 URL: https://issues.apache.org/jira/browse/HDFS-166
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hairong Kuang

 Currently invalidateBlock allows deleting a replica only if at least 
 two valid replicas exist before the deletion is scheduled. This is too 
 restrictive if the replica to delete is a corrupt one. The NameNode could delete 
 a corrupt replica as long as at least one copy (no matter whether valid or corrupt) 
 will be left.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-07-21 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068869#comment-14068869
 ] 

Benoy Antony commented on HDFS-6441:


Thanks [~carp84], [~szetszwo] and [~arpitagarwal]. 
I'll rebase to the current trunk and will try to include Yu's suggestions.



 Add ability to exclude/include few datanodes while balancing
 

 Key: HDFS-6441
 URL: https://issues.apache.org/jira/browse/HDFS-6441
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch


 In some use cases, it is desirable to ignore a few data nodes  while 
 balancing. The administrator should be able to specify a list of data nodes 
 in a file similar to the hosts file and the balancer should ignore these data 
 nodes while balancing so that no blocks are added/removed on these nodes.
 Similarly it will be beneficial to specify that only a particular list of 
 datanodes should be considered for balancing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly

2014-07-21 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068895#comment-14068895
 ] 

Devaraj Das commented on HDFS-6680:
---

I am not sure that loop needs to change. It seems to me that chooseTarget will 
have the same result with or without the change. I am missing something, I guess...

 BlockPlacementPolicyDefault does not choose favored nodes correctly
 ---

 Key: HDFS-6680
 URL: https://issues.apache.org/jira/browse/HDFS-6680
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6680_20140714.patch, h6680_20140716.patch


 In one of the chooseTarget(..) methods, it tries all the favoredNodes to 
 chooseLocalNode(..).  It expects chooseLocalNode to return null if the local 
 node is not a good target.  Unfortunately, chooseLocalNode will fallback to 
 chooseLocalRack but not returning null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068894#comment-14068894
 ] 

Colin Patrick McCabe commented on HDFS-6570:


bq. acl.proto: I'm not sure it's backwards-compatible to take the existing 
FsActionProto nested inside AclEntryProto and move it to top level. If protobuf 
encodes the message name now as AclEntryProto.FsActionProto, then it might 
break interop. It would be interesting to test "hdfs dfs -getfacl" on files 
with ACLs using a mix of old client + new server or new client + old server. If 
there is a problem, then we might need to find a way to refer to the nested 
definition, or if all else fails maintain duplicate definitions (nested and 
top-level) just for compatibility.

Protobuf doesn't encode field names.  It just assumes that the data you're 
giving it fits the schema you're giving it.  As far as I know, moving the enum 
from nested to top-level will not change its representation.  Enums are just 
represented as varints in protobuf... i.e. the same way a uint32 is represented.  
Unless you're changing the value of the enum constants, it shouldn't change 
anything.  So I believe this part is OK.
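
For anyone who wants to convince themselves, here is a minimal, self-contained sketch 
(made-up class name and values, not part of any patch here) that assumes protobuf-java 
is on the classpath: an enum field and an int32 field with the same field number and 
numeric value produce identical wire bytes, which is why moving FsActionProto from 
nested to top level should not change the encoding.

{code}
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import com.google.protobuf.CodedOutputStream;

// Hypothetical demo, not Hadoop code: an enum field is encoded as a plain varint,
// exactly like an int32 field with the same field number and non-negative value.
public class EnumWireDemo {
  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream enumBytes = new ByteArrayOutputStream();
    CodedOutputStream enumOut = CodedOutputStream.newInstance(enumBytes);
    enumOut.writeEnum(1, 5);   // field #1 carrying an enum constant with numeric value 5
    enumOut.flush();

    ByteArrayOutputStream intBytes = new ByteArrayOutputStream();
    CodedOutputStream intOut = CodedOutputStream.newInstance(intBytes);
    intOut.writeInt32(1, 5);   // field #1 carrying int32 value 5
    intOut.flush();

    // Identical bytes: where the enum type is declared never appears on the wire.
    System.out.println(Arrays.equals(enumBytes.toByteArray(), intBytes.toByteArray())); // true
  }
}
{code}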

 add api that enables checking if a user has certain permissions on a file
 -

 Key: HDFS-6570
 URL: https://issues.apache.org/jira/browse/HDFS-6570
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Jitendra Nath Pandey
 Attachments: HDFS-6570-prototype.1.patch, HDFS-6570.2.patch


 For some of the authorization modes in Hive, the servers in Hive check if a 
 given user has permissions on a certain file or directory. For example, the 
 storage based authorization mode allows hive table metadata to be modified 
 only when the user has access to the corresponding table directory on hdfs. 
 There are likely to be such use cases outside of Hive as well.
 HDFS does not provide an api for such checks. As a result, the logic to check 
 if a user has permissions on a directory gets replicated in Hive. This 
 results in duplicate logic and introduces possibilities for 
 inconsistencies in the interpretation of the permission model. This becomes a 
 bigger problem with the complexity of ACL logic.
 HDFS should provide an api that provides functionality that is similar to 
 access function in unistd.h - http://linux.die.net/man/2/access .



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-313) Threads in servers should not die silently.

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068896#comment-14068896
 ] 

Allen Wittenauer commented on HDFS-313:
---

This probably needs to get revisited so PING!

 Threads in servers should not die silently.
 ---

 Key: HDFS-313
 URL: https://issues.apache.org/jira/browse/HDFS-313
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze

 If there is an uncaught exception, some threads in a server may die silently. 
 The corresponding error message does not show up in the log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-89) Datanode should verify block sizes vs metadata on startup

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-89.
--

Resolution: Duplicate

Not only was it previously reported, Brian, it was reported by you! :D

Closing as a dupe, as I found the JIRA.

 Datanode should verify block sizes vs metadata on startup
 -

 Key: HDFS-89
 URL: https://issues.apache.org/jira/browse/HDFS-89
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brian Bockelman

 I could have sworn this bug had been reported by someone else already, but I 
 can't find it on JIRA after searching; apologies if this is a duplicate.
 The datanode, upon starting up, should check and make sure that all block 
 sizes as reported via `stat` are the same as the block sizes as reported via 
 the block's metadata.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-215) Offline Namenode fsImage verification

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-215.
---

Resolution: Fixed

As Jakob points out, oiv does this. Closing as fixed.

 Offline Namenode fsImage verification
 -

 Key: HDFS-215
 URL: https://issues.apache.org/jira/browse/HDFS-215
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Brian Bockelman

 Currently, there is no way to verify that a copy of the fsImage is not 
 corrupt.  I propose that we should have an offline tool that loads the 
 fsImage into memory to see if it is usable.  This will allow us to automate 
 backup testing to some extent.
 One can start a namenode process on the fsImage to see if it can be loaded, 
 but this is not easy to automate.
 To use HDFS in production, it is greatly desired to have both checkpoints - 
 and have some idea that the checkpoints are valid!  No one wants to see the 
 day where they reload from backup only to find that the fsImage in the backup 
 wasn't usable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-275) FSNamesystem should have an InvalidateBlockMap class to manage blocks scheduled to remove

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068913#comment-14068913
 ] 

Allen Wittenauer commented on HDFS-275:
---

I'm going to link these, since they seem like competing goals.

 FSNamesystem should have an InvalidateBlockMap class to manage blocks 
 scheduled to remove
 -

 Key: HDFS-275
 URL: https://issues.apache.org/jira/browse/HDFS-275
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: invalidateBlocksMap.patch


 This jira intends to move the code that handles recentInvalideSet to a 
 separate class InvalidateBlockMap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-207) add querying block's info in the fsck facility

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-207:
--

Component/s: namenode

 add querying block's info in the fsck facility
 --

 Key: HDFS-207
 URL: https://issues.apache.org/jira/browse/HDFS-207
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: zhangwei
Assignee: zhangwei
Priority: Minor
 Attachments: HADOOP-5019-2.patch, HADOOP-5019.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Fsck can do its job pretty well now, but when a developer comes across a log 
 message such as "Block blk_28622148 is not valid", etc.,
 we wish to know which file and which datanodes the block belongs to. It can be 
 solved by running "bin/hadoop fsck -files -blocks -locations / | grep 
 blockid", but as mentioned earlier in HADOOP-4945, it's not an effective 
 way in a big production cluster.
 So maybe we could do something to make fsck more convenient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-207) add querying block's info in the fsck facility

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068919#comment-14068919
 ] 

Allen Wittenauer commented on HDFS-207:
---

Ping!

 add querying block's info in the fsck facility
 --

 Key: HDFS-207
 URL: https://issues.apache.org/jira/browse/HDFS-207
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: zhangwei
Assignee: zhangwei
Priority: Minor
 Attachments: HADOOP-5019-2.patch, HADOOP-5019.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Fsck can do its job pretty well now, but when a developer comes across a log 
 message such as "Block blk_28622148 is not valid", etc.,
 we wish to know which file and which datanodes the block belongs to. It can be 
 solved by running "bin/hadoop fsck -files -blocks -locations / | grep 
 blockid", but as mentioned earlier in HADOOP-4945, it's not an effective 
 way in a big production cluster.
 So maybe we could do something to make fsck more convenient.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068935#comment-14068935
 ] 

Colin Patrick McCabe commented on HDFS-6707:


Good find, Yongjun.

Parsing the output of this shell command is definitely worrisome.  It has been 
the source of a bunch of bugs in the past.  Unfortunately, even if we came up 
with a JNI method to do this, not everyone would use it, so here we are.

Can we do this by skipping over the first code point following the comma, 
rather than by looking for a specific quote type?  That seems more flexible in 
case there are any more wacky variants.
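
A rough sketch of what that could look like (hypothetical helper and inputs, not the 
actual Stat.java parsing code): rather than matching a particular quote character after 
the last comma, skip exactly one code point and treat the remainder as the name field.

{code}
// Hypothetical helper, not the real org.apache.hadoop.fs.Stat parser: skip one code
// point after the last comma instead of expecting a specific quote character, so
// single quotes, double quotes, or any other wrapper are all tolerated.
public final class StatLineSketch {
  static String nameFieldAfterLastComma(String statLine) {
    int comma = statLine.lastIndexOf(',');
    if (comma < 0 || comma + 1 >= statLine.length()) {
      throw new IllegalArgumentException("Unexpected stat output: " + statLine);
    }
    // Skip exactly one code point (handles surrogate pairs) following the comma.
    int start = statLine.offsetByCodePoints(comma + 1, 1);
    return statLine.substring(start);
  }

  public static void main(String[] args) {
    // Works whether the name field is wrapped in ASCII quotes or "wacky" non-ASCII quotes.
    System.out.println(nameFieldAfterLastComma("83,symbolic link,'linkToFile' -> 'file'"));
    System.out.println(nameFieldAfterLastComma("83,symbolic link,\u2018linkToFile\u2019 -> \u2018file\u2019"));
  }
}
{code}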

 Intermittent failure of Symlink tests 
 TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem 
 -

 Key: HDFS-6707
 URL: https://issues.apache.org/jira/browse/HDFS-6707
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: symlinks
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, 
 HDFS-6707.003.dbg.patch, HDFS-6707.004.patch


 Symlink tests failure happened from time to time,
 https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/
 {code}
 Failed
 org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink
 Failing for the past 1 build (Since Failed#7376 )
 Took 83 ms.
 Error Message
 Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
 Stacktrace
 java.io.IOException: Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266)
   at 
 org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 Standard Output
 2014-07-17 23:31:37,770 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile':
  No such file or directory
 2014-07-17 23:31:38,109 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile':
  File exists
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file

2014-07-21 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068931#comment-14068931
 ] 

Chris Nauroth commented on HDFS-6570:
-

Thanks, Colin.

 add api that enables checking if a user has certain permissions on a file
 -

 Key: HDFS-6570
 URL: https://issues.apache.org/jira/browse/HDFS-6570
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Jitendra Nath Pandey
 Attachments: HDFS-6570-prototype.1.patch, HDFS-6570.2.patch


 For some of the authorization modes in Hive, the servers in Hive check if a 
 given user has permissions on a certain file or directory. For example, the 
 storage based authorization mode allows hive table metadata to be modified 
 only when the user has access to the corresponding table directory on hdfs. 
 There are likely to be such use cases outside of Hive as well.
 HDFS does not provide an api for such checks. As a result, the logic to check 
 if a user has permissions on a directory gets replicated in Hive. This 
 results in duplicate logic and introduces possibilities for 
 inconsistencies in the interpretation of the permission model. This becomes a 
 bigger problem with the complexity of ACL logic.
 HDFS should provide an api that provides functionality that is similar to 
 access function in unistd.h - http://linux.die.net/man/2/access .



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6701) Make seed optional in NetworkTopology#sortByDistance

2014-07-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068943#comment-14068943
 ] 

Andrew Wang commented on HDFS-6701:
---

Hi Ashwin,

Just a nitty thing, otherwise +1:

{code}
+<property>
+  <name>dfs.namenode.randomize-block-locations-per-block</name>
+  <value>false</value>
+  <description>When there is no node local block, the default behavior
+    while getting block locations is that - block locations of a block
+    are not randomized,so requests for a block go to same replica to take
+    advantage of page cache effects.
+    However, in some network topologies,hitting the same replica may cause
+    issues like container taking a long time to download from hdfs and eventually
+    failing. In these cases, we could make this property true and randomize
+    block locations of a block, which in turn would load balance requests
+    among replicas.
+  </description>
+</property>
{code}

* "that - block locations": remove the dash
* "randomized,so": needs a space
* "topologies,hitting": needs a space
* "hdfs" should be "HDFS"
* Since this is XML, quotes need to be escaped. Or you can just remove them. 
Line breaks are also not going to show up.

Recommend something like the following (feel free to copy paste):

When fetching replica locations of a block, the replicas are sorted based on 
network distance. This configuration parameter determines whether the replicas 
at the same network distance are randomly shuffled. By default, this is false, 
such that repeated requests for a block's replicas always result in the same 
order. This potentially improves page cache behavior. However, for some network 
topologies, it is desirable to shuffle this order for better load balancing.
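
To illustrate the mechanism being toggled here, a tiny standalone sketch (illustrative 
node names and block id only, not NetworkTopology#sortByDistance itself): seeding the 
shuffle with the block id makes the replica order deterministic per block, whereas a 
fresh random seed spreads repeated reads across the equally-distant replicas.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative only: deterministic vs. randomized ordering of equally-distant replicas.
public class ReplicaOrderDemo {
  public static void main(String[] args) {
    long blockId = 1073741825L;  // made-up block id used as the seed
    for (int attempt = 0; attempt < 3; attempt++) {
      List<String> fixed = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
      Collections.shuffle(fixed, new Random(blockId));   // same seed => same order every time
      List<String> randomized = new ArrayList<>(Arrays.asList("dn1", "dn2", "dn3"));
      Collections.shuffle(randomized, new Random());     // fresh seed => order varies
      System.out.println("fixed=" + fixed + " randomized=" + randomized);
    }
  }
}
{code}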

 Make seed optional in NetworkTopology#sortByDistance
 

 Key: HDFS-6701
 URL: https://issues.apache.org/jira/browse/HDFS-6701
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.5.0
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
 Attachments: HDFS-6701-v1.txt, HDFS-6701-v3-branch2.txt, 
 HDFS-6701-v3.txt


 Currently the seed in NetworkTopology#sortByDistance is set to the block id, which 
 causes the RNG to generate the same pseudo-random order for each block. If no 
 node-local block location is present, this causes the same rack-local replica 
 to be hit for a particular block.
 It'll be good to make the seed optional, so that one could turn it off if 
 they want block locations of a block to be randomized.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-118) Namenode clients should recover from connection or Namenode restarts

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-118.
---

Resolution: Fixed

HA!

As in, high availability has been added to the NN, which helps mitigate this 
problem, so I'm closing this as fixed.

 Namenode clients should recover from connection or Namenode restarts
 

 Key: HDFS-118
 URL: https://issues.apache.org/jira/browse/HDFS-118
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas

 This Jira discusses the client side recovery from namenode restarts, fail 
 overs and network connectivity issues. This does not address Namenode high 
 availability and tracks only the client side recovery.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-125) Consistency of different replicas of the same block is not checked.

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068950#comment-14068950
 ] 

Allen Wittenauer commented on HDFS-125:
---

This feels like a special case of the problems mentioned in HDFS-366 .

 Consistency of different replicas of the same block is not checked.
 ---

 Key: HDFS-125
 URL: https://issues.apache.org/jira/browse/HDFS-125
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko

 HDFS currently detects corrupted replicas by verifying that a replica's contents 
 match the checksum stored in the block meta-file. This is done 
 independently for each replica of the block on the data-node it belongs to. 
 But we do not check that the replicas are identical across data-nodes as long 
 as they have the same size.
 This is not common but can happen as a result of a software bug or an 
 operator mismanagement. And in this case different clients will read 
 different data from the same file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HDFS-125) Consistency of different replicas of the same block is not checked.

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068950#comment-14068950
 ] 

Allen Wittenauer edited comment on HDFS-125 at 7/21/14 6:12 PM:


This feels like a special case of the problems mentioned in HDFS-366 and 
friends.


was (Author: aw):
This feels like a special case of the problems mentioned in HDFS-366 .

 Consistency of different replicas of the same block is not checked.
 ---

 Key: HDFS-125
 URL: https://issues.apache.org/jira/browse/HDFS-125
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Konstantin Shvachko

 HDFS currently detects corrupted replicas by verifying that a replica's contents 
 match the checksum stored in the block meta-file. This is done 
 independently for each replica of the block on the data-node it belongs to. 
 But we do not check that the replicas are identical across data-nodes as long 
 as they have the same size.
 This is not common but can happen as a result of a software bug or an 
 operator mismanagement. And in this case different clients will read 
 different data from the same file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts

2014-07-21 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068962#comment-14068962
 ] 

Brandon Li commented on HDFS-6455:
--

Thank you, [~abutala]. The patch looks good. It would be nice to add a unit 
test to validate the fix. You can use TestReaddir as a reference to add the 
unit test.

 NFS: Exception should be added in NFS log for invalid separator in 
 allowed.hosts
 

 Key: HDFS-6455
 URL: https://issues.apache.org/jira/browse/HDFS-6455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
 Attachments: HDFS-6455.002.patch, HDFS-6455.patch


 The error for an invalid separator in the dfs.nfs.exports.allowed.hosts property 
 should be added to the NFS log file instead of the nfs.out file.
 Steps to reproduce:
 1. Pass invalid separator in dfs.nfs.exports.allowed.hosts
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1 ro:host2 
 rw</value></property>
 {noformat}
 2. restart NFS server. NFS server fails to start and prints the exception to the console.
 {noformat}
 [hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o 
 UserKnownHostsFile=/dev/null host1 sudo su - -c 
 \/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\ hdfs
 starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread main java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
   at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
   at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151)
   at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 {noformat}
 NFS log does not print any error message. It directly shuts down. 
 {noformat}
 STARTUP_MSG:   java = 1.6.0_31
 /
 2014-05-27 18:47:13,972 INFO  nfs3.Nfs3Base (SignalLogger.java:register(91)) 
 - registered UNIX signal handlers for [TERM, HUP, INT]
 2014-05-27 18:47:14,169 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259
 2014-05-27 18:47:14,179 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73
 2014-05-27 18:47:14,192 INFO  nfs3.Nfs3Base (StringUtils.java:run(640)) - 
 SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down Nfs3 at 
 {noformat}
 NFS.out file has exception.
 {noformat}
 EPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread main java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
 at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
 at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151)
 at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 ulimit -a for user hdfs
 core file size  (blocks, -c) 409600
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 188893
 max locked memory   (kbytes, -l) unlimited
 max memory size (kbytes, -m) unlimited
 open files  (-n) 32768
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 65536
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem

2014-07-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068965#comment-14068965
 ] 

Yongjun Zhang commented on HDFS-6707:
-

Hi Colin, thanks for the comments. I have the same feeling; I wish we 
didn't have to depend on parsing shell output. 
If you look at the patch, you can see that I've already got rid of the 
dependency on the quote type; I used - as the separator to find the link target, 
which hopefully is more robust.
Thanks.


 Intermittent failure of Symlink tests 
 TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem 
 -

 Key: HDFS-6707
 URL: https://issues.apache.org/jira/browse/HDFS-6707
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: symlinks
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, 
 HDFS-6707.003.dbg.patch, HDFS-6707.004.patch


 Symlink tests failure happened from time to time,
 https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/
 {code}
 Failed
 org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink
 Failing for the past 1 build (Since Failed#7376 )
 Took 83 ms.
 Error Message
 Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
 Stacktrace
 java.io.IOException: Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266)
   at 
 org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 Standard Output
 2014-07-17 23:31:37,770 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile':
  No such file or directory
 2014-07-17 23:31:38,109 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile':
  File exists
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-327) DataNode should warn about unknown files in storage

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-327:
--

Labels: newbie  (was: )

 DataNode should warn about unknown files in storage
 ---

 Key: HDFS-327
 URL: https://issues.apache.org/jira/browse/HDFS-327
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Raghu Angadi
Assignee: Jakob Homan
  Labels: newbie

 DataNode currently just ignores the files it does not know about. There could 
 be a lot of files left in DataNode's storage that never get noticed or 
 deleted. These files could be left because of bugs or by a misconfiguration. 
 E.g. while upgrading from 0.17, DN left a lot of metadata files that were not 
 named in the correct format for 0.18 (HADOOP-4663).
 The proposal here is simply to make DN print a warning for each of the 
 unknown files at startup. This at least gives a way to list all the 
 unknown files and  (equally importantly) forces a notion of known and 
 unknown files in the storage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-327) DataNode should warn about unknown files in storage

2014-07-21 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-327:
-

Assignee: (was: Jakob Homan)

 DataNode should warn about unknown files in storage
 ---

 Key: HDFS-327
 URL: https://issues.apache.org/jira/browse/HDFS-327
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Raghu Angadi
  Labels: newbie

 DataNode currently just ignores the files it does not know about. There could 
 be a lot of files left in DataNode's storage that never get noticed or 
 deleted. These files could be left because of bugs or by a misconfiguration. 
 E.g. while upgrading from 0.17, DN left a lot of metadata files that were not 
 named in the correct format for 0.18 (HADOOP-4663).
 The proposal here is simply to make DN print a warning for each of the 
 unknown files at startup. This at least gives a way to list all the 
 unknown files and  (equally importantly) forces a notion of known and 
 unknown files in the storage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068990#comment-14068990
 ] 

Colin Patrick McCabe commented on HDFS-6707:


+1 once you add a unit test to TestStat.java (and pending Jenkins, of course)

Thanks, Yongjun.

 Intermittent failure of Symlink tests 
 TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem 
 -

 Key: HDFS-6707
 URL: https://issues.apache.org/jira/browse/HDFS-6707
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: symlinks
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, 
 HDFS-6707.003.dbg.patch, HDFS-6707.004.patch


 Symlink tests failure happened from time to time,
 https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/
 {code}
 Failed
 org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink
 Failing for the past 1 build (Since Failed#7376 )
 Took 83 ms.
 Error Message
 Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
 Stacktrace
 java.io.IOException: Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266)
   at 
 org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 Standard Output
 2014-07-17 23:31:37,770 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile':
  No such file or directory
 2014-07-17 23:31:38,109 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile':
  File exists
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-286) Move Datanode packet IO logging to its own log

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-286:
--

Labels: newbie  (was: )

 Move Datanode packet IO logging to its own log
 --

 Key: HDFS-286
 URL: https://issues.apache.org/jira/browse/HDFS-286
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Steve Loughran
Priority: Minor
  Labels: newbie

 If the Datanode is set to log at info, then the log fills up with lots of 
 details about packet sending and receiving
 [sf-startdaemon-debug] 09/01/28 13:15:42 
 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83efe6e] INFO 
 datanode.DataNode : Receiving block blk_-3185775405544105186_1757 src: 
 /127.0.0.1:41218 dest: /127.0.0.1:48017
 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block 
 blk_-3185775405544105186_1757] INFO datanode.DataNode : Received block 
 blk_-3185775405544105186_1757 of size 3647 from /127.0.0.1:41218
 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block 
 blk_-3185775405544105186_1757] INFO datanode.DataNode : PacketResponder 0 for 
 block blk_-3185775405544105186_1757 terminating
 [sf-startdaemon-debug] 09/01/28 13:15:42 
 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83f0029] INFO 
 datanode.DataNode : Receiving block blk_-1511363731410268168_1758 src: 
 /127.0.0.1:41219 dest: /127.0.0.1:48017
 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block 
 blk_-1511363731410268168_1758] INFO datanode.DataNode : Received block 
 blk_-1511363731410268168_1758 of size 940 from /127.0.0.1:41219
 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block 
 blk_-1511363731410268168_1758] INFO datanode.DataNode : PacketResponder 0 for 
 block blk_-1511363731410268168_1758 terminating
 [sf-startdaemon-debug] 09/01/28 13:15:42 
 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83f01e4] INFO 
 datanode.DataNode : Receiving block blk_-967265843864311176_1759 src: 
 /127.0.0.1:41220 dest: /127.0.0.1:48017
 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block 
 blk_-967265843864311176_1759] INFO datanode.DataNode : Received block 
 blk_-967265843864311176_1759 of size 36948 from /127.0.0.1:41220
 It would be convenient for those people who only want to see errors in 
 communication but monitor other DataNode operations to have a separate logger 
 for DataNode communications; one to view at a separate log level.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met

2014-07-21 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069031#comment-14069031
 ] 

Brandon Li commented on HDFS-6706:
--

With further investigation, we found this is due to a misconfiguration. Closing 
as invalid.

 ZKFailoverController failed to recognize the quorum is not met
 --

 Key: HDFS-6706
 URL: https://issues.apache.org/jira/browse/HDFS-6706
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brandon Li
Assignee: Brandon Li

 Thanks Kenny Zhang for finding this problem.
 The zkfc cannot start up because the ha.zookeeper.quorum requirement is not met. "zkfc 
 -format" doesn't log the real problem, and then the user will see the following error 
 messages instead of the real issue when starting zkfc:
 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. 
 Parent znode does not exist.
 Run with -formatZK flag to initialize ZooKeeper.
 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received 
 create error from Zookeeper. code:NONODE for path 
 /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 
 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when 
 processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 
 txntype:-1 reqpath:n/a Error 
 Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = 
 NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 To reproduce the problem:
 1. use an HDFS cluster with automatic HA enabled and set the ha.zookeeper.quorum 
 to 3.
 2. start two zookeeper servers.
 3. do "hdfs zkfc -format", and then "hdfs zkfc"



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-21 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6703:
-

Affects Version/s: (was: 2.6.0)
   2.2.0

 NFS: Files can be deleted from a read-only mount
 

 Key: HDFS-6703
 URL: https://issues.apache.org/jira/browse/HDFS-6703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Abhiraj Butala
Assignee: Srikanth Upputuri

   
 As reported by bigdatagroup bigdatagr...@itecons.it on hadoop-users mailing 
 list:
 {code}
 We exported our distributed filesystem with the following configuration 
 (Managed by Cloudera Manager over CDH 5.0.1):
  <property>
 <name>dfs.nfs.exports.allowed.hosts</name>
 <value>192.168.0.153 ro</value>
   </property>
 As you can see, we expect the exported FS to be read-only, but in fact we are 
 able to delete files and folders stored on it (where the user has the correct 
 permissions), from  the client machine that mounted the FS.
 Other writing operations are correctly blocked.
 Hadoop Version in use: 2.3.0+cdh5.0.1+567
 {code}
 I was able to reproduce the issue on the latest hadoop trunk, though I could only 
 delete files; deleting directories was correctly blocked:
 {code}
 abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127
 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1)
 abutala@abutala-vBox:/mnt/hdfs$ ls -lh
 total 512
 -rw-r--r-- 1 abutala supergroup  0 Jul 17 18:51 abc.txt
 drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp
 abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt
 abutala@abutala-vBox:/mnt/hdfs$ ls
 temp
 abutala@abutala-vBox:/mnt/hdfs$ rm -r temp
 rm: cannot remove `temp': Permission denied
 abutala@abutala-vBox:/mnt/hdfs$ ls
 temp
 abutala@abutala-vBox:/mnt/hdfs$
 {code}
 Contents of hdfs-site.xml:
 {code}
 <configuration>
 <property>
 <name>dfs.nfs3.dump.dir</name>
 <value>/tmp/.hdfs-nfs3</value>
 </property>
 <property>
 <name>dfs.nfs.exports.allowed.hosts</name>
 <value>localhost ro</value>
 </property>
 </configuration>
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount

2014-07-21 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6703:
-

Affects Version/s: 2.6.0

 NFS: Files can be deleted from a read-only mount
 

 Key: HDFS-6703
 URL: https://issues.apache.org/jira/browse/HDFS-6703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Abhiraj Butala
Assignee: Srikanth Upputuri

   
 As reported by bigdatagroup bigdatagr...@itecons.it on hadoop-users mailing 
 list:
 {code}
 We exported our distributed filesystem with the following configuration 
 (Managed by Cloudera Manager over CDH 5.0.1):
  <property>
 <name>dfs.nfs.exports.allowed.hosts</name>
 <value>192.168.0.153 ro</value>
   </property>
 As you can see, we expect the exported FS to be read-only, but in fact we are 
 able to delete files and folders stored on it (where the user has the correct 
 permissions), from  the client machine that mounted the FS.
 Other writing operations are correctly blocked.
 Hadoop Version in use: 2.3.0+cdh5.0.1+567
 {code}
 I was able to reproduce the issue on the latest hadoop trunk, though I could only 
 delete files; deleting directories was correctly blocked:
 {code}
 abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127
 127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1)
 abutala@abutala-vBox:/mnt/hdfs$ ls -lh
 total 512
 -rw-r--r-- 1 abutala supergroup  0 Jul 17 18:51 abc.txt
 drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp
 abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt
 abutala@abutala-vBox:/mnt/hdfs$ ls
 temp
 abutala@abutala-vBox:/mnt/hdfs$ rm -r temp
 rm: cannot remove `temp': Permission denied
 abutala@abutala-vBox:/mnt/hdfs$ ls
 temp
 abutala@abutala-vBox:/mnt/hdfs$
 {code}
 Contents of hdfs-site.xml:
 {code}
 <configuration>
 <property>
 <name>dfs.nfs3.dump.dir</name>
 <value>/tmp/.hdfs-nfs3</value>
 </property>
 <property>
 <name>dfs.nfs.exports.allowed.hosts</name>
 <value>localhost ro</value>
 </property>
 </configuration>
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met

2014-07-21 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li resolved HDFS-6706.
--

Resolution: Invalid

 ZKFailoverController failed to recognize the quorum is not met
 --

 Key: HDFS-6706
 URL: https://issues.apache.org/jira/browse/HDFS-6706
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brandon Li
Assignee: Brandon Li

 Thanks Kenny Zhang for finding this problem.
 The zkfc cannot start up because the ha.zookeeper.quorum requirement is not met. "zkfc 
 -format" doesn't log the real problem, and then the user will see the following error 
 messages instead of the real issue when starting zkfc:
 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. 
 Parent znode does not exist.
 Run with -formatZK flag to initialize ZooKeeper.
 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received 
 create error from Zookeeper. code:NONODE for path 
 /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 
 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when 
 processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 
 txntype:-1 reqpath:n/a Error 
 Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = 
 NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 To reproduce the problem:
 1. use an HDFS cluster with automatic HA enabled and set the ha.zookeeper.quorum 
 to 3.
 2. start two zookeeper servers.
 3. do "hdfs zkfc -format", and then "hdfs zkfc"



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem

2014-07-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069032#comment-14069032
 ] 

Yongjun Zhang commented on HDFS-6707:
-

Thanks Colin, will do!


 Intermittent failure of Symlink tests 
 TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem 
 -

 Key: HDFS-6707
 URL: https://issues.apache.org/jira/browse/HDFS-6707
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: symlinks
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, 
 HDFS-6707.003.dbg.patch, HDFS-6707.004.patch


 Symlink tests failure happened from time to time,
 https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/
 {code}
 Failed
 org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink
 Failing for the past 1 build (Since Failed#7376 )
 Took 83 ms.
 Error Message
 Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
 Stacktrace
 java.io.IOException: Path 
 file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile
  is not a symbolic link
   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266)
   at 
 org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 Standard Output
 2014-07-17 23:31:37,770 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile':
  No such file or directory
 2014-07-17 23:31:38,109 WARN  fs.FileUtil (FileUtil.java:symLink(829)) - 
 Command 'ln -s 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file
  
 /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile'
  failed 1 with: ln: failed to create symbolic link 
 '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile':
  File exists
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069061#comment-14069061
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6680:
---

{code}
-  for (int i = 0; i < Math.min(favoredNodes.size(), numOfReplicas); i++) {
+  for (int i = 0; i < favoredNodes.size() && results.size() < numOfReplicas; i++) {
{code}
I found this bug via the new test.  Consider favoredNodes.size() == 4 and 
numOfReplicas == 3.  The min is 3, so before the change only the first 3 datanodes 
will be tried.  If one of these three datanodes is not chosen, it won't 
try the 4th datanode.
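
A small standalone illustration of that scenario (made-up node names and a boolean 
stand-in for "is a good target"; not the real BlockPlacementPolicyDefault code):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: 4 favored nodes, 3 required replicas, and dn2 rejected as a target.
public class FavoredNodesLoopDemo {
  public static void main(String[] args) {
    List<String> favoredNodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    boolean[] isGoodTarget = {true, false, true, true};
    int numOfReplicas = 3;

    // Old condition: never looks past the first min(favoredNodes.size(), numOfReplicas) nodes.
    List<String> oldResults = new ArrayList<>();
    for (int i = 0; i < Math.min(favoredNodes.size(), numOfReplicas); i++) {
      if (isGoodTarget[i]) oldResults.add(favoredNodes.get(i));
    }

    // New condition: keeps trying favored nodes until enough replicas are placed.
    List<String> newResults = new ArrayList<>();
    for (int i = 0; i < favoredNodes.size() && newResults.size() < numOfReplicas; i++) {
      if (isGoodTarget[i]) newResults.add(favoredNodes.get(i));
    }

    System.out.println("old: " + oldResults);  // [dn1, dn3]       -- dn4 never tried
    System.out.println("new: " + newResults);  // [dn1, dn3, dn4]  -- all 3 replicas placed
  }
}
{code}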

 BlockPlacementPolicyDefault does not choose favored nodes correctly
 ---

 Key: HDFS-6680
 URL: https://issues.apache.org/jira/browse/HDFS-6680
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6680_20140714.patch, h6680_20140716.patch


 In one of the chooseTarget(..) methods, it tries all the favoredNodes to 
 chooseLocalNode(..).  It expects chooseLocalNode to return null if the local 
 node is not a good target.  Unfortunately, chooseLocalNode will fallback to 
 chooseLocalRack but not returning null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-301) Provide better error messages when fs.default.name is invalid

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-301:
--

Labels: newbie  (was: )

 Provide better error messages when fs.default.name is invalid
 -

 Key: HDFS-301
 URL: https://issues.apache.org/jira/browse/HDFS-301
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
  Labels: newbie
 Attachments: HADOOP-5095-1.patch


 This is the follow-on to HADOOP-5687 - it's not enough to detect bad URIs; we need 
 good error messages and a set of tests to make sure everything works as 
 intended.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-196) File length not reported correctly after application crash

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069093#comment-14069093
 ] 

Allen Wittenauer commented on HDFS-196:
---

Ping!

I'm tempted to close this as stale, but it would be good for someone more 
familiar with the issue to do that.

 File length not reported correctly after application crash
 --

 Key: HDFS-196
 URL: https://issues.apache.org/jira/browse/HDFS-196
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Doug Judd

 Our application (Hypertable) creates a transaction log in HDFS.  This log is 
 written with the following pattern:
 out_stream.write(header, 0, 7);
 out_stream.sync()
 out_stream.write(data, 0, amount);
 out_stream.sync()
 [...]
 However, if the application crashes and then comes back up again, the 
 following statement
 length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
 returns the wrong length.  Apparently this is because this method fetches 
 length information from the NameNode which is stale.  Ideally, a call to 
 getFileStatus() would return the accurate file length by fetching the size of 
 the last block from the primary datanode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069099#comment-14069099
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6441:
---

[~benoyantony], you may simply use the code in HDFS-6010.

 Add ability to exclude/include few datanodes while balancing
 

 Key: HDFS-6441
 URL: https://issues.apache.org/jira/browse/HDFS-6441
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer
Affects Versions: 2.4.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, 
 HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch


 In some use cases, it is desirable to ignore a few data nodes  while 
 balancing. The administrator should be able to specify a list of data nodes 
 in a file similar to the hosts file and the balancer should ignore these data 
 nodes while balancing so that no blocks are added/removed on these nodes.
 Similarly it will be beneficial to specify that only a particular list of 
 datanodes should be considered for balancing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-402) Display the server version in dfsadmin -report

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069096#comment-14069096
 ] 

Allen Wittenauer commented on HDFS-402:
---

Is this still a viable idea, especially as we move towards rolling upgrades?  
What is the version at that point?

 Display the server version in dfsadmin -report
 --

 Key: HDFS-402
 URL: https://issues.apache.org/jira/browse/HDFS-402
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Uma Maheswara Rao G
Priority: Minor
  Labels: newbie
 Attachments: HDFS-402.patch, HDFS-402.patch, HDFS-402.patch, 
 hdfs-402.txt


 As part of HADOOP-5094, it was requested to include the server version in the 
 dfsadmin -report, to avoid the need to screen scrape to get this information:
 bq. Please do provide the server version, so there is a quick and non-taxing 
 way of determining what the current running version on the namenode is.
 Currently there is nothing in the dfs client protocol to query this 
 information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-196) File length not reported correctly after application crash

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze resolved HDFS-196.
--

Resolution: Not a Problem

sync() does not update the length in NN.  So getFileStatus() will return the 
correct length immediately as Dhruba mentioned.

Anyway, sync() is already removed from trunk (HDFS-3034).  hsync(..) with 
UPDATE_LENGTH flag could be used instead.  So this becomes not-a-problem 
anymore.  Resolving ...
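
For reference, a minimal sketch of the hsync(UPDATE_LENGTH) approach mentioned above. The 
path is a placeholder and the cast assumes the stream was created by DistributedFileSystem; 
this is illustrative, not the Hypertable code.

{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class HsyncUpdateLengthSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path log = new Path("/tmp/txn.log");  // placeholder path
    try (FSDataOutputStream out = fs.create(log)) {
      out.write(new byte[]{1, 2, 3, 4, 5, 6, 7});
      // hsync() alone persists the data; adding UPDATE_LENGTH also refreshes
      // the length recorded on the NameNode, so getFileStatus() from another
      // client reflects the synced bytes. The cast assumes DistributedFileSystem.
      ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
      System.out.println("length after hsync: " + fs.getFileStatus(log).getLen());
    }
  }
}
{code}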

 File length not reported correctly after application crash
 --

 Key: HDFS-196
 URL: https://issues.apache.org/jira/browse/HDFS-196
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Doug Judd

 Our application (Hypertable) creates a transaction log in HDFS.  This log is 
 written with the following pattern:
 out_stream.write(header, 0, 7);
 out_stream.sync()
 out_stream.write(data, 0, amount);
 out_stream.sync()
 [...]
 However, if the application crashes and then comes back up again, the 
 following statement
 length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
 returns the wrong length.  Apparently this is because this method fetches 
 length information from the NameNode which is stale.  Ideally, a call to 
 getFileStatus() would return the accurate file length by fetching the size of 
 the last block from the primary datanode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-196) File length not reported correctly after application crash

2014-07-21 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069112#comment-14069112
 ] 

Tsz Wo Nicholas Sze commented on HDFS-196:
--

 ... getFileStatus() will return the correct length ...

Oops, it should be ... getFileStatus() will NOT return the correct length 

 File length not reported correctly after application crash
 --

 Key: HDFS-196
 URL: https://issues.apache.org/jira/browse/HDFS-196
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Doug Judd

 Our application (Hypertable) creates a transaction log in HDFS.  This log is 
 written with the following pattern:
 out_stream.write(header, 0, 7);
 out_stream.sync()
 out_stream.write(data, 0, amount);
 out_stream.sync()
 [...]
 However, if the application crashes and then comes back up again, the 
 following statement
 length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
 returns the wrong length.  Apparently this is because this method fetches 
 length information from the NameNode which is stale.  Ideally, a call to 
 getFileStatus() would return the accurate file length by fetching the size of 
 the last block from the primary datanode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-219) Add md5sum facility in dfsshell

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-219:
--

Labels: newbie  (was: )

 Add md5sum facility in dfsshell
 ---

 Key: HDFS-219
 URL: https://issues.apache.org/jira/browse/HDFS-219
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: zhangwei
  Labels: newbie

 I think it would be useful to add an md5sum (or other checksum) facility to dfsshell, so 
 that a file on HDFS can be verified. It can confirm the file's integrity 
 after copyFromLocal or copyToLocal.
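
A hedged sketch of what such a facility could build on today: FileSystem.getFileChecksum(..) 
already returns a DataNode-computed checksum. Note it is an MD5-of-CRC style digest, not a 
plain md5sum of the file bytes, so it is mainly useful for HDFS-to-HDFS comparisons with 
matching block and CRC settings; the path below is hypothetical.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsChecksumSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Computed by the DataNodes over the block CRCs, not over the raw bytes.
    FileChecksum cs = fs.getFileChecksum(new Path("/user/alice/data.bin"));
    System.out.println(cs.getAlgorithmName() + ": " + cs);
  }
}
{code}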



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-374) HDFS needs to support a very large number of open files.

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069139#comment-14069139
 ] 

Colin Patrick McCabe commented on HDFS-374:
---

Oh, and also, using short-circuit reads mitigates this somewhat. We can share 
the same file descriptor across multiple instances of a short-circuit block 
file being opened.
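
As a rough illustration of the client-side settings involved (the key names are the 
standard 2.x short-circuit read settings; the socket path, NameNode URI, and file path are 
examples and must match the DataNode configuration):

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortCircuitReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the DFS client to bypass the DataNode for local replicas.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Domain socket shared with the local DataNode (example path).
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    try (FSDataInputStream in = fs.open(new Path("/hbase/some-hfile"))) {
      byte[] buf = new byte[8192];
      int n = in.read(buf);
      System.out.println("read " + n + " bytes, possibly via the short-circuit path");
    }
  }
}
{code}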

 HDFS needs to support a very large number of open files.
 

 Key: HDFS-374
 URL: https://issues.apache.org/jira/browse/HDFS-374
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jim Kellerman

 Currently, DFSClient maintains one socket per open file. For most map/reduce 
 operations, this is not a problem because there just aren't many open files.
 However, HBase has a very different usage model in which a single region 
 server could have thousands (10**3 but less than 10**4) open files. 
 This can cause both datanodes and region servers to run out of file handles.
 What I would like to see is one connection for each dfsClient, datanode pair. 
 This would reduce the number of connections to hundreds or tens of sockets.
 The intent is not to process requests totally asynchronously (overlapping 
 block reads and forcing the client to reassemble a whole message out of a 
 bunch of fragments), but rather to queue requests from the client to the 
 datanode and process them serially, differing from the current implementation 
 in that rather than use an exclusive socket for each file, only one socket is 
 in use between the client and a particular datanode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-374) HDFS needs to support a very large number of open files.

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069136#comment-14069136
 ] 

Colin Patrick McCabe commented on HDFS-374:
---

This is less of an issue than it once was, due to HBase using pread (positional 
read) more.  pread creates a new RemoteBlockReader each time, but closes it 
immediately after, returning the socket to the client-side socket cache 
(PeerCache).  However, running out of open file descriptors could still be an 
issue with some HBase configurations.  Anyway, I agree with closing this since 
in all the cases I've seen, fixing the configuration resolved the issue.  I 
also don't like the queueing idea presented here since it would increase 
latency.
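
For context, a minimal sketch of the positional-read pattern being referred to (path and 
offset are placeholders):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PositionalReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (FSDataInputStream in = fs.open(new Path("/hbase/data/some-hfile"))) {
      byte[] buf = new byte[4096];
      // pread: reads at an absolute offset without moving the stream position,
      // so the client does not keep a socket pinned to this open file.
      int n = in.read(12345L, buf, 0, buf.length);
      System.out.println("read " + n + " bytes");
    }
  }
}
{code}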

 HDFS needs to support a very large number of open files.
 

 Key: HDFS-374
 URL: https://issues.apache.org/jira/browse/HDFS-374
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jim Kellerman

 Currently, DFSClient maintains one socket per open file. For most map/reduce 
 operations, this is not a problem because there just aren't many open files.
 However, HBase has a very different usage model in which a single region 
 server could have thousands (10**3 but less than 10**4) open files. 
 This can cause both datanodes and region servers to run out of file handles.
 What I would like to see is one connection for each dfsClient, datanode pair. 
 This would reduce the number of connections to hundreds or tens of sockets.
 The intent is not to process requests totally asynchronously (overlapping 
 block reads and forcing the client to reassemble a whole message out of a 
 bunch of fragments), but rather to queue requests from the client to the 
 datanode and process them serially, differing from the current implementation 
 in that rather than use an exclusive socket for each file, only one socket is 
 in use between the client and a particular datanode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-405) Several unit tests failing on Windows frequently

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-405.
---

Resolution: Fixed

Closing as stale.


 Several unit tests failing on Windows frequently
 

 Key: HDFS-405
 URL: https://issues.apache.org/jira/browse/HDFS-405
 Project: Hadoop HDFS
  Issue Type: Test
 Environment: Windows
Reporter: Ramya Sunil
Priority: Minor

 This issue is similar to HADOOP-5114. A huge number of unit tests are failing 
 on Windows on branch  18 consistently. 0.21 is showing the maximum number of 
 failures. Failures on other branches are a subset of failures observed in 
 0.21. Below is the list of failures observed on 0.21.
 * java.io.IOException: Job failed!
 ** TestJobName - testComplexNameWithRegex
 ** TestJobStatusPersistency - testNonPersistency, testPersistency
 ** TestJobSysDirWithDFS - testWithDFS
 ** TestKillCompletedJob - testKillCompJob
 ** TestMiniMRClasspath - testClassPath, testExternalWritable
 ** TestMiniMRDFSCaching - testWithDFS
 ** TestMiniMRDFSSort - testMapReduceSort, testMapReduceSortWithJvmReuse
 ** TestMiniMRLocalFS - testWithLocal
 ** TestMiniMRWithDFS - testWithDFS, testWithDFSWithDefaultPort
 ** TestMiniMRWithDFSWithDistinctUsers - testDistinctUsers
 ** TestMultipleLevelCaching - testMultiLevelCaching
 ** TestQueueManager - testAllEnabledACLForJobSubmission, 
 testEnabledACLForNonDefaultQueue,  testUserEnabledACLForJobSubmission,  
 testGroupsEnabledACLForJobSubmission
 ** TestRackAwareTaskPlacement - testTaskPlacement
 ** TestReduceFetch - testReduceFromDisk, testReduceFromPartialMem, 
 testReduceFromMem
 ** TestSpecialCharactersInOutputPath - testJobWithDFS
 ** TestTTMemoryReporting - testDefaultMemoryValues, testConfiguredMemoryValues
 ** TestTrackerBlacklistAcrossJobs - testBlacklistAcrossJobs
 ** TestUserDefinedCounters - testMapReduceJob
 ** TestDBJob - testRun
 ** TestServiceLevelAuthorization - testServiceLevelAuthorization
 ** TestNoDefaultsJobConf - testNoDefaults
 ** TestBadRecords - testBadMapRed
 ** TestClusterMRNotification - testMR
 ** TestClusterMapReduceTestCase - testMapReduce, testMapReduceRestarting
 ** TestCommandLineJobSubmission - testJobShell
 ** TestCompressedEmptyMapOutputs - 
 testMapReduceSortWithCompressedEmptyMapOutputs
 ** TestCustomOutputCommitter - testCommitter
 ** TestJavaSerialization - testMapReduceJob, testWriteToSequencefile
 ** TestJobClient - testGetCounter, testJobList, testChangingJobPriority
 ** TestJobName - testComplexName
 * java.lang.IllegalArgumentException: Pathname /path from Cpath is not a 
 valid DFS filename.
 ** TestJobQueueInformation - testJobQueues
 ** TestJobInProgress - testRunningTaskCount
 ** TestJobTrackerRestart - testJobTrackerRestart
 * Timeout
 ** TestKillSubProcesses - testJobKill
 ** TestMiniMRMapRedDebugScript - testMapDebugScript
 ** TestControlledMapReduceJob - testControlledMapReduceJob
 ** TestJobInProgressListener - testJobQueueChanges
 ** TestJobKillAndFail - testJobFailAndKill
 * junit.framework.AssertionFailedError
 ** TestMRServerPorts - testJobTrackerPorts, testTaskTrackerPorts
 ** TestMiniMRTaskTempDir - testTaskTempDir
 ** TestTaskFail - testWithDFS
 ** TestTaskLimits - testTaskLimits
 ** TestMapReduceLocal - testWithLocal
 ** TestCLI - testAll
 ** TestHarFileSystem - testArchives
 ** TestTrash - testTrash, testNonDefaultFS
 ** TestHDFSServerPorts - testNameNodePorts, testDataNodePorts, 
 testSecondaryNodePorts
 ** TestHDFSTrash - testNonDefaultFS
 ** TestFileOutputFormat - testCustomFile
 * org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.security.authorize.AuthorizationException: 
 java.security.AccessControlException: access denied 
 ConnectionPermission(org.apache.hadoop.security.authorize.RefreshAuthorizationPolicyProtocol)
 ** TestServiceLevelAuthorization - testRefresh
 * junit.framework.ComparisonFailure
 ** TestDistCh - testDistCh
 * java.io.FileNotFoundException
 ** TestCopyFiles - testMapCount



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-402) Display the server version in dfsadmin -report

2014-07-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069176#comment-14069176
 ] 

Hadoop QA commented on HDFS-402:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12531921/hdfs-402.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7410//console

This message is automatically generated.

 Display the server version in dfsadmin -report
 --

 Key: HDFS-402
 URL: https://issues.apache.org/jira/browse/HDFS-402
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Uma Maheswara Rao G
Priority: Minor
  Labels: newbie
 Attachments: HDFS-402.patch, HDFS-402.patch, HDFS-402.patch, 
 hdfs-402.txt


 As part of HADOOP-5094, it was requested to include the server version in the 
 dfsadmin -report, to avoid the need to screen scrape to get this information:
 bq. Please do provide the server version, so there is a quick and non-taxing 
 way of determining what the current running version on the namenode is.
 Currently there is nothing in the dfs client protocol to query this 
 information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-465) in branch-1, libhdfs makes jni lib calls after setting errno in some places

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069184#comment-14069184
 ] 

Colin Patrick McCabe commented on HDFS-465:
---

Yeah.  This was fixed in HDFS-3579.  This JIRA was just a placeholder in case 
people wanted to fix it in branch-1 and earlier.

 in branch-1, libhdfs makes jni lib calls after setting errno in some places
 ---

 Key: HDFS-465
 URL: https://issues.apache.org/jira/browse/HDFS-465
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.1.2
Reporter: Pete Wyckoff
Assignee: Pete Wyckoff
 Attachments: HADOOP-4636.txt


 errno can be affected by other library calls, so it should always be set right 
 before the return statement and never before making other library calls.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-332) hadoop fs -put should return different code for different failures

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-332.
---

Resolution: Won't Fix

Returning -1 for failure at the shell is pretty normal. Won't fix.

 hadoop fs -put should return different code for different failures
 --

 Key: HDFS-332
 URL: https://issues.apache.org/jira/browse/HDFS-332
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Runping Qi
Assignee: Ravi Phulari

 hadoop fs -put may fail due to different reasons, such as the source file 
 does not exist, the destination file already exists, permission denied, or 
 exceptions during writing.
 However, it returns the same code (-1), making it impossible to tell what is 
 the actual cause of the failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-97) DFS should detect slow links(nodes) and avoid them

2014-07-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069192#comment-14069192
 ] 

Colin Patrick McCabe commented on HDFS-97:
--

This is also related to hedged reads.  Even if we blacklist a datanode after a 
timeout / latency spike has happened, the damage is done.  Hedged reads can 
avoid the latency spike in the first place.
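
For readers unfamiliar with the feature, hedged reads are enabled purely through client 
configuration; a minimal sketch (the pool size and threshold values below are illustrative):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HedgedReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // A thread pool size > 0 turns hedged reads on in the DFS client.
    conf.setInt("dfs.client.hedged.read.threadpool.size", 10);
    // If the first replica has not answered within this many milliseconds,
    // a second read is launched against a different replica.
    conf.setLong("dfs.client.hedged.read.threshold.millis", 500);
    FileSystem fs = FileSystem.get(conf);
    System.out.println("client initialized with hedged reads: " + fs.getUri());
  }
}
{code}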

 DFS should detect slow links(nodes) and avoid them
 --

 Key: HDFS-97
 URL: https://issues.apache.org/jira/browse/HDFS-97
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Runping Qi

 The current DFS does not detect slow links (nodes).
 Thus, when a node or its network link is slow, it may affect the overall 
 system performance significantly.
 Specifically, when a map job needs to read data from such a node, it may 
 progress 10X slower.
 And when a DFS data node pipeline consists of such a node, the write 
 performance degrades significantly.
 This may lead to some long tails for map/reduce jobs. We have experienced 
 such behaviors quite often.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met

2014-07-21 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069226#comment-14069226
 ] 

Yongjun Zhang commented on HDFS-6706:
-

HI [~brandonli], 

Thanks for reporting and addressing the issue. I have some questions here. 
The original report seems to indicate that the reported error message doesn't 
point to the real reason for the failure. My questions are:
1. In the case reported initially, the real problem was said to be that the zkfc 
cannot start up because ha.zookeeper.quorum is not met. With your last 
update, can we say the real problem is a misconfiguration?
2. What kind of misconfiguration caused the symptom?
3. When misconfigured, the user will still see the reported error message. Should 
the error message state that the symptom may be caused by such a 
misconfiguration?

Thanks.


 ZKFailoverController failed to recognize the quorum is not met
 --

 Key: HDFS-6706
 URL: https://issues.apache.org/jira/browse/HDFS-6706
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brandon Li
Assignee: Brandon Li

 Thanks Kenny Zhang for finding this problem.
 The zkfc cannot start up because ha.zookeeper.quorum is not met. zkfc 
 -format doesn't log the real problem, and the user then sees the following error 
 messages instead of the real issue when starting zkfc:
 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. 
 Parent znode does not exist.
 Run with -formatZK flag to initialize ZooKeeper.
 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received 
 create error from Zookeeper. code:NONODE for path 
 /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 
 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when 
 processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 
 txntype:-1 reqpath:n/a Error 
 Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = 
 NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 To reproduce the problem:
 1. use an HDFS cluster with automatic HA enabled and set ha.zookeeper.quorum 
 to 3 servers.
 2. start two zookeeper servers.
 3. do hdfs zkfc -format, and then hdfs zkfc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains .snapshot

2014-07-21 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069235#comment-14069235
 ] 

Mit Desai commented on HDFS-6696:
-

[~andrew.wang], we were trying to upgrade 0.21.11 to 2.4.0

 Name node cannot start if the path of a file under construction contains 
 .snapshot
 

 Key: HDFS-6696
 URL: https://issues.apache.org/jira/browse/HDFS-6696
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Assignee: Andrew Wang
Priority: Blocker

 Using {{-renameReserved}} to rename .snapshot in a pre-hdfs-snapshot 
 feature fsimage during upgrade only works if there is nothing under 
 construction under the renamed directory.  I am not sure whether it takes 
 care of edits containing .snapshot properly.
 The workaround is to identify these directories and rename, then do 
 {{saveNamespace}} before performing upgrade.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met

2014-07-21 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069247#comment-14069247
 ] 

Brandon Li commented on HDFS-6706:
--

[~yzhangal], I should have added more explanation. 
In this case, the 3 ZooKeeper servers were not correctly configured as an ensemble; 
basically none of them was part of an ensemble. However, all of them were listed 
in ha.zookeeper.quorum in core-site.xml. 
When zkfc started, it talked to a different ZK server than the one that was 
previously formatted.

 ZKFailoverController failed to recognize the quorum is not met
 --

 Key: HDFS-6706
 URL: https://issues.apache.org/jira/browse/HDFS-6706
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Brandon Li
Assignee: Brandon Li

 Thanks Kenny Zhang for finding this problem.
 The zkfc cannot start up because ha.zookeeper.quorum is not met. zkfc 
 -format doesn't log the real problem, and the user then sees the following error 
 messages instead of the real issue when starting zkfc:
 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. 
 Parent znode does not exist.
 Run with -formatZK flag to initialize ZooKeeper.
 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController 
 (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received 
 create error from Zookeeper. code:NONODE for path 
 /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 
 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when 
 processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 
 txntype:-1 reqpath:n/a Error 
 Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = 
 NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock
 To reproduce the problem:
 1. use an HDFS cluster with automatic HA enabled and set ha.zookeeper.quorum 
 to 3 servers.
 2. start two zookeeper servers.
 3. do hdfs zkfc -format, and then hdfs zkfc



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-333) A State Machine for name-node blocks.

2014-07-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069249#comment-14069249
 ] 

Konstantin Shvachko commented on HDFS-333:
--

It is still valid. It would formalize and simplify block life cycle management.

 A State Machine for name-node blocks.
 -

 Key: HDFS-333
 URL: https://issues.apache.org/jira/browse/HDFS-333
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Konstantin Shvachko

 Blocks on the name-node can belong to different collections like the 
 blocksMap, under-replicated, over-replicated lists, etc.
 It is getting more and more complicated to keep the lists consistent.
 It would be good to formalize the movement of the blocks between the 
 collections using a state machine.
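
Purely as an illustration of what "formalizing" could look like, a toy state machine over 
the collections named above (these states and transitions are a sketch, not the name-node's 
actual data structures):

{code}
public class BlockStateSketch {
  enum BlockState { IN_BLOCKS_MAP, UNDER_REPLICATED, OVER_REPLICATED, PENDING_DELETION }

  // A single place to declare which moves between collections are legal.
  static boolean isLegalTransition(BlockState from, BlockState to) {
    switch (from) {
      case IN_BLOCKS_MAP:
        return to == BlockState.UNDER_REPLICATED
            || to == BlockState.OVER_REPLICATED
            || to == BlockState.PENDING_DELETION;
      case UNDER_REPLICATED:
      case OVER_REPLICATED:
        return to == BlockState.IN_BLOCKS_MAP || to == BlockState.PENDING_DELETION;
      default:
        return false;  // PENDING_DELETION is terminal in this sketch
    }
  }

  public static void main(String[] args) {
    System.out.println(isLegalTransition(BlockState.IN_BLOCKS_MAP, BlockState.UNDER_REPLICATED));
    System.out.println(isLegalTransition(BlockState.PENDING_DELETION, BlockState.IN_BLOCKS_MAP));
  }
}
{code}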



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-253) Method to retrieve all quotas active on HDFS

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069252#comment-14069252
 ] 

Allen Wittenauer commented on HDFS-253:
---

One of my favorite JIRAs.  Still open!  I think I'll add the newbie tag.


 Method to retrieve all quotas active on HDFS
 

 Key: HDFS-253
 URL: https://issues.apache.org/jira/browse/HDFS-253
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: Marco Nicosia
  Labels: newbie

 Currently the only way to view quota information on an HDFS is via dfs -count 
 -q, which is fine when an admin is examining a specific directory for quota 
 status.
 It would also be good to do full HDFS quota audits, by pulling all HDFS 
 quotas currently set on the system. This is especially important when trying 
 to do capacity management (OK, how much quota have we allotted so far?). I 
 think the only way to do this now is via lsr | count -q, which is pretty 
 cumbersome.
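
Until such an API exists, one workaround is to walk the namespace from a client and report 
directories whose quotas are set; a rough sketch is below. It issues one getContentSummary 
RPC per directory, so it is about as heavy as the lsr approach on a large namespace.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QuotaAuditSketch {
  // Print every directory under 'dir' that has a name or space quota set
  // (unset quotas are reported as -1 by ContentSummary).
  static void audit(FileSystem fs, Path dir) throws Exception {
    ContentSummary cs = fs.getContentSummary(dir);
    if (cs.getQuota() >= 0 || cs.getSpaceQuota() >= 0) {
      System.out.println(dir + "\tnameQuota=" + cs.getQuota()
          + "\tspaceQuota=" + cs.getSpaceQuota());
    }
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory()) {
        audit(fs, st.getPath());
      }
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    audit(fs, new Path("/"));
  }
}
{code}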



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-253) Method to retrieve all quotas active on HDFS

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-253:
--

Component/s: namenode

 Method to retrieve all quotas active on HDFS
 

 Key: HDFS-253
 URL: https://issues.apache.org/jira/browse/HDFS-253
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: Marco Nicosia
  Labels: newbie

 Currently the only way to view quota information on an HDFS is via dfs -count 
 -q, which is fine when an admin is examining a specific directory for quota 
 status.
 It would also be good to do full HDFS quota audits, by pulling all HDFS 
 quotas currently set on the system. This is especially important when trying 
 to do capacity management (OK, how much quota have we allotted so far?). I 
 think the only way to do this now is via lsr | count -q, which is pretty 
 cumbersome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-253) Method to retrieve all quotas active on HDFS

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-253:
--

Labels: newbie  (was: )

 Method to retrieve all quotas active on HDFS
 

 Key: HDFS-253
 URL: https://issues.apache.org/jira/browse/HDFS-253
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Reporter: Marco Nicosia
  Labels: newbie

 Currently the only way to view quota information on an HDFS is via dfs -count 
 -q, which is fine when an admin is examining a specific directory for quota 
 status.
 It would also be good to do full HDFS quota audits, by pulling all HDFS 
 quotas currently set on the system. This is especially important when trying 
 to do capacity management (OK, how much quota have we allotted so far?). I 
 think the only way to do this now is via lsr | count -q, which is pretty 
 cumbersome.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist

2014-07-21 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6422:
---

Attachment: HDFS-6422.007.patch

bq. logAuditEvent(false, "getXAttr", src); -- logAuditEvent(false, 
"getXAttrs", src);

fixed.

{code}
} else {
+  throw new IOException("No matching attributes found");
{code}
Changed to "No matching attributes found for remove operation".

bq. And this condition makes me think about the retryCache. I hope it is handled here; 
let me check. For example, the first call may succeed internally but the client is 
restarted/disconnected; in that case the idempotent API will be retried from the 
client, so the next call may fail as the xattr was already removed. Do you think we 
need to mark this as AtMostOnce?

Good catch. You're right that my changes require removeXAttr to
become AtMostOnce. I've changed the code to reflect that.

bq. I think the exception message below can be refined to something like "Some/all 
attributes do not match" for the get case?

I've changed this to "At least one of the attributes provided was not found".

TestDFSShell.java:

bq. From the code below, we don't need out.toString() as we did not assert 
anything.

removed.

bq. We need to shutdown the mini cluster as well.

Done.

FSXAttrBaseTest.java:

bq. Please handle only specific exceptions. If it throws an unexpected exception, 
let it propagate; we need not assert and throw.

All of this is due to WebHDFS throwing a different exception from the regular 
path. WebHDFS throws a RemoteException which wraps a 
HadoopIllegalArgumentException. In other words, the WebHDFS client does not 
unwrap the exception. You'll see in the diff that I've changed the exception 
handling to catch both RemoteException and HadoopIllegalArgumentException. In 
the former case, I check to see that the underlying exception is a HIAE.

XattrNameParam.java:

{code}
private static Domain DOMAIN = new Domain(NAME,
+  Pattern.compile(".*"));
{code}

bq. I understand that we are trying to eliminate the client-side validation since we 
will not have the flexibility to add more namespaces in the future. But that pattern 
could be the same as the Namespace one, right? So, how about validating the pattern? 
Please check with Andrew as well to see what he says. I have no strong feeling on 
that; it is just a suggestion.

I understand your concern. The problem is that WebHDFS would then be doing 
client side checking and the exception would be generated and thrown from two 
different places. We wanted to unify all of the xattr Namespace checking into 
one place on the server side so that there would only be one place where the 
exception would be generated. I talked to Andrew and he's ok with leaving it 
like it is in the patch.


 getfattr in CLI doesn't throw exception or return non-0 return code when 
 xattr doesn't exist
 

 Key: HDFS-6422
 URL: https://issues.apache.org/jira/browse/HDFS-6422
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Blocker
 Attachments: HDFS-6422.005.patch, HDFS-6422.006.patch, 
 HDFS-6422.007.patch, HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch, 
 HDFS-6474.4.patch


 If you do
 hdfs dfs -getfattr -n user.blah /foo
 and user.blah doesn't exist, the command prints
 # file: /foo
 and a 0 return code.
 It should print an exception and return a non-0 return code instead.
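
The same contract applies to the programmatic API; a hedged sketch of what a caller would 
observe once this is fixed (the path and xattr name come from the example above):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetXAttrSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try {
      byte[] value = fs.getXAttr(new Path("/foo"), "user.blah");
      System.out.println("user.blah = " + new String(value, "UTF-8"));
    } catch (IOException e) {
      // A missing xattr should surface as an exception; the CLI then maps
      // this to a non-zero exit code instead of silently printing "# file: /foo".
      System.err.println("getfattr failed: " + e.getMessage());
      System.exit(1);
    }
  }
}
{code}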



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-136) Not able to run randomwriter/sort on hdfs if all the nodes of same rack are killed.

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-136.
---

Resolution: Incomplete

Probably stale.

 Not able to run randomwriter/sort on hdfs if all the nodes of same rack are 
 killed.
 ---

 Key: HDFS-136
 URL: https://issues.apache.org/jira/browse/HDFS-136
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Suman Sehgal

 Not able to run randomwriter if all the datanodes of any one of the racks are 
 killed (replication factor: 3).
 The randomwriter job fails and the following error message is displayed in the log:
 java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2398)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2354)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1744)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1927)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6715) webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode

2014-07-21 Thread Arpit Gupta (JIRA)
Arpit Gupta created HDFS-6715:
-

 Summary: webhdfs wont fail over when it gets java.io.IOException: 
Namenode is in startup mode
 Key: HDFS-6715
 URL: https://issues.apache.org/jira/browse/HDFS-6715
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.2.0
Reporter: Arpit Gupta


Noticed in our HA testing that when we run an MR job with the webhdfs file system we 
sometimes run into:

{code}
2014-04-17 05:08:06,346 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1397710493213_0001_r_08_0: Container killed by the 
ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

2014-04-17 05:08:10,205 ERROR [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not 
commit job
java.io.IOException: Namenode is in startup mode
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6715) webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-6715:
---

Component/s: webhdfs

 webhdfs wont fail over when it gets java.io.IOException: Namenode is in 
 startup mode
 

 Key: HDFS-6715
 URL: https://issues.apache.org/jira/browse/HDFS-6715
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, webhdfs
Affects Versions: 2.2.0
Reporter: Arpit Gupta

 Noticed in our HA testing that when we run an MR job with the webhdfs file system we 
 sometimes run into: 
 {code}
 2014-04-17 05:08:06,346 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1397710493213_0001_r_08_0: Container killed by the 
 ApplicationMaster.
 Container killed on request. Exit code is 143
 Container exited with a non-zero exit code 143
 2014-04-17 05:08:10,205 ERROR [CommitterEvent Processor #1] 
 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not 
 commit job
 java.io.IOException: Namenode is in startup mode
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly

2014-07-21 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069320#comment-14069320
 ] 

Devaraj Das commented on HDFS-6680:
---

I see.. good catch. Looks good to me.

 BlockPlacementPolicyDefault does not choose favored nodes correctly
 ---

 Key: HDFS-6680
 URL: https://issues.apache.org/jira/browse/HDFS-6680
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6680_20140714.patch, h6680_20140716.patch


 In one of the chooseTarget(..) methods, it passes each of the favoredNodes to 
 chooseLocalNode(..).  It expects chooseLocalNode to return null if the local 
 node is not a good target.  Unfortunately, chooseLocalNode falls back to 
 chooseLocalRack instead of returning null.
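
A toy illustration of the mismatch being described, with stand-in types and simplified 
logic (this is not the actual BlockPlacementPolicyDefault code):

{code}
public class FavoredNodeSketch {
  static class Node {
    final String name; final boolean goodLocalTarget;
    Node(String name, boolean goodLocalTarget) {
      this.name = name; this.goodLocalTarget = goodLocalTarget;
    }
  }

  // Mirrors the described behavior: instead of returning null when the local
  // node is not a good target, it falls back to a rack-local node.
  static Node chooseLocalNode(Node local, Node rackLocalFallback) {
    return local.goodLocalTarget ? local : rackLocalFallback;
  }

  public static void main(String[] args) {
    Node favored = new Node("favored-dn", false);
    Node rackMate = new Node("rack-mate-dn", true);
    // The caller expected null here to learn that the favored node was
    // rejected; instead it silently gets a different node on the same rack.
    Node chosen = chooseLocalNode(favored, rackMate);
    System.out.println("chosen = " + chosen.name);
  }
}
{code}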



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-279) Generation stamp value should be validated when creating a Block

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069331#comment-14069331
 ] 

Allen Wittenauer commented on HDFS-279:
---

Stale issue?

 Generation stamp value should be validated when creating a Block
 

 Key: HDFS-279
 URL: https://issues.apache.org/jira/browse/HDFS-279
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz Wo Nicholas Sze

 In HDFS, generation stamps < GenerationStamp.FIRST_VALID_STAMP are reserved 
 values, not valid generation stamps.  Incorrect uses of the reserved 
 values may cause unexpected behavior.  We should validate the generation stamp 
 when creating a Block.
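
A hedged sketch of the kind of check being proposed; FIRST_VALID_STAMP is the constant 
named above, but its value here is assumed for illustration and the class is a stand-in, 
not the real Block:

{code}
public class BlockSketch {
  // Assumed value for illustration; the real constant lives in GenerationStamp.
  static final long FIRST_VALID_STAMP = 1000L;

  final long blockId;
  final long genStamp;

  BlockSketch(long blockId, long genStamp) {
    if (genStamp < FIRST_VALID_STAMP) {
      throw new IllegalArgumentException(
          "Generation stamp " + genStamp + " is a reserved value");
    }
    this.blockId = blockId;
    this.genStamp = genStamp;
  }

  public static void main(String[] args) {
    new BlockSketch(42L, 1001L);  // accepted
    new BlockSketch(43L, 0L);     // rejected: reserved value
  }
}
{code}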



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-88) Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-88.
--

Resolution: Incomplete

I'm going to close this as stale. I suspect this issue has gone away with the 
two fixes referenced.

 Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
 ---

 Key: HDFS-88
 URL: https://issues.apache.org/jira/browse/HDFS-88
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack

 We've seen this hang rarely enough, but when it happens it locks up the 
 application.  We've seen it at least in 0.18.x and 0.19.x (we don't have much 
 experience with 0.20.x hdfs yet).
 Here we're doing a sequencefile#append
 {code}
 IPC Server handler 9 on 60020 daemon prio=10 tid=0x7fef1c3f0400 
 nid=0x7470 waiting for monitor entry [0x42d18000..0x42d189f0]
java.lang.Thread.State: BLOCKED (on object monitor)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486)
   - waiting to lock 0x7fef38ecc138 (a java.util.LinkedList)
   - locked 0x7fef38ecbdb8 (a 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
 org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
   at 
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
   - locked 0x7fef38ecbdb8 (a 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
   - locked 0x7fef38ecbdb8 (a 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   - locked 0x7fef38ecbdb8 (a 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
   at java.io.DataOutputStream.write(DataOutputStream.java:107)
   - locked 0x7fef38e09fc0 (a 
 org.apache.hadoop.fs.FSDataOutputStream)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016)
   - locked 0x7fef38e09f30 (a 
 org.apache.hadoop.io.SequenceFile$Writer)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980)
   - locked 0x7fef38e09f30 (a 
 org.apache.hadoop.io.SequenceFile$Writer)
   at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461)
   at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421)
   - locked 0x7fef29ad9588 (a java.lang.Integer)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184)
   at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)
   at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
 {code}
 The DataStreamer that is supposed to be servicing the above writeChunk is stuck 
 here:
 {code}
 DataStreamer for file 
 /hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block 
 blk_-7436808403424765554_553837 daemon prio=10 tid=0x01c84c00 
 nid=0x7125 in Object.wait() [0x409b3000..0x409b3d70]
java.lang.Thread.State: WAITING (on object monitor)
   at java.lang.Object.wait(Native Method)
   at java.lang.Object.wait(Object.java:502)
   at org.apache.hadoop.ipc.Client.call(Client.java:709)
   - locked 0x7fef39520bb8 (a org.apache.hadoop.ipc.Client$Call)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
   at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
   at 
 org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
   at 
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)
   - locked 0x7fef38ecc138 (a 

[jira] [Resolved] (HDFS-205) HDFS Tmpreaper

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-205.
---

Resolution: Duplicate

Closing this as a dupe of HDFS-6382.

 HDFS Tmpreaper
 --

 Key: HDFS-205
 URL: https://issues.apache.org/jira/browse/HDFS-205
 Project: Hadoop HDFS
  Issue Type: New Feature
 Environment: CentOs 4/5, Java 1.5, Hadoop 0.17.3
Reporter: Michael Andrews
Priority: Minor
 Attachments: DateDelta.java, TmpReaper.java


 Java implementation of the tmpreaper utility for HDFS.  Helps when you expect 
 processes to die before they can clean up.  I have Perl unit tests that can 
 be ported over to Java or Groovy if the Hadoop team is interested in this 
 utility.  One issue is that the unit tests set the modification time of test 
 files, which is unsupported in HDFS (as far as I can tell). 
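
For anyone looking for the same functionality today, a minimal sketch of the idea against 
the current FileSystem API (the target directory and retention period are placeholders):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpReaperSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path tmp = new Path("/tmp");  // placeholder target directory
    long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;  // 7 days
    for (FileStatus st : fs.listStatus(tmp)) {
      // Reap entries not modified since the cutoff; directories are removed recursively.
      if (st.getModificationTime() < cutoff) {
        System.out.println("deleting " + st.getPath());
        fs.delete(st.getPath(), true);
      }
    }
  }
}
{code}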



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6716) Update usage of KeyProviderCryptoExtension APIs on NameNode

2014-07-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069355#comment-14069355
 ] 

Andrew Wang commented on HDFS-6716:
---

I'll add that this is somewhat urgent, as I simply mocked out the new API calls 
so tests are currently broken.

 Update usage of KeyProviderCryptoExtension APIs on NameNode
 ---

 Key: HDFS-6716
 URL: https://issues.apache.org/jira/browse/HDFS-6716
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Andrew Wang
Assignee: Andrew Wang

 Some recent changes have landed in the KeyProviderCryptoExtension APIs; we need 
 to update the usage in HDFS to reflect this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6716) Update usage of KeyProviderCryptoExtension APIs on NameNode

2014-07-21 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-6716:
-

 Summary: Update usage of KeyProviderCryptoExtension APIs on 
NameNode
 Key: HDFS-6716
 URL: https://issues.apache.org/jira/browse/HDFS-6716
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Andrew Wang
Assignee: Andrew Wang


Some recent changes have landed in the KeyProviderCryptoExtension APIs; we need to 
update the usage in HDFS to reflect this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-351) Could FSEditLog report problems more elegantly than with System.exit(-1)

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-351.
---

Resolution: Incomplete

Edit logging got majorly reworked. Closing as stale.

 Could FSEditLog report problems more elegantly than with System.exit(-1)
 

 Key: HDFS-351
 URL: https://issues.apache.org/jira/browse/HDFS-351
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Steve Loughran
Priority: Minor

 When FSEditLog encounters problems, it prints something and then exits.
 It would be better for in-JVM deployments of FSEditLog if these problems were 
 raised in some other way (such as throwing an exception), rather than taking 
 down the whole JVM. That could be in JUnit tests, or it could be inside other 
 applications. Test runners and the like can intercept those System.exit() 
 calls with their own SecurityManager, often turning the System.exit() 
 operation into an exception there and then. If FSEditLog did that itself, it might 
 be easier to stay in control. 
 The current approach has some benefits - it can exit regardless of which 
 thread has encountered problems - but it is tricky to test.
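
A minimal sketch of the SecurityManager trick mentioned above; the class and exception 
names are illustrative, not part of Hadoop:

{code}
public class NoExitSecurityManager extends SecurityManager {

  /** Thrown instead of letting the JVM terminate. */
  public static class ExitException extends SecurityException {
    public final int status;
    public ExitException(int status) {
      super("System.exit(" + status + ") intercepted");
      this.status = status;
    }
  }

  @Override
  public void checkPermission(java.security.Permission perm) {
    // Permit everything else; we only care about exit.
  }

  @Override
  public void checkExit(int status) {
    throw new ExitException(status);
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoExitSecurityManager());
    try {
      System.exit(-1);  // stands in for the exit inside the edit-log code
    } catch (ExitException e) {
      System.out.println("caught exit with status " + e.status);
    } finally {
      System.setSecurityManager(null);
    }
  }
}
{code}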



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-197) du fails on Cygwin

2014-07-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved HDFS-197.
---

Resolution: Fixed

MS has fixed Windows support. Closing as stale.

 du fails on Cygwin
 

 Key: HDFS-197
 URL: https://issues.apache.org/jira/browse/HDFS-197
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Windows + Cygwin
Reporter: Kohsuke Kawaguchi
 Attachments: HADOOP-5486


 When I try to run a datanode on Windows, I get the following exception:
 {noformat}
 java.io.IOException: Expecting a line not the end of stream
   at org.apache.hadoop.fs.DU.parseExecResult(DU.java:181)
   at org.apache.hadoop.util.Shell.runCommand(Shell.java:179)
   at org.apache.hadoop.util.Shell.run(Shell.java:134)
   at org.apache.hadoop.fs.DU.init(DU.java:53)
   at org.apache.hadoop.fs.DU.init(DU.java:63)
   at 
 org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.init(FSDataset.java:325)
   at 
 org.apache.hadoop.hdfs.server.datanode.FSDataset.init(FSDataset.java:681)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:291)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:205)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1238)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1193)
 {noformat}
 This is because Hadoop execs du -sk C:\tmp\hadoop-SYSTEM\dfs\data with a 
 Windows path representation, which cygwin du doesn't understand.
 {noformat}
 C:\hudsondu -sk C:\tmp\hadoop-SYSTEM\dfs\data
 du -sk C:\tmp\hadoop-SYSTEM\dfs\data
 du: cannot access `C:\\tmp\\hadoop-SYSTEM\\dfs\\data': No such file or 
 directory
 {noformat}
 For this to work correctly, Hadoop would have to run cygpath first to get a 
 Unix path representation, then to call DU.
 Also, I had to use the debugger to get this information. Shell.runCommand 
 should catch IOException from parseExecResult and add the buffered stderr to 
 simplify the error diagnostics.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-346) Version file in name-node image directory should include role field.

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069384#comment-14069384
 ] 

Allen Wittenauer commented on HDFS-346:
---

Is this still valid?  Actually, what would be the expected outcome of a node 
seeing a different role on startup?

 Version file in name-node image directory should include role field.
 --

 Key: HDFS-346
 URL: https://issues.apache.org/jira/browse/HDFS-346
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Konstantin Shvachko

 It would be useful to have name-node role field in the {{VERSION}} file in 
 name-node's image and edits directories so that one could see what type of 
 node created the image. Role was introduced by HADOOP-4539



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-308) Improve TransferFsImage

2014-07-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069385#comment-14069385
 ] 

Allen Wittenauer commented on HDFS-308:
---

I seem to recall this got reworked, but I'm not sure if these particular issues 
have been dealt with and/or are still relevant.

 Improve TransferFsImage
 ---

 Key: HDFS-308
 URL: https://issues.apache.org/jira/browse/HDFS-308
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Konstantin Shvachko
Assignee: Jakob Homan

 {{TransferFsImage}} transfers name-node image and edits files during 
 checkpoint process.
 # {{TransferFsImage}} should *always* pass and verify CheckpointSignature. 
 Now we send it only when image is uploaded back to name-node.
 # {{getFileClient()}} should use {{Collection<File>}} rather than {{File[]}} 
 as the third parameter.
 # Rather than sending port and address separately ({{"port=" + port + 
 "machine=" + addr}}) it should send the entire address at once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

