[jira] [Reopened] (HDFS-257) Does a big delete starve other clients?
[ https://issues.apache.org/jira/browse/HDFS-257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milind Bhandarkar reopened HDFS-257: Yes, it does. Reopening. Does a big delete starve other clients? --- Key: HDFS-257 URL: https://issues.apache.org/jira/browse/HDFS-257 Project: Hadoop HDFS Issue Type: Task Reporter: Robert Chansler Or, more generally, is there _any_ operation that has the potential to severely starve other clients? The speculation is that deleting a directory with 50,000 files might starve other clients for several seconds. Is that true? Is that necessary? -- This message was sent by Atlassian JIRA (v6.2#6252)
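To make the starvation concern concrete, here is a toy, hypothetical sketch (class and method names are made up, not actual NameNode code) of why one large recursive delete under a single namespace lock can block every other client request for its full duration:
{code}
// Toy model only -- illustrates the locking pattern, not HDFS internals.
import java.util.ArrayList;
import java.util.List;

class ToyNamespace {
  private final Object namespaceLock = new Object();
  private final List<String> files = new ArrayList<String>();

  // A big delete removes every entry while holding the lock, so a directory
  // with 50,000 files keeps the lock for the whole iteration.
  void deleteAll() {
    synchronized (namespaceLock) {
      for (int i = files.size() - 1; i >= 0; i--) {
        files.remove(i);   // block invalidation work would happen here
      }
    }
  }

  // Any other client operation has to wait for deleteAll() to finish.
  int listCount() {
    synchronized (namespaceLock) {
      return files.size();
    }
  }
}
{code}
Whether the real delete path must hold the lock for the entire scan, or could release it in batches, is exactly the question this issue raises.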
[jira] [Commented] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion
[ https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068254#comment-14068254 ] Vinayakumar B commented on HDFS-6710: - Patch looks good. Nit: I can see trailing spaces in some places. +1, on addressing that. Archival Storage: Consider block storage policy in replica deletion --- Key: HDFS-6710 URL: https://issues.apache.org/jira/browse/HDFS-6710 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6710_20140720.patch Replica deletion should be modified in a way that the deletion won't break the block storage policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication
[ https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068314#comment-14068314 ] liyunzhang commented on HDFS-6676: -- You can enable Kerberos HTTP SPNEGO authentication for KMS with the following steps:
1. Configure the KDC server successfully.
2. Generate a Kerberos ticket on the client, e.g. kinit HTTP/liyunzhangcentos.sh.intel@sh.intel.com -kt /home/zly/http.keytab
3. Edit kms-site.xml and set the following properties:
<property>
  <name>hadoop.kms.authentication.type</name>
  <value>kerberos</value>
  <description>simple or kerberos</description>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.keytab</name>
  <value>/home/zly/hadoop-3.0.0-SNAPSHOT/etc/hadoop/kerberos/HTTP.keytab</value>
  <description></description>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.principal</name>
  <value>HTTP/liyunzhangcentos.sh.intel@sh.intel.com</value>
  <description></description>
</property>
4. Start the KMS server.
5. Use curl to test KMS functions such as key creation, e.g.:
#curl -i --negotiate -u: -X POST -d @createkey.json http://liyunzhangcentos.sh.intel.com:16000/kms/v1/keys --header "Content-Type:application/json"
HTTP/1.1 401 Unauthorized
Server: Apache-Coyote/1.1
WWW-Authenticate: Negotiate
Set-Cookie: hadoop.auth=; Expires=Thu, 01-Jan-1970 00:00:00 GMT; HttpOnly
Content-Type: text/html;charset=utf-8
Content-Length: 997
Date: Mon, 21 Jul 2014 06:27:59 GMT
HTTP/1.1 201 Created
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth=u=HTTP&p=HTTP/liyunzhangcentos.sh.intel@sh.intel.com&t=kerberos&e=1405960084208&s=UgeM6AwoHo46HDntyVXB/OLK6u8=; Expires=Mon, 21-Jul-2014 16:28:04 GMT; HttpOnly
Location: http://liyunzhangcentos.sh.intel.com:16000/kms/v1/keys/v1/key/k1
Content-Type: application/json
Content-Length: 55
Date: Mon, 21 Jul 2014 06:28:33 GMT
Res {"versionName" : "k1@0", "material" : "12345w==" }
KMS throws AuthenticationException when enabling kerberos authentication - Key: HDFS-6676 URL: https://issues.apache.org/jira/browse/HDFS-6676 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.4.0 Reporter: liyunzhang Priority: Minor When I made a request to http://server-1941.novalocal:16000/kms/v1/names in Firefox (after configuring Firefox according to https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html), the following info was found in logs/kms.log.
2014-07-14 19:18:30,461 WARN AuthenticationFilter - Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism levelis of type NULL) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357) at org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:745) Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at
[jira] [Resolved] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication
[ https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang resolved HDFS-6676. -- Resolution: Not a Problem KMS throws AuthenticationException when enabling kerberos authentication - Key: HDFS-6676 URL: https://issues.apache.org/jira/browse/HDFS-6676 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.4.0 Reporter: liyunzhang Priority: Minor When I made a request http://server-1941.novalocal:16000/kms/v1/names in firefox. (before, i set configs in firefox according https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html), following info was found in logs/kms.log. 2014-07-14 19:18:30,461 WARN AuthenticationFilter - Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism levelis of type NULL) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357) at org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:745) Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:347) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:329) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:329) ... 14 more Caused by: KrbException: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:169) at sun.security.krb5.KrbCred.init(KrbCred.java:131) at sun.security.jgss.krb5.InitialToken$OverloadedChecksum.init(InitialToken.java:282) at sun.security.jgss.krb5.InitSecContextToken.init(InitSecContextToken.java:130) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771) ... 25 more Kerberos is enabled successful in my environment: klist Ticket cache: FILE:/tmp/krb5cc_0 Default principal: HTTP/server-1941.novalocal@NOVALOCAL Valid starting ExpiresService principal
[jira] [Commented] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
[ https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068350#comment-14068350 ] Vinayakumar B commented on HDFS-6637: - I think the configuration dfs.ha.standby.checkpoints was added to disable checkpoints for tests, since the HTTP ports could be ephemeral. In that case the Standby NN will not know the Active NN's HTTP port to do the checkpoint. See the comment in MiniDFSCluster: {code}
    // In an HA cluster, in order for the StandbyNode to perform checkpoints,
    // it needs to know the HTTP port of the Active. So, if ephemeral ports
    // are chosen, disable checkpoints for the test.
    if (!nnTopology.allHttpPortsSpecified() && nnTopology.isHA()) {
      LOG.info("MiniDFSCluster disabling checkpointing in the Standby node "
          + "since no HTTP ports have been specified.");
      conf.setBoolean(DFS_HA_STANDBY_CHECKPOINTS_KEY, false);
    }{code} Even though I agree that a user could also disable checkpointing, I don't think this is realistic in a real HA cluster. What would a user do with checkpoints disabled..? ;) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer - Key: HDFS-6637 URL: https://issues.apache.org/jira/browse/HDFS-6637 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Dian Fu Attachments: HDFS-6637.patch, HDFS-6637.patch.1 In an HA cluster, during a rolling upgrade the image file fsimage_rollback is generated by the StandbyCheckpointer thread of the SBN. If the configuration dfs.ha.standby.checkpoints is set to false, there will be no StandbyCheckpointer thread in the SBN, so the rolling upgrade will never finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-6698: Attachment: HDFS-6698.txt try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt HBase prefers to invoke read() to serve scan requests and pread() to serve get requests, because pread() holds almost no lock. Imagine there's a read() running; because the definition is: {code} public synchronized int read {code} no other read() request can run concurrently. That is known, but pread() also cannot run... because: {code}
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    // sanity checks
    dfsClient.checkOpen();
    if (closed) {
      throw new IOException("Stream closed");
    }
    failures = 0;
    long filelen = getFileLength();
{code} getFileLength() also needs the lock, so we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HDFS-6698: Status: Patch Available (was: Open) try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt HBase prefers to invoke read() to serve scan requests and pread() to serve get requests, because pread() holds almost no lock. Imagine there's a read() running; because the definition is: {code} public synchronized int read {code} no other read() request can run concurrently. That is known, but pread() also cannot run... because: {code}
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    // sanity checks
    dfsClient.checkOpen();
    if (closed) {
      throw new IOException("Stream closed");
    }
    failures = 0;
    long filelen = getFileLength();
{code} getFileLength() also needs the lock, so we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
[ https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068353#comment-14068353 ] Dian Fu commented on HDFS-6637: --- Thanks very much for the comments, Vinay. The explanation make sense. Thanks very much. Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer - Key: HDFS-6637 URL: https://issues.apache.org/jira/browse/HDFS-6637 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Dian Fu Attachments: HDFS-6637.patch, HDFS-6637.patch.1 In HA setup cluster, for rolling upgrade, the image file fsimage_rollback is generated by StandbyCheckpointer thread of SBN. While if configuration dfs.ha.standby.checkpoints is set false, there will be no StandbyCheckpointer thread in SBN. This will lead to the rolling upgrade never finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
Vinayakumar B created HDFS-6714: --- Summary: TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster Key: HDFS-6714 URL: https://issues.apache.org/jira/browse/HDFS-6714 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor TestBlocksScheduledCounter#testBlocksScheduledCounter() should shutdown the cluster after test. This could lead to errors in windows while running non-forked tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
[ https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6714: Attachment: HDFS-6714.patch Attached the patch. Please review TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster - Key: HDFS-6714 URL: https://issues.apache.org/jira/browse/HDFS-6714 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor Attachments: HDFS-6714.patch TestBlocksScheduledCounter#testBlocksScheduledCounter() should shutdown the cluster after test. This could lead to errors in windows while running non-forked tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
[ https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6714: Status: Patch Available (was: Open) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster - Key: HDFS-6714 URL: https://issues.apache.org/jira/browse/HDFS-6714 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor Attachments: HDFS-6714.patch TestBlocksScheduledCounter#testBlocksScheduledCounter() should shutdown the cluster after test. This could lead to errors in windows while running non-forked tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068361#comment-14068361 ] Liang Xie commented on HDFS-6698: - In the normal situation, the HFiles that all of the HBase (p)reads go against should be immutable, so I assume the attached patch, per [~saint@gmail.com]'s suggestion, is enough to relieve the HBase issue where pread(s) were blocked by a read request. Let's see the QA result... try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt HBase prefers to invoke read() to serve scan requests and pread() to serve get requests, because pread() holds almost no lock. Imagine there's a read() running; because the definition is: {code} public synchronized int read {code} no other read() request can run concurrently. That is known, but pread() also cannot run... because: {code}
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    // sanity checks
    dfsClient.checkOpen();
    if (closed) {
      throw new IOException("Stream closed");
    }
    failures = 0;
    long filelen = getFileLength();
{code} getFileLength() also needs the lock, so we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
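For illustration, a minimal sketch of the general idea (a hypothetical class, not the attached HDFS-6698.txt patch): if the file is complete and its length can no longer change, the length can be cached in a volatile field so pread() can read it without taking the stream lock held by the synchronized read():
{code}
// Hypothetical sketch only; names and structure are illustrative, not DFSInputStream.
class CachedLengthStream {
  // Set when the stream is opened against a finalized (immutable) file;
  // volatile so readers need no synchronization.
  private volatile long fileLength;

  CachedLengthStream(long knownLength) {
    this.fileLength = knownLength;
  }

  // Lock-free equivalent of getFileLength() for the immutable-file case,
  // so a long-running synchronized read() cannot block a pread().
  long getFileLengthNoLock() {
    return fileLength;
  }

  // Only a writer path (e.g. refreshing an under-construction file's
  // last block) would need to update the cached value.
  void updateFileLength(long newLength) {
    this.fileLength = newLength;
  }
}
{code}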
[jira] [Updated] (HDFS-6637) Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer
[ https://issues.apache.org/jira/browse/HDFS-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6637: Resolution: Invalid Status: Resolved (was: Patch Available) Thanks [~dian.fu]. Marking this as Invalid due to an unrealistic configuration. Feel free to re-open if you feel this has to be fixed. Rolling upgrade won't finish if SBN is configured without StandbyCheckpointer - Key: HDFS-6637 URL: https://issues.apache.org/jira/browse/HDFS-6637 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Dian Fu Attachments: HDFS-6637.patch, HDFS-6637.patch.1 In an HA cluster, during a rolling upgrade the image file fsimage_rollback is generated by the StandbyCheckpointer thread of the SBN. If the configuration dfs.ha.standby.checkpoints is set to false, there will be no StandbyCheckpointer thread in the SBN, so the rolling upgrade will never finish. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing
[ https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068447#comment-14068447 ] Yu Li commented on HDFS-6441: - Hi [~szetszwo] [~aagarwal] and [~benoyantony], Sorry for the late response, really didn't expect a response after 1 month or so :-P Sure I don't mind if we contribute the feature here, I'm glad if only the feature could be added, no matter how we get it done. :-) About the patch, I could see the advantage of using a file to pass the node-list of include/exclude nodes especially when the list is long, meanwhile I'd say it would be great if we also support passing the servers through parameter, which is much easier to invoke the tool from another program(so we could still complete the HDFS-6009 work :-)) Add ability to exclude/include few datanodes while balancing Key: HDFS-6441 URL: https://issues.apache.org/jira/browse/HDFS-6441 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.4.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch In some use cases, it is desirable to ignore a few data nodes while balancing. The administrator should be able to specify a list of data nodes in a file similar to the hosts file and the balancer should ignore these data nodes while balancing so that no blocks are added/removed on these nodes. Similarly it will be beneficial to specify that only a particular list of datanodes should be considered for balancing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6702) DFSClient should create blocks using StorageType
[ https://issues.apache.org/jira/browse/HDFS-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068456#comment-14068456 ] Vinayakumar B commented on HDFS-6702: - Patch looks good. One nit: {code}
+    final List<StorageTypeProto> protos = new ArrayList<StorageTypeProto>(
+        types.length);
+    for (int i = startIdx; i < types.length; ++i) {
+      protos.add(convertStorageType(types[i]));
+    }
{code} Here, initialCapacity can be given as {{types.length - startIdx}}, otherwise extra allocation will happen for non-zero startIdx. +1 on addressing this. DFSClient should create blocks using StorageType - Key: HDFS-6702 URL: https://issues.apache.org/jira/browse/HDFS-6702 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6702_20140719.patch, h6702_20140721.patch, h6702_20140721b.patch When DFSClient asks NN for a new block (via addBlock), NN returns a LocatedBlock with storage type information. However, DFSClient does not use StorageType to create blocks with DN. As a result, the block replicas could possibly be created with a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
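For clarity, a small self-contained sketch of the suggested sizing (the types here are stand-in enums, not the real HDFS/protobuf classes): allocating {{types.length - startIdx}} entries matches exactly what the loop adds, so a non-zero startIdx does not over-allocate:
{code}
import java.util.ArrayList;
import java.util.List;

class StorageTypeConversionSketch {
  // Stand-ins for the real StorageType / StorageTypeProto types.
  enum StorageType { DISK, SSD, ARCHIVE }
  enum StorageTypeProto { DISK, SSD, ARCHIVE }

  static StorageTypeProto convertStorageType(StorageType t) {
    return StorageTypeProto.valueOf(t.name());
  }

  static List<StorageTypeProto> convert(StorageType[] types, int startIdx) {
    // Size the list by the number of elements actually copied.
    final List<StorageTypeProto> protos =
        new ArrayList<StorageTypeProto>(types.length - startIdx);
    for (int i = startIdx; i < types.length; ++i) {
      protos.add(convertStorageType(types[i]));
    }
    return protos;
  }
}
{code}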
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068461#comment-14068461 ] Hadoop QA commented on HDFS-6698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656843/HDFS-6698.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.TestDatanodeConfig org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7408//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7408//console This message is automatically generated. try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt HBase prefers to invoke read() serving scan request, and invoke pread() serving get reqeust. Because pread() almost holds no lock. Let's image there's a read() running, because the definition is: {code} public synchronized int read {code} so no other read() request could run concurrently, this is known, but pread() also could not run... because: {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { // sanity checks dfsClient.checkOpen(); if (closed) { throw new IOException(Stream closed); } failures = 0; long filelen = getFileLength(); {code} the getFileLength() also needs lock. so we need to figure out a no lock impl for getFileLength() before HBase multi stream feature done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068467#comment-14068467 ] Liang Xie commented on HDFS-6698: - Those three failure cases are not related to the current patch. try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt HBase prefers to invoke read() to serve scan requests and pread() to serve get requests, because pread() holds almost no lock. Imagine there's a read() running; because the definition is: {code} public synchronized int read {code} no other read() request can run concurrently. That is known, but pread() also cannot run... because: {code}
  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException {
    // sanity checks
    dfsClient.checkOpen();
    if (closed) {
      throw new IOException("Stream closed");
    }
    failures = 0;
    long filelen = getFileLength();
{code} getFileLength() also needs the lock, so we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6714) TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster
[ https://issues.apache.org/jira/browse/HDFS-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068470#comment-14068470 ] Hadoop QA commented on HDFS-6714: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656844/HDFS-6714.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager org.apache.hadoop.hdfs.server.namenode.TestNamenodeCapacityReport {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7409//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7409//console This message is automatically generated. TestBlocksScheduledCounter#testBlocksScheduledCounter should shutdown cluster - Key: HDFS-6714 URL: https://issues.apache.org/jira/browse/HDFS-6714 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Minor Attachments: HDFS-6714.patch TestBlocksScheduledCounter#testBlocksScheduledCounter() should shutdown the cluster after test. This could lead to errors in windows while running non-forked tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6702) DFSClient should create blocks using StorageType
[ https://issues.apache.org/jira/browse/HDFS-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068496#comment-14068496 ] Tsz Wo Nicholas Sze commented on HDFS-6702: --- Thanks Vinay. I do have new ArrayList with types.length-startIdx in my previous patch (h6702_20140721.patch). However, TestDiskError fails so that I change it to types.length. DFSClient should create blocks using StorageType - Key: HDFS-6702 URL: https://issues.apache.org/jira/browse/HDFS-6702 Project: Hadoop HDFS Issue Type: Bug Components: datanode, hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6702_20140719.patch, h6702_20140721.patch, h6702_20140721b.patch When DFSClient asks NN for a new block (via addBlock), NN returns a LocatedBlock with storage type information. However, DFSClient does not use StorageType to create blocks with DN. As a result, the block replicas could possibly be created with a different storage type. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion
[ https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6710: -- Attachment: h6710_20140721.patch h6710_20140721.patch: removes trailing spaces. Archival Storage: Consider block storage policy in replica deletion --- Key: HDFS-6710 URL: https://issues.apache.org/jira/browse/HDFS-6710 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6710_20140720.patch, h6710_20140721.patch Replica deletion should be modified in a way that the deletion won't break the block storage policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion
[ https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-6710. --- Resolution: Fixed Hadoop Flags: Reviewed Thank Vinay for reviewing the patch. I have committed this. Archival Storage: Consider block storage policy in replica deletion --- Key: HDFS-6710 URL: https://issues.apache.org/jira/browse/HDFS-6710 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6710_20140720.patch, h6710_20140721.patch Replica deletion should be modified in a way that the deletion won't break the block storage policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
[ https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6679: -- Hadoop Flags: Reviewed +1 patch looks good. Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files -- Key: HDFS-6679 URL: https://issues.apache.org/jira/browse/HDFS-6679 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Vinayakumar B Attachments: HDFS-6679.patch, HDFS-6679.patch, HDFS-6679.patch, editsStored HDFS-6677 changed fsimage for storing storage policy IDs. We should bump the NameNodeLayoutVersion and as well fix the tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
[ https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-6679. --- Resolution: Fixed Fix Version/s: Archival Storage (HDFS-6584) I have committed this. Thanks, Vinay! Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files -- Key: HDFS-6679 URL: https://issues.apache.org/jira/browse/HDFS-6679 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Vinayakumar B Fix For: Archival Storage (HDFS-6584) Attachments: HDFS-6679.patch, HDFS-6679.patch, HDFS-6679.patch, editsStored HDFS-6677 changed fsimage for storing storage policy IDs. We should bump the NameNodeLayoutVersion and as well fix the tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6710) Archival Storage: Consider block storage policy in replica deletion
[ https://issues.apache.org/jira/browse/HDFS-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6710: -- Fix Version/s: Archival Storage (HDFS-6584) Archival Storage: Consider block storage policy in replica deletion --- Key: HDFS-6710 URL: https://issues.apache.org/jira/browse/HDFS-6710 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Fix For: Archival Storage (HDFS-6584) Attachments: h6710_20140720.patch, h6710_20140721.patch Replica deletion should be modified in a way that the deletion won't break the block storage policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6686) Archival Storage: Use fallback storage types
[ https://issues.apache.org/jira/browse/HDFS-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6686: -- Attachment: h6710_20140721.patch Vinay, you are right that the patch depends on HDFS-6710 which is now committed. Thanks for trying the patch. h6710_20140721.patch: updates with the branch. Archival Storage: Use fallback storage types Key: HDFS-6686 URL: https://issues.apache.org/jira/browse/HDFS-6686 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6686_20140721.patch, h6710_20140721.patch HDFS-6671 changes replication monitor to use block storage policy for replication. It should also use the fallback storage types when a particular type of storage is full. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-49) MiniDFSCluster.stopDataNode will always shut down a node in the cluster if a matching name is not found
[ https://issues.apache.org/jira/browse/HDFS-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068635#comment-14068635 ] Steve Loughran commented on HDFS-49: nope, [Still There|https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java#L1798] MiniDFSCluster.stopDataNode will always shut down a node in the cluster if a matching name is not found --- Key: HDFS-49 URL: https://issues.apache.org/jira/browse/HDFS-49 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.20.204.0, 0.20.205.0, 1.1.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Labels: codereview, newbie Attachments: hdfs-49.patch Original Estimate: 0.5h Remaining Estimate: 0.5h The stopDataNode method will shut down the last node in the list of nodes if one matching a specific name is not found. This is possibly not what was intended. Better to return false or fail in some other manner if the named node was not located:
synchronized boolean stopDataNode(String name) {
  int i;
  for (i = 0; i < dataNodes.size(); i++) {
    DataNode dn = dataNodes.get(i).datanode;
    if (dn.dnRegistration.getName().equals(name)) {
      break;
    }
  }
  return stopDataNode(i);
}
-- This message was sent by Atlassian JIRA (v6.2#6252)
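A minimal sketch of the behavior the report asks for (this is not the attached hdfs-49.patch, just an illustrative rewrite of the quoted method; the surrounding MiniDFSCluster fields such as dataNodes and DataNode are assumed to exist as in the snippet above): return false when no datanode matches the name instead of stopping an arbitrary node.
{code}
// Illustrative rewrite only: stop the matching node, otherwise report failure.
synchronized boolean stopDataNode(String name) {
  for (int i = 0; i < dataNodes.size(); i++) {
    DataNode dn = dataNodes.get(i).datanode;
    if (dn.dnRegistration.getName().equals(name)) {
      return stopDataNode(i);
    }
  }
  return false;  // no datanode with that name: do not stop anything
}
{code}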
[jira] [Resolved] (HDFS-169) dfshealth.jsp: sorting on remaining doesn't actually sort
[ https://issues.apache.org/jira/browse/HDFS-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-169. --- Resolution: Incomplete hdfs gui has been rewritten. closing as stale. dfshealth.jsp: sorting on remaining doesn't actually sort --- Key: HDFS-169 URL: https://issues.apache.org/jira/browse/HDFS-169 Project: Hadoop HDFS Issue Type: Bug Reporter: Michael Bieniosek When I try to sort by remaining in dfshealth.jsp, I get sent to an url that looks like: http://example.com:50070/dfshealth.jsp?sorter/field=remainingsorter/order=ASC But, the resultant table doesn't seem to be sorted at all (though the order is different). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2251) Namenode does not recognize incorrectly sized blocks
[ https://issues.apache.org/jira/browse/HDFS-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068749#comment-14068749 ] Allen Wittenauer commented on HDFS-2251: Despite Brian saying that we could close this out, it'd be good if we could verify that this is fixed. Namenode does not recognize incorrectly sized blocks Key: HDFS-2251 URL: https://issues.apache.org/jira/browse/HDFS-2251 Project: Hadoop HDFS Issue Type: Bug Reporter: Brian Bockelman We had a lot of file system corruption resulting in incorrectly sized blocks (on disk, they're truncated to 192KB when they should be 64MB). However, I cannot make Hadoop realize that these blocks are incorrectly sized. When I try to drain off the node, I get the following messages: 2008-10-29 18:46:51,293 WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block blk_-4403534125663454855_9937 reported from 172.16.1.150:50010 current size is 67108864 reported size is 196608 Here 172.16.1.150 is not the node which has the problematic block, but the destination of the file transfer. I propose that Hadoop should either: a) Upon startup, make sure that all blocks are properly sized (pro: rather cheap check; con: doesn't catch any truncations which happen while on disk) b) Upon detecting the incorrectly sized copy, Hadoop should ask the source of the block to perform a block verification. Thanks, Brian -- This message was sent by Atlassian JIRA (v6.2#6252)
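As a rough illustration of option (a), here is a hypothetical startup check (all names are made up, not datanode code) that compares a replica's on-disk length with the length the metadata expects and flags the mismatch rather than reporting the stale size:
{code}
import java.io.File;

// Hypothetical helper: detect replicas whose on-disk size differs from the
// size recorded in metadata (e.g. 196608 bytes on disk vs an expected 67108864).
class BlockLengthChecker {
  static boolean isInconsistent(File blockFile, long expectedLength) {
    return blockFile.length() != expectedLength;
  }

  static void checkReplica(File blockFile, long expectedLength) {
    if (isInconsistent(blockFile, expectedLength)) {
      System.err.println("Inconsistent size for " + blockFile
          + ": on disk " + blockFile.length()
          + ", expected " + expectedLength);
      // A real datanode would mark the replica corrupt so it gets re-replicated.
    }
  }
}
{code}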
[jira] [Resolved] (HDFS-867) Add a PowerTopology class to aid replica placement and enhance availability of blocks
[ https://issues.apache.org/jira/browse/HDFS-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-867. --- Resolution: Fixed I'm going to close this as fixed: using either a multi-level topology or just within the network topology, one can build the power topology into the system. (And, in practice, this is what many of us do...) Add a PowerTopology class to aid replica placement and enhance availability of blocks -- Key: HDFS-867 URL: https://issues.apache.org/jira/browse/HDFS-867 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jeff Hammerbacher Priority: Minor Power outages are a common reason for a DataNode to become unavailable. Having a data structure to represent to the power topology of your data center can be used to implement a power-aware replica placement policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-333) A State Machine for name-node blocks.
[ https://issues.apache.org/jira/browse/HDFS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068760#comment-14068760 ] Allen Wittenauer commented on HDFS-333: --- Given the changes with HA, etc, I wonder if this is still valid. Ping! A State Machine for name-node blocks. - Key: HDFS-333 URL: https://issues.apache.org/jira/browse/HDFS-333 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko Blocks on the name-node can belong to different collections like the blocksMap, under-replicated, over-replicated lists, etc. It is getting more and more complicated to keep the lists consistent. It would be good to formalize the movement of the blocks between the collections using a state machine. -- This message was sent by Atlassian JIRA (v6.2#6252)
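To make the suggestion concrete, a toy sketch (the states and allowed transitions are hypothetical, not the name-node's actual collections or rules) of how an explicit state machine could reject an inconsistent move between collections:
{code}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical states standing in for collections such as blocksMap and the
// under-/over-replicated lists; the transition table is illustrative only.
enum ToyBlockState { IN_BLOCKS_MAP, UNDER_REPLICATED, OVER_REPLICATED, PENDING_DELETE }

class ToyBlockStateMachine {
  private static final Map<ToyBlockState, Set<ToyBlockState>> ALLOWED =
      new EnumMap<ToyBlockState, Set<ToyBlockState>>(ToyBlockState.class);
  static {
    ALLOWED.put(ToyBlockState.IN_BLOCKS_MAP,
        EnumSet.of(ToyBlockState.UNDER_REPLICATED, ToyBlockState.OVER_REPLICATED,
            ToyBlockState.PENDING_DELETE));
    ALLOWED.put(ToyBlockState.UNDER_REPLICATED, EnumSet.of(ToyBlockState.IN_BLOCKS_MAP));
    ALLOWED.put(ToyBlockState.OVER_REPLICATED,
        EnumSet.of(ToyBlockState.IN_BLOCKS_MAP, ToyBlockState.PENDING_DELETE));
    ALLOWED.put(ToyBlockState.PENDING_DELETE, EnumSet.noneOf(ToyBlockState.class));
  }

  // Fail fast on a move the formal model does not allow, instead of letting
  // the collections drift out of sync.
  static ToyBlockState transition(ToyBlockState from, ToyBlockState to) {
    if (!ALLOWED.get(from).contains(to)) {
      throw new IllegalStateException("Illegal transition " + from + " -> " + to);
    }
    return to;
  }
}
{code}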
[jira] [Resolved] (HDFS-201) Spring and OSGi support
[ https://issues.apache.org/jira/browse/HDFS-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-201. --- Resolution: Duplicate Spring and OSGi support --- Key: HDFS-201 URL: https://issues.apache.org/jira/browse/HDFS-201 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jon Brisbin Assignee: Jean-Baptiste Onofré Attachments: HDFS-201.patch I was able to compile 0.18.2 in eclipse into a new OSGi bundle using eclipse PDE. Using Spring to control the HDFS nodes, however, seems out of the question for the time being because of inter-dependencies between packages that should be separate OSGi bundles (for example, SecondaryNameNode includes direct references to StatusHttpServer, which should be in a bundle with a web personality that is separate from Hadoop Core). Looking through the code that starts the daemons, it would seem code changes are necessary to allow for components to be dependency-injected. Rather than instantiating a StatusHttpServer inside the SecondaryNameNode, that reference should (at the very least) be able to be dependency-injected (for example from an OSGi service from another bundle). Adding setters for infoServer would allow that reference to be injected by Spring. This is just an example of the changes that would need to be made to get Hadoop to live happily inside an OSGi container. As a starting point, it would be nice if Hadoop core was able to be split into a client bundle that could be deployed into OSGi containers that would provide client-only access to HDFS clusters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-331) hadoop fsck should ignore lost+found
[ https://issues.apache.org/jira/browse/HDFS-331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-331: -- Labels: newbie (was: ) hadoop fsck should ignore lost+found Key: HDFS-331 URL: https://issues.apache.org/jira/browse/HDFS-331 Project: Hadoop HDFS Issue Type: Improvement Environment: All Reporter: Lohit Vijayarenu Priority: Minor Labels: newbie hadoop fsck / would check state of entire filesystem. It would be good to have an option to ignore lost+found directory. Better yet, a way to specify ignore list. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6446) NFS: Different error messages for appending/writing data from read only mount
[ https://issues.apache.org/jira/browse/HDFS-6446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068789#comment-14068789 ] Yesha Vora commented on HDFS-6446: -- [~abutala], The issue was not tested for trunk. It was tested with Hadoop 2.2.0. [~brandonli], do you have any idea if this Jira is fixed recently? NFS: Different error messages for appending/writing data from read only mount - Key: HDFS-6446 URL: https://issues.apache.org/jira/browse/HDFS-6446 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora steps: 1) set dfs.nfs.exports.allowed.hosts = nfs_client ro 2) Restart nfs server 3) Append data on file present on hdfs from read only mount point Append data {noformat} bash$ cat /tmp/tmp_10MB.txt /tmp/tmp_mnt/expected_data_stream cat: write error: Input/output error {noformat} 4) Write data from read only mount point Copy data {noformat} bash$ cp /tmp/tmp_10MB.txt /tmp/tmp_mnt/tmp/ cp: cannot create regular file `/tmp/tmp_mnt/tmp/tmp_10MB.txt': Permission denied {noformat} Both operations are treated differently. Copying data returns valid error message: 'Permission denied' . Though append data does not return valid error message -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-48) NN should check a block's length even if the block is not a new block when processing a blockreport
[ https://issues.apache.org/jira/browse/HDFS-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-48. -- Resolution: Duplicate I'm going to close this as a dupe of HDFS-2251. NN should check a block's length even if the block is not a new block when processing a blockreport --- Key: HDFS-48 URL: https://issues.apache.org/jira/browse/HDFS-48 Project: Hadoop HDFS Issue Type: Bug Reporter: Hairong Kuang Assignee: Hairong Kuang If the block length does not match the one in the blockMap, we should mark the block as corrupted. This could help clearing the polluted replicas caused by HADOOP-4810 and also help detect the on-disk block gets truncated/enlarged manually by accident. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-425) fuse has major performance drop on slower machines
[ https://issues.apache.org/jira/browse/HDFS-425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-425: -- Summary: fuse has major performance drop on slower machines (was: Major performance drop on slower machines) fuse has major performance drop on slower machines -- Key: HDFS-425 URL: https://issues.apache.org/jira/browse/HDFS-425 Project: Hadoop HDFS Issue Type: Improvement Components: fuse-dfs Reporter: Marc-Olivier Fleury Priority: Minor When running fuse_dfs on machines that have different CPU characteristics, I noticed that the performance of fuse_dfs is very sensitive to the machine power. The command I used was simply a cat over a rather large amount of data stored on HDFS. Here are the comparative times for the different types of machines: Intel(R) Pentium(R) 4 CPU 2.40GHz :2 min 40 s Intel(R) Pentium(R) 4 CPU 3.06GHz: 1 min 50 s 2 x Intel(R) Pentium(R) 4 CPU 3.00GHz: 0 min 40 s 2 x Intel(R) Xeon(TM) MP CPU 3.33GHz: 0 min 28 s Intel(R) Core(TM)2 Quad CPUQ6600 @ 2.40GHz 0 min 15 s I tried to find other explanations for the drop in performance, such as network configuration, or data locality, but the faster machines are the ones that are further away from the others considering the network configuration, and that don't run datanodes. top shows that the CPU usage of fuse_dfs is between 80-90% on the slower machines, and about 40% on the fastest one. This leads me to the conclusion that fuse_dfs consumes a lot of CPU resources, much more than expected. Any help or insight concerning this issue will be greatly appreciated, since these difference actually result in days of computations for a given job. Thank you -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-221) Trigger block scans for datanode
[ https://issues.apache.org/jira/browse/HDFS-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-221. --- Resolution: Duplicate I'm going to close this a dupe of HDFS-366. Trigger block scans for datanode Key: HDFS-221 URL: https://issues.apache.org/jira/browse/HDFS-221 Project: Hadoop HDFS Issue Type: New Feature Reporter: Brian Bockelman Assignee: Lei (Eddy) Xu Attachments: manual_block_scan.patch, manual_fsck_scan.patch Provide a mechanism to trigger block scans in a datanode upon request. Support interfaces for commands sent by the namenode and through the HTTP interface. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-366) Support manually fsck in DataNode
[ https://issues.apache.org/jira/browse/HDFS-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068838#comment-14068838 ] Allen Wittenauer commented on HDFS-366: --- I wonder how this works in a post-security world. Support manually fsck in DataNode - Key: HDFS-366 URL: https://issues.apache.org/jira/browse/HDFS-366 Project: Hadoop HDFS Issue Type: Improvement Reporter: Lei (Eddy) Xu Attachments: HADOOP-4763.patch, fsck.patch, fsck.patch Now the DataNode only supports scanning all blocks periodically. Our site needs a tool to check some blocks and files manually. My current design is to add a parameter to DFSck to indicate a deep, manual fsck request, then let the NameNode collect the proper block identities and send them to the associated DataNodes. I'll let DataBlockScanner run in two ways: periodically (the original one) and manually. Any suggestions on this are welcome. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-166) NameNode#invalidateBlock's requirement on more than 1 valid replica exists before scheduling a replica to delete is too strict
[ https://issues.apache.org/jira/browse/HDFS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068861#comment-14068861 ] Allen Wittenauer commented on HDFS-166: --- Ping! I bet this has been fixed. NameNode#invalidateBlock's requirement on more than 1 valid replica exists before scheduling a replica to delete is too strict -- Key: HDFS-166 URL: https://issues.apache.org/jira/browse/HDFS-166 Project: Hadoop HDFS Issue Type: Bug Reporter: Hairong Kuang Currently invalidateBlock allows a replica to be deleted only if at least two valid replicas exist before the deletion is scheduled. This is too restrictive if the replica to delete is a corrupt one. The NameNode could delete a corrupt replica as long as at least one copy (no matter whether valid or corrupt) will be left. -- This message was sent by Atlassian JIRA (v6.2#6252)
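A tiny hypothetical sketch of the relaxed rule the description proposes (the method and parameters are made up, not NameNode code): a corrupt replica may be scheduled for deletion as long as at least one copy of any kind would remain, while valid replicas keep the stricter rule.
{code}
// Hypothetical helper illustrating the proposed policy only.
class ToyInvalidatePolicy {
  static boolean mayInvalidate(int validReplicas, int corruptReplicas,
      boolean replicaToDeleteIsCorrupt) {
    int total = validReplicas + corruptReplicas;
    if (replicaToDeleteIsCorrupt) {
      return total > 1;       // keep at least one copy, valid or corrupt
    }
    return validReplicas > 1; // current, stricter rule for valid replicas
  }
}
{code}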
[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing
[ https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068869#comment-14068869 ] Benoy Antony commented on HDFS-6441: Thanks [~carp84], [~szetszwo] and [~arpitagarwal]. I'll rebase to the current trunk and will try to include Yu's suggestions. Add ability to exclude/include few datanodes while balancing Key: HDFS-6441 URL: https://issues.apache.org/jira/browse/HDFS-6441 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.4.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch In some use cases, it is desirable to ignore a few data nodes while balancing. The administrator should be able to specify a list of data nodes in a file similar to the hosts file and the balancer should ignore these data nodes while balancing so that no blocks are added/removed on these nodes. Similarly it will be beneficial to specify that only a particular list of datanodes should be considered for balancing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly
[ https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068895#comment-14068895 ] Devaraj Das commented on HDFS-6680: --- I am not sure if that loop needs to change. Seems to me that chooseTarget will have the same result with/without the change. I am missing something i guess.. BlockPlacementPolicyDefault does not choose favored nodes correctly --- Key: HDFS-6680 URL: https://issues.apache.org/jira/browse/HDFS-6680 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6680_20140714.patch, h6680_20140716.patch In one of the chooseTarget(..) methods, it tries all the favoredNodes to chooseLocalNode(..). It expects chooseLocalNode to return null if the local node is not a good target. Unfortunately, chooseLocalNode will fallback to chooseLocalRack but not returning null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file
[ https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068894#comment-14068894 ] Colin Patrick McCabe commented on HDFS-6570: bq. acl.proto: I'm not sure it's backwards-compatible to take the existing FsActionProto nested inside AclEntryProto and move it to top level. If protobuf encodes the message name now as AclEntryProto.FsActionProto, then it might break interop. It would be interesting to test hdfs dfs -getfacl on files with ACLs using a mix of old client + new server or new client + old server. If there is a problem, then we might need to find a way to refer to the nested definition, or if all else fails maintain duplicate definitions (nested and top-level) just for compatibility. Protobuf doesn't encode field names. It just assumes that the data you're giving it fits the schema you're giving it. As far as I know, moving the enum from nested to top-level will not change its representation. Enums are just represented as varints in protobuf, i.e. the same way a uint32 is represented. Unless you're changing the values of the enum constants, it shouldn't change anything. So I believe this part is OK. add api that enables checking if a user has certain permissions on a file - Key: HDFS-6570 URL: https://issues.apache.org/jira/browse/HDFS-6570 Project: Hadoop HDFS Issue Type: Bug Reporter: Thejas M Nair Assignee: Jitendra Nath Pandey Attachments: HDFS-6570-prototype.1.patch, HDFS-6570.2.patch For some of the authorization modes in Hive, the servers in Hive check if a given user has permissions on a certain file or directory. For example, the storage based authorization mode allows hive table metadata to be modified only when the user has access to the corresponding table directory on hdfs. There are likely to be such use cases outside of Hive as well. HDFS does not provide an api for such checks. As a result, the logic to check if a user has permissions on a directory gets replicated in Hive. This results in duplicate logic and introduces possibilities for inconsistencies in the interpretation of the permission model. This becomes a bigger problem with the complexity of ACL logic. HDFS should provide an api that provides functionality similar to the access function in unistd.h - http://linux.die.net/man/2/access . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-313) Threads in servers should not die silently.
[ https://issues.apache.org/jira/browse/HDFS-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068896#comment-14068896 ] Allen Wittenauer commented on HDFS-313: --- This probably needs to get revisited so PING! Threads in servers should not die silently. --- Key: HDFS-313 URL: https://issues.apache.org/jira/browse/HDFS-313 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze If there is an uncaught exception, some threads in a server may die silently. The corresponding error message does not show up in the log. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-89) Datanode should verify block sizes vs metadata on startup
[ https://issues.apache.org/jira/browse/HDFS-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-89. -- Resolution: Duplicate Not only was it previously reported, Brian, it was done by you! :D Closing as a dupe, as I found the JIRA. Datanode should verify block sizes vs metadata on startup - Key: HDFS-89 URL: https://issues.apache.org/jira/browse/HDFS-89 Project: Hadoop HDFS Issue Type: Bug Reporter: Brian Bockelman I could have sworn this bug had been reported by someone else already, but I can't find it on JIRA after searching; apologies if this is a duplicate. The datanode, upon starting up, should check and make sure that all block sizes as reported via `stat` are the same as the block sizes as reported via the block's metadata. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-215) Offline Namenode fsImage verification
[ https://issues.apache.org/jira/browse/HDFS-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-215. --- Resolution: Fixed As Jakob points out, oiv does this. Closing as fixed. Offline Namenode fsImage verification - Key: HDFS-215 URL: https://issues.apache.org/jira/browse/HDFS-215 Project: Hadoop HDFS Issue Type: New Feature Reporter: Brian Bockelman Currently, there is no way to verify that a copy of the fsImage is not corrupt. I propose that we should have an offline tool that loads the fsImage into memory to see if it is usable. This will allow us to automate backup testing to some extent. One can start a namenode process on the fsImage to see if it can be loaded, but this is not easy to automate. To use HDFS in production, it is greatly desired to have both checkpoints - and have some idea that the checkpoints are valid! No one wants to see the day where they reload from backup only to find that the fsImage in the backup wasn't usable. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-275) FSNamesystem should have an InvalidateBlockMap class to manage blocks scheduled to remove
[ https://issues.apache.org/jira/browse/HDFS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068913#comment-14068913 ] Allen Wittenauer commented on HDFS-275: --- I'm going to link these, since they seem like competing goals. FSNamesystem should have an InvalidateBlockMap class to manage blocks scheduled to remove - Key: HDFS-275 URL: https://issues.apache.org/jira/browse/HDFS-275 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: invalidateBlocksMap.patch This jira intends to move the code that handles recentInvalideSet to a separate class InvalidateBlockMap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-207) add querying block's info in the fsck facility
[ https://issues.apache.org/jira/browse/HDFS-207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-207: -- Component/s: namenode add querying block's info in the fsck facility -- Key: HDFS-207 URL: https://issues.apache.org/jira/browse/HDFS-207 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: zhangwei Assignee: zhangwei Priority: Minor Attachments: HADOOP-5019-2.patch, HADOOP-5019.patch Original Estimate: 24h Remaining Estimate: 24h fsck now works pretty well, but when a developer comes across a log message such as "Block blk_28622148 is not valid", we wish to know which file the block belongs to and which datanodes hold it. This can be done by running bin/hadoop fsck -files -blocks -locations / | grep blockid, but as mentioned earlier in HADOOP-4945, that is not an efficient way on a big production cluster, so maybe we could do something to make fsck more convenient for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-207) add querying block's info in the fsck facility
[ https://issues.apache.org/jira/browse/HDFS-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068919#comment-14068919 ] Allen Wittenauer commented on HDFS-207: --- Ping! add querying block's info in the fsck facility -- Key: HDFS-207 URL: https://issues.apache.org/jira/browse/HDFS-207 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: zhangwei Assignee: zhangwei Priority: Minor Attachments: HADOOP-5019-2.patch, HADOOP-5019.patch Original Estimate: 24h Remaining Estimate: 24h fsck now works pretty well, but when a developer comes across a log message such as "Block blk_28622148 is not valid", we wish to know which file the block belongs to and which datanodes hold it. This can be done by running bin/hadoop fsck -files -blocks -locations / | grep blockid, but as mentioned earlier in HADOOP-4945, that is not an efficient way on a big production cluster, so maybe we could do something to make fsck more convenient for this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068935#comment-14068935 ] Colin Patrick McCabe commented on HDFS-6707: Good find, Yongjun. Parsing the output of this shell command is definitely worrisome. It has been the source of bunch of bugs in the past. Unfortunately, even if we came up with a JNI method to do this, not everyone would use it, so here we are. Can we do this by skipping over the first code point following the comma, rather than by looking for a specific quote type? That seems more flexible in case there are any more wacky variants. Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem - Key: HDFS-6707 URL: https://issues.apache.org/jira/browse/HDFS-6707 Project: Hadoop HDFS Issue Type: Bug Components: symlinks Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, HDFS-6707.003.dbg.patch, HDFS-6707.004.patch Symlink tests failure happened from time to time, https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/ https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/ {code} Failed org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink Failing for the past 1 build (Since Failed#7376 ) Took 83 ms. Error Message Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link Stacktrace java.io.IOException: Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266) at org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Standard Output 2014-07-17 23:31:37,770 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile': No such file or directory 2014-07-17 23:31:38,109 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file 
/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile': File exists {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6570) add api that enables checking if a user has certain permissions on a file
[ https://issues.apache.org/jira/browse/HDFS-6570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068931#comment-14068931 ] Chris Nauroth commented on HDFS-6570: - Thanks, Colin. add api that enables checking if a user has certain permissions on a file - Key: HDFS-6570 URL: https://issues.apache.org/jira/browse/HDFS-6570 Project: Hadoop HDFS Issue Type: Bug Reporter: Thejas M Nair Assignee: Jitendra Nath Pandey Attachments: HDFS-6570-prototype.1.patch, HDFS-6570.2.patch For some of the authorization modes in Hive, the servers in Hive check if a given user has permissions on a certain file or directory. For example, the storage based authorization mode allows hive table metadata to be modified only when the user has access to the corresponding table directory on hdfs. There are likely to be such use cases outside of Hive as well. HDFS does not provide an api for such checks. As a result, the logic to check if a user has permissions on a directory gets replicated in Hive. This results in duplicate logic and introduces the possibility of inconsistencies in the interpretation of the permission model. This becomes a bigger problem with the complexity of ACL logic. HDFS should provide an api that provides functionality similar to the access function in unistd.h - http://linux.die.net/man/2/access . -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6701) Make seed optional in NetworkTopology#sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-6701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068943#comment-14068943 ] Andrew Wang commented on HDFS-6701: --- Hi Ashwin, Just a nitty thing, otherwise +1:
{code}
+<property>
+  <name>dfs.namenode.randomize-block-locations-per-block</name>
+  <value>false</value>
+  <description>When there is no node local block, the default behavior
+while getting block locations is that - block locations of a block
+are not randomized,so requests for a block go to same replica to take
+advantage of page cache effects.
+However, in some network topologies,hitting the same replica may cause
+issues like container taking a long time to download from hdfs and eventually
+failing. In these cases, we could make this property true and randomize
+block locations of a block, which in turn would load balance requests
+among replicas.
+  </description>
+</property>
{code}
* "that - block locations": remove the dash
* "randomized,so": needs a space
* "topologies,hitting": needs a space
* "hdfs" should be "HDFS"
* Since this is XML, quotes need to be escaped. Or you can just remove them. Line breaks are also not going to show up.
Recommend something like the following (feel free to copy paste): When fetching replica locations of a block, the replicas are sorted based on network distance. This configuration parameter determines whether the replicas at the same network distance are randomly shuffled. By default, this is false, such that repeated requests for a block's replicas always result in the same order. This potentially improves page cache behavior. However, for some network topologies, it is desirable to shuffle this order for better load balancing. Make seed optional in NetworkTopology#sortByDistance Key: HDFS-6701 URL: https://issues.apache.org/jira/browse/HDFS-6701 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Attachments: HDFS-6701-v1.txt, HDFS-6701-v3-branch2.txt, HDFS-6701-v3.txt Currently the seed in NetworkTopology#sortByDistance is set to the blockid, which causes the RNG to generate the same pseudo-random order for each block. If no node local block location is present, this causes the same rack local replica to be hit for a particular block. It'll be good to make the seed optional, so that one could turn it off if they want block locations of a block to be randomized. -- This message was sent by Atlassian JIRA (v6.2#6252)
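To make the trade-off concrete, here is a small, self-contained sketch (not the HDFS-6701 patch itself) of the behavior the new property toggles, assuming that replicas at the same network distance are shuffled with a Random that is either seeded by the block ID (stable order, page-cache friendly) or unseeded (order varies per request, better load balancing). All names and values below are illustrative.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReplicaOrderDemo {
  // Shuffle replicas that sit at the same network distance. With a per-block seed the
  // order is deterministic across requests (good for page cache reuse); with no seed it
  // varies per request, spreading load across replicas.
  static List<String> orderReplicas(List<String> sameDistanceReplicas,
                                    long blockId, boolean randomizePerRequest) {
    Random rng = randomizePerRequest ? new Random() : new Random(blockId);
    Collections.shuffle(sameDistanceReplicas, rng);
    return sameDistanceReplicas;
  }

  public static void main(String[] args) {
    List<String> replicas = Arrays.asList("dn1", "dn2", "dn3");
    // Seeded by block id: repeated calls return the same order for this block.
    System.out.println(orderReplicas(new ArrayList<>(replicas), 1234L, false));
    // Unseeded: order may differ between calls, load balancing reads across replicas.
    System.out.println(orderReplicas(new ArrayList<>(replicas), 1234L, true));
  }
}
{code}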
[jira] [Resolved] (HDFS-118) Namenode clients should recover from connection or Namenode restarts
[ https://issues.apache.org/jira/browse/HDFS-118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-118. --- Resolution: Fixed HA! As in highly available has been added to the NN which helps mitigate this problem so closing as fixed. Namenode clients should recover from connection or Namenode restarts Key: HDFS-118 URL: https://issues.apache.org/jira/browse/HDFS-118 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Suresh Srinivas This Jira discusses the client side recovery from namenode restarts, fail overs and network connectivity issues. This does not address Namenode high availability and tracks only the client side recovery. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-125) Consistency of different replicas of the same block is not checked.
[ https://issues.apache.org/jira/browse/HDFS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068950#comment-14068950 ] Allen Wittenauer commented on HDFS-125: --- This feels like a special case of the problems mentioned in HDFS-366 . Consistency of different replicas of the same block is not checked. --- Key: HDFS-125 URL: https://issues.apache.org/jira/browse/HDFS-125 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko HDFS currently detects corrupted replicas by verifying that its contents matches the checksum stored in the block meta-file. This is done independently for each replica of the block on the data-node it belongs to. But we do not check that the replicas are identical across data-nodes as long as they have the same size. This is not common but can happen as a result of a software bug or an operator mismanagement. And in this case different clients will read different data from the same file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (HDFS-125) Consistency of different replicas of the same block is not checked.
[ https://issues.apache.org/jira/browse/HDFS-125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068950#comment-14068950 ] Allen Wittenauer edited comment on HDFS-125 at 7/21/14 6:12 PM: This feels like a special case of the problems mentioned in HDFS-366 and friends. was (Author: aw): This feels like a special case of the problems mentioned in HDFS-366 . Consistency of different replicas of the same block is not checked. --- Key: HDFS-125 URL: https://issues.apache.org/jira/browse/HDFS-125 Project: Hadoop HDFS Issue Type: Bug Reporter: Konstantin Shvachko HDFS currently detects corrupted replicas by verifying that its contents matches the checksum stored in the block meta-file. This is done independently for each replica of the block on the data-node it belongs to. But we do not check that the replicas are identical across data-nodes as long as they have the same size. This is not common but can happen as a result of a software bug or an operator mismanagement. And in this case different clients will read different data from the same file. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068962#comment-14068962 ] Brandon Li commented on HDFS-6455: -- Thank you, [~abutala]. The patch looks good. It would be nice to add a unit test to validate the fix. You can use TestReaddir as a reference to add the unit test. NFS: Exception should be added in NFS log for invalid separator in allowed.hosts Key: HDFS-6455 URL: https://issues.apache.org/jira/browse/HDFS-6455 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Attachments: HDFS-6455.002.patch, HDFS-6455.patch The error for an invalid separator in the dfs.nfs.exports.allowed.hosts property should be logged in the nfs log file instead of the nfs.out file. Steps to reproduce: 1. Pass an invalid separator in dfs.nfs.exports.allowed.hosts
{noformat}
<property><name>dfs.nfs.exports.allowed.hosts</name><value>host1 ro:host2 rw</value></property>
{noformat}
2. Restart the NFS server. The NFS server fails to start and prints the exception to the console.
{noformat}
[hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null host1 sudo su - -c \/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\ hdfs
starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw'
 at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
 at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151)
 at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
 at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
{noformat}
The NFS log does not print any error message. It directly shuts down.
{noformat}
STARTUP_MSG: java = 1.6.0_31 /
2014-05-27 18:47:13,972 INFO nfs3.Nfs3Base (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]
2014-05-27 18:47:14,169 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259
2014-05-27 18:47:14,179 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73
2014-05-27 18:47:14,192 INFO nfs3.Nfs3Base (StringUtils.java:run(640)) - SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down Nfs3 at
{noformat}
The NFS.out file has the exception.
{noformat}
DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it.
Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw' at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356) at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151) at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54) at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59) ulimit -a for user hdfs core file size (blocks, -c) 409600 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 188893 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 65536 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068965#comment-14068965 ] Yongjun Zhang commented on HDFS-6707: - HI Colin, thanks for the comments. Same feelings on my side that I wish we don't have to depend on parsing shell output. If you look at the patch, you can see that I've already got rid of the dependency on quote type, I used - as the separator to find the link target, which hopefully is more robust. Thanks. Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem - Key: HDFS-6707 URL: https://issues.apache.org/jira/browse/HDFS-6707 Project: Hadoop HDFS Issue Type: Bug Components: symlinks Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, HDFS-6707.003.dbg.patch, HDFS-6707.004.patch Symlink tests failure happened from time to time, https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/ https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/ {code} Failed org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink Failing for the past 1 build (Since Failed#7376 ) Took 83 ms. Error Message Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link Stacktrace java.io.IOException: Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266) at org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Standard Output 2014-07-17 23:31:37,770 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile': No such file or directory 2014-07-17 23:31:38,109 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile' failed 1 with: ln: failed to 
create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile': File exists {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-327) DataNode should warn about unknown files in storage
[ https://issues.apache.org/jira/browse/HDFS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-327: -- Labels: newbie (was: ) DataNode should warn about unknown files in storage --- Key: HDFS-327 URL: https://issues.apache.org/jira/browse/HDFS-327 Project: Hadoop HDFS Issue Type: Improvement Reporter: Raghu Angadi Assignee: Jakob Homan Labels: newbie DataNode currently just ignores the files it does not know about. There could be a lot of files left in DataNode's storage that never get noticed or deleted. These files could be left because of bugs or by a misconfiguration. E.g. while upgrading from 0.17, DN left a lot of metadata files that were not named in the correct format for 0.18 (HADOOP-4663). The proposal here is simply to make DN print a warning for each of the unknown files at startup. This at least gives a way to list all the unknown files and (equally importantly) forces a notion of known and unknown files in the storage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-327) DataNode should warn about unknown files in storage
[ https://issues.apache.org/jira/browse/HDFS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-327: - Assignee: (was: Jakob Homan) DataNode should warn about unknown files in storage --- Key: HDFS-327 URL: https://issues.apache.org/jira/browse/HDFS-327 Project: Hadoop HDFS Issue Type: Improvement Reporter: Raghu Angadi Labels: newbie DataNode currently just ignores the files it does not know about. There could be a lot of files left in DataNode's storage that never get noticed or deleted. These files could be left because of bugs or by a misconfiguration. E.g. while upgrading from 0.17, DN left a lot of metadata files that were not named in the correct format for 0.18 (HADOOP-4663). The proposal here is simply to make DN print a warning for each of the unknown files at startup. This at least gives a way to list all the unknown files and (equally importantly) forces a notion of known and unknown files in the storage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068990#comment-14068990 ] Colin Patrick McCabe commented on HDFS-6707: +1 once you add a unit test to TestStat.java (and pending Jenkins, of course) Thanks, Yongjun. Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem - Key: HDFS-6707 URL: https://issues.apache.org/jira/browse/HDFS-6707 Project: Hadoop HDFS Issue Type: Bug Components: symlinks Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, HDFS-6707.003.dbg.patch, HDFS-6707.004.patch Symlink tests failure happened from time to time, https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/ https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/ {code} Failed org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink Failing for the past 1 build (Since Failed#7376 ) Took 83 ms. Error Message Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link Stacktrace java.io.IOException: Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266) at org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Standard Output 2014-07-17 23:31:37,770 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile': No such file or directory 2014-07-17 23:31:38,109 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile': File exists {code} -- This message was 
sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-286) Move Datanode packet IO logging to its own log
[ https://issues.apache.org/jira/browse/HDFS-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-286: -- Labels: newbie (was: ) Move Datanode packet IO logging to its own log -- Key: HDFS-286 URL: https://issues.apache.org/jira/browse/HDFS-286 Project: Hadoop HDFS Issue Type: Improvement Reporter: Steve Loughran Priority: Minor Labels: newbie If the Datanode is set to log at info, then the log fills up with lots of details about packet sending and receiving [sf-startdaemon-debug] 09/01/28 13:15:42 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83efe6e] INFO datanode.DataNode : Receiving block blk_-3185775405544105186_1757 src: /127.0.0.1:41218 dest: /127.0.0.1:48017 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block blk_-3185775405544105186_1757] INFO datanode.DataNode : Received block blk_-3185775405544105186_1757 of size 3647 from /127.0.0.1:41218 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block blk_-3185775405544105186_1757] INFO datanode.DataNode : PacketResponder 0 for block blk_-3185775405544105186_1757 terminating [sf-startdaemon-debug] 09/01/28 13:15:42 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83f0029] INFO datanode.DataNode : Receiving block blk_-1511363731410268168_1758 src: /127.0.0.1:41219 dest: /127.0.0.1:48017 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block blk_-1511363731410268168_1758] INFO datanode.DataNode : Received block blk_-1511363731410268168_1758 of size 940 from /127.0.0.1:41219 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block blk_-1511363731410268168_1758] INFO datanode.DataNode : PacketResponder 0 for block blk_-1511363731410268168_1758 terminating [sf-startdaemon-debug] 09/01/28 13:15:42 [org.apache.hadoop.hdfs.server.datanode.DataXceiver@83f01e4] INFO datanode.DataNode : Receiving block blk_-967265843864311176_1759 src: /127.0.0.1:41220 dest: /127.0.0.1:48017 [sf-startdaemon-debug] 09/01/28 13:15:42 [PacketResponder 0 for Block blk_-967265843864311176_1759] INFO datanode.DataNode : Received block blk_-967265843864311176_1759 of size 36948 from /127.0.0.1:41220 It would be convenient for those people who only want to see errors in communication but monitor other DataNode operations to have a separate logger for DataNode communications; one to view at a separate log level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met
[ https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069031#comment-14069031 ] Brandon Li commented on HDFS-6706: -- With further investigation, we found this is due to a misconfiguration. Closing as invalid. ZKFailoverController failed to recognize the quorum is not met -- Key: HDFS-6706 URL: https://issues.apache.org/jira/browse/HDFS-6706 Project: Hadoop HDFS Issue Type: Bug Reporter: Brandon Li Assignee: Brandon Li Thanks Kenny Zhang for finding this problem. The zkfc cannot be startup due to ha.zookeeper.quorum is not met. zkfc -format doesn't log the real problem. And then user will see the error message instead of the real issue when starting zkfc: 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. Parent znode does not exist. Run with -formatZK flag to initialize ZooKeeper. 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received create error from Zookeeper. code:NONODE for path /hadoop-ha/prodcluster/ActiveStandbyElectorLock 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 txntype:-1 reqpath:n/a Error Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock To reproduce the problem: 1. use HDFS cluster with automatic HA enable and set the ha.zookeeper.quorum to 3. 2. start two zookeeper servers. 3. do hdfs zkfc -format, and then hdfs zkfc -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6703: - Affects Version/s: (was: 2.6.0) 2.2.0 NFS: Files can be deleted from a read-only mount Key: HDFS-6703 URL: https://issues.apache.org/jira/browse/HDFS-6703 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Abhiraj Butala Assignee: Srikanth Upputuri As reported by bigdatagroup bigdatagr...@itecons.it on the hadoop-users mailing list:
{code}
We exported our distributed filesystem with the following configuration (Managed by Cloudera Manager over CDH 5.0.1):
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>192.168.0.153 ro</value>
</property>
As you can see, we expect the exported FS to be read-only, but in fact we are able to delete files and folders stored on it (where the user has the correct permissions), from the client machine that mounted the FS. Other writing operations are correctly blocked. Hadoop Version in use: 2.3.0+cdh5.0.1+567
{code}
I was able to reproduce the issue on latest hadoop trunk, though I could only delete files; deleting directories was correctly blocked:
{code}
abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127
127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1)
abutala@abutala-vBox:/mnt/hdfs$ ls -lh
total 512
-rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt
drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp
abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt
abutala@abutala-vBox:/mnt/hdfs$ ls
temp
abutala@abutala-vBox:/mnt/hdfs$ rm -r temp
rm: cannot remove `temp': Permission denied
abutala@abutala-vBox:/mnt/hdfs$ ls
temp
abutala@abutala-vBox:/mnt/hdfs$
{code}
Contents of hdfs-site.xml:
{code}
<configuration>
  <property>
    <name>dfs.nfs3.dump.dir</name>
    <value>/tmp/.hdfs-nfs3</value>
  </property>
  <property>
    <name>dfs.nfs.exports.allowed.hosts</name>
    <value>localhost ro</value>
  </property>
</configuration>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6703) NFS: Files can be deleted from a read-only mount
[ https://issues.apache.org/jira/browse/HDFS-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6703: - Affects Version/s: 2.6.0 NFS: Files can be deleted from a read-only mount Key: HDFS-6703 URL: https://issues.apache.org/jira/browse/HDFS-6703 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Abhiraj Butala Assignee: Srikanth Upputuri As reported by bigdatagroup bigdatagr...@itecons.it on the hadoop-users mailing list:
{code}
We exported our distributed filesystem with the following configuration (Managed by Cloudera Manager over CDH 5.0.1):
<property>
  <name>dfs.nfs.exports.allowed.hosts</name>
  <value>192.168.0.153 ro</value>
</property>
As you can see, we expect the exported FS to be read-only, but in fact we are able to delete files and folders stored on it (where the user has the correct permissions), from the client machine that mounted the FS. Other writing operations are correctly blocked. Hadoop Version in use: 2.3.0+cdh5.0.1+567
{code}
I was able to reproduce the issue on latest hadoop trunk, though I could only delete files; deleting directories was correctly blocked:
{code}
abutala@abutala-vBox:/mnt/hdfs$ mount | grep 127
127.0.1.1:/ on /mnt/hdfs type nfs (rw,vers=3,proto=tcp,nolock,addr=127.0.1.1)
abutala@abutala-vBox:/mnt/hdfs$ ls -lh
total 512
-rw-r--r-- 1 abutala supergroup 0 Jul 17 18:51 abc.txt
drwxr-xr-x 2 abutala supergroup 64 Jul 17 18:31 temp
abutala@abutala-vBox:/mnt/hdfs$ rm abc.txt
abutala@abutala-vBox:/mnt/hdfs$ ls
temp
abutala@abutala-vBox:/mnt/hdfs$ rm -r temp
rm: cannot remove `temp': Permission denied
abutala@abutala-vBox:/mnt/hdfs$ ls
temp
abutala@abutala-vBox:/mnt/hdfs$
{code}
Contents of hdfs-site.xml:
{code}
<configuration>
  <property>
    <name>dfs.nfs3.dump.dir</name>
    <value>/tmp/.hdfs-nfs3</value>
  </property>
  <property>
    <name>dfs.nfs.exports.allowed.hosts</name>
    <value>localhost ro</value>
  </property>
</configuration>
{code}
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met
[ https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li resolved HDFS-6706. -- Resolution: Invalid ZKFailoverController failed to recognize the quorum is not met -- Key: HDFS-6706 URL: https://issues.apache.org/jira/browse/HDFS-6706 Project: Hadoop HDFS Issue Type: Bug Reporter: Brandon Li Assignee: Brandon Li Thanks Kenny Zhang for finding this problem. The zkfc cannot be startup due to ha.zookeeper.quorum is not met. zkfc -format doesn't log the real problem. And then user will see the error message instead of the real issue when starting zkfc: 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. Parent znode does not exist. Run with -formatZK flag to initialize ZooKeeper. 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received create error from Zookeeper. code:NONODE for path /hadoop-ha/prodcluster/ActiveStandbyElectorLock 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 txntype:-1 reqpath:n/a Error Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock To reproduce the problem: 1. use HDFS cluster with automatic HA enable and set the ha.zookeeper.quorum to 3. 2. start two zookeeper servers. 3. do hdfs zkfc -format, and then hdfs zkfc -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6707) Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem
[ https://issues.apache.org/jira/browse/HDFS-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069032#comment-14069032 ] Yongjun Zhang commented on HDFS-6707: - Thanks Colin, will do! Intermittent failure of Symlink tests TestSymlinkLocalFSFileContext,TestSymlinkLocalFSFileSystem - Key: HDFS-6707 URL: https://issues.apache.org/jira/browse/HDFS-6707 Project: Hadoop HDFS Issue Type: Bug Components: symlinks Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6707.001.patch, HDFS-6707.002.dbg.patch, HDFS-6707.003.dbg.patch, HDFS-6707.004.patch Symlink tests failure happened from time to time, https://builds.apache.org/job/PreCommit-HDFS-Build/7383//testReport/ https://builds.apache.org/job/PreCommit-HDFS-Build/7376/testReport/ {code} Failed org.apache.hadoop.fs.TestSymlinkLocalFSFileContext.testDanglingLink Failing for the past 1 build (Since Failed#7376 ) Took 83 ms. Error Message Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link Stacktrace java.io.IOException: Path file:/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile is not a symbolic link at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:266) at org.apache.hadoop.fs.TestSymlinkLocalFS.testDanglingLink(TestSymlinkLocalFS.java:163) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Standard Output 2014-07-17 23:31:37,770 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test2/linkToFile': No such file or directory 2014-07-17 23:31:38,109 WARN fs.FileUtil (FileUtil.java:symLink(829)) - Command 'ln -s /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/file /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile' failed 1 with: ln: failed to create symbolic link '/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-common-project/hadoop-common/target/test/data/RtGBheUh4y/test1/linkToFile': File exists {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly
[ https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069061#comment-14069061 ] Tsz Wo Nicholas Sze commented on HDFS-6680: ---
{code}
- for (int i = 0; i < Math.min(favoredNodes.size(), numOfReplicas); i++) {
+ for (int i = 0; i < favoredNodes.size() && results.size() < numOfReplicas; i++) {
{code}
I found this bug with the new test. Consider favoredNodes.size() == 4 and numOfReplicas == 3. The min is 3, so only the first 3 datanodes will be tried before the change. If one of these three datanodes is not chosen, it won't try the 4th datanode. BlockPlacementPolicyDefault does not choose favored nodes correctly --- Key: HDFS-6680 URL: https://issues.apache.org/jira/browse/HDFS-6680 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6680_20140714.patch, h6680_20140716.patch In one of the chooseTarget(..) methods, it passes each of the favoredNodes to chooseLocalNode(..). It expects chooseLocalNode to return null if the local node is not a good target. Unfortunately, chooseLocalNode falls back to chooseLocalRack instead of returning null. -- This message was sent by Atlassian JIRA (v6.2#6252)
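For concreteness, here is a standalone sketch (not the actual BlockPlacementPolicyDefault code) of the scenario described above: with four favored nodes, three replicas to place, and one favored node rejected as a target, the old Math.min bound stops after the first three candidates and ends up one replica short, while the revised bound keeps scanning the favored list until enough replicas are chosen. The node names and the rejection rule are made up for the example.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FavoredNodesLoopDemo {
  public static void main(String[] args) {
    List<String> favoredNodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");
    int numOfReplicas = 3;
    // Pretend dn2 is not a good target (e.g. low on space), so it gets skipped.
    String badTarget = "dn2";

    // Old bound: only the first min(4, 3) = 3 favored nodes are ever considered.
    List<String> oldResults = new ArrayList<>();
    for (int i = 0; i < Math.min(favoredNodes.size(), numOfReplicas); i++) {
      if (!favoredNodes.get(i).equals(badTarget)) {
        oldResults.add(favoredNodes.get(i));
      }
    }
    System.out.println("old loop chose " + oldResults); // [dn1, dn3] -- only 2 replicas

    // New bound: keep walking the favored list until enough replicas are chosen.
    List<String> newResults = new ArrayList<>();
    for (int i = 0; i < favoredNodes.size() && newResults.size() < numOfReplicas; i++) {
      if (!favoredNodes.get(i).equals(badTarget)) {
        newResults.add(favoredNodes.get(i));
      }
    }
    System.out.println("new loop chose " + newResults); // [dn1, dn3, dn4]
  }
}
{code}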
[jira] [Updated] (HDFS-301) Provide better error messages when fs.default.name is invalid
[ https://issues.apache.org/jira/browse/HDFS-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-301: -- Labels: newbie (was: ) Provide better error messages when fs.default.name is invalid - Key: HDFS-301 URL: https://issues.apache.org/jira/browse/HDFS-301 Project: Hadoop HDFS Issue Type: Improvement Reporter: Steve Loughran Assignee: Steve Loughran Priority: Minor Labels: newbie Attachments: HADOOP-5095-1.patch this the followon to HADOOP-5687 - its not enough to detect bad uris, we need good error messages and a set of tests to make sure everything works as intended. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-196) File length not reported correctly after application crash
[ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069093#comment-14069093 ] Allen Wittenauer commented on HDFS-196: --- Ping! I'm tempted to close this as stale, but it would be good for someone more familiar with the issue to do that. File length not reported correctly after application crash -- Key: HDFS-196 URL: https://issues.apache.org/jira/browse/HDFS-196 Project: Hadoop HDFS Issue Type: Bug Reporter: Doug Judd Our application (Hypertable) creates a transaction log in HDFS. This log is written with the following pattern: out_stream.write(header, 0, 7); out_stream.sync() out_stream.write(data, 0, amount); out_stream.sync() [...] However, if the application crashes and then comes back up again, the following statement length = mFilesystem.getFileStatus(new Path(fileName)).getLen(); returns the wrong length. Apparently this is because this method fetches length information from the NameNode which is stale. Ideally, a call to getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6441) Add ability to exclude/include few datanodes while balancing
[ https://issues.apache.org/jira/browse/HDFS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069099#comment-14069099 ] Tsz Wo Nicholas Sze commented on HDFS-6441: --- [~benoyantony], you may simply use the code in HDFS-6010. Add ability to exclude/include few datanodes while balancing Key: HDFS-6441 URL: https://issues.apache.org/jira/browse/HDFS-6441 Project: Hadoop HDFS Issue Type: Improvement Components: balancer Affects Versions: 2.4.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch, HDFS-6441.patch In some use cases, it is desirable to ignore a few data nodes while balancing. The administrator should be able to specify a list of data nodes in a file similar to the hosts file and the balancer should ignore these data nodes while balancing so that no blocks are added/removed on these nodes. Similarly it will be beneficial to specify that only a particular list of datanodes should be considered for balancing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-402) Display the server version in dfsadmin -report
[ https://issues.apache.org/jira/browse/HDFS-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069096#comment-14069096 ] Allen Wittenauer commented on HDFS-402: --- Is this still a viable idea, especially as we move towards rolling upgrades? What is the version at that point? Display the server version in dfsadmin -report -- Key: HDFS-402 URL: https://issues.apache.org/jira/browse/HDFS-402 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jakob Homan Assignee: Uma Maheswara Rao G Priority: Minor Labels: newbie Attachments: HDFS-402.patch, HDFS-402.patch, HDFS-402.patch, hdfs-402.txt As part of HADOOP-5094, it was requested to include the server version in the dfsadmin -report, to avoid the need to screen scrape to get this information: bq. Please do provide the server version, so there is a quick and non-taxing way of determine what is the current running version on the namenode. Currently there is nothing in the dfs client protocol to query this information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-196) File length not reported correctly after application crash
[ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-196. -- Resolution: Not a Problem sync() does not update the length in NN. So getFileStatus() will return the correct length immediately as Dhruba mentioned. Anyway, sync() is already removed from trunk (HDFS-3034). hsync(..) with the UPDATE_LENGTH flag could be used instead. So this becomes not-a-problem anymore. Resolving ... File length not reported correctly after application crash -- Key: HDFS-196 URL: https://issues.apache.org/jira/browse/HDFS-196 Project: Hadoop HDFS Issue Type: Bug Reporter: Doug Judd Our application (Hypertable) creates a transaction log in HDFS. This log is written with the following pattern: out_stream.write(header, 0, 7); out_stream.sync() out_stream.write(data, 0, amount); out_stream.sync() [...] However, if the application crashes and then comes back up again, the following statement length = mFilesystem.getFileStatus(new Path(fileName)).getLen(); returns the wrong length. Apparently this is because this method fetches length information from the NameNode which is stale. Ideally, a call to getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-196) File length not reported correctly after application crash
[ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069112#comment-14069112 ] Tsz Wo Nicholas Sze commented on HDFS-196: -- ... getFileStatus() will return the correct length ... Oops, it should be ... getFileStatus() will NOT return the correct length File length not reported correctly after application crash -- Key: HDFS-196 URL: https://issues.apache.org/jira/browse/HDFS-196 Project: Hadoop HDFS Issue Type: Bug Reporter: Doug Judd Our application (Hypertable) creates a transaction log in HDFS. This log is written with the following pattern: out_stream.write(header, 0, 7); out_stream.sync() out_stream.write(data, 0, amount); out_stream.sync() [...] However, if the application crashes and then comes back up again, the following statement length = mFilesystem.getFileStatus(new Path(fileName)).getLen(); returns the wrong length. Apparently this is because this method fetches length information from the NameNode which is stale. Ideally, a call to getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
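As a brief illustration of the hsync(..) with UPDATE_LENGTH suggestion above, here is a minimal sketch using the public HdfsDataOutputStream API. The file path and write pattern are made up for the example; this is not code from the issue, just the usage pattern the comment refers to.
{code}
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class HsyncLengthDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path log = new Path("/tmp/txn.log"); // illustrative path

    FSDataOutputStream out = fs.create(log);
    out.write(new byte[]{1, 2, 3, 4, 5, 6, 7});

    // A plain hsync() persists the data but does not refresh the file length on the
    // NameNode; passing UPDATE_LENGTH also updates the length, so a subsequent
    // getFileStatus() reflects the bytes written so far.
    if (out instanceof HdfsDataOutputStream) {
      ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    } else {
      out.hsync(); // non-HDFS filesystems: fall back to a plain hsync
    }

    System.out.println("visible length: " + fs.getFileStatus(log).getLen());
    out.close();
  }
}
{code}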
[jira] [Updated] (HDFS-219) Add md5sum facility in dfsshell
[ https://issues.apache.org/jira/browse/HDFS-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-219: -- Labels: newbie (was: ) Add md5sum facility in dfsshell --- Key: HDFS-219 URL: https://issues.apache.org/jira/browse/HDFS-219 Project: Hadoop HDFS Issue Type: New Feature Reporter: zhangwei Labels: newbie I think it would be useful to add md5sum (or any other checksum) to dfsshell, so the facility can verify a file on HDFS. It can confirm the file's integrity after copyFromLocal or copyToLocal. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-374) HDFS needs to support a very large number of open files.
[ https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069139#comment-14069139 ] Colin Patrick McCabe commented on HDFS-374: --- Oh, and also, using short-circuit reads mitigates this somewhat as well. We can share the same file descriptor across multiple instances of a short-circuit block file being opened, as well. HDFS needs to support a very large number of open files. Key: HDFS-374 URL: https://issues.apache.org/jira/browse/HDFS-374 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jim Kellerman Currently, DFSClient maintains one socket per open file. For most map/reduce operations, this is not a problem because there just aren't many open files. However, HBase has a very different usage model in which a single region server could have thousands (10**3 but less than 10**4) open files. This can cause both datanodes and region servers to run out of file handles. What I would like to see is one connection for each (dfsClient, datanode) pair. This would reduce the number of connections to hundreds or tens of sockets. The intent is not to process requests totally asynchronously (overlapping block reads and forcing the client to reassemble a whole message out of a bunch of fragments), but rather to queue requests from the client to the datanode and process them serially, differing from the current implementation in that rather than use an exclusive socket for each file, only one socket is in use between the client and a particular datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-374) HDFS needs to support a very large number of open files.
[ https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069136#comment-14069136 ] Colin Patrick McCabe commented on HDFS-374: --- This is less of an issue than it once was, due to HBase using pread (positional read) more. pread creates a new RemoteBlockReader each time, but closes it immediately after, returning the socket to the client-side socket cache (PeerCache). However, running out of open file descriptors could still be an issue with some HBase configurations. Anyway, I agree with closing this since in all the cases I've seen, fixing the configuration resolved the issue. I also don't like the queueing idea presented here since it would increase latency. HDFS needs to support a very large number of open files. Key: HDFS-374 URL: https://issues.apache.org/jira/browse/HDFS-374 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jim Kellerman Currently, DFSClient maintains one socket per open file. For most map/reduce operations, this is not a problem because there just aren't many open files. However, HBase has a very different usage model in which a single region server could have thousands (10**3 but less than 10**4) open files. This can cause both datanodes and region servers to run out of file handles. What I would like to see is one connection for each (dfsClient, datanode) pair. This would reduce the number of connections to hundreds or tens of sockets. The intent is not to process requests totally asynchronously (overlapping block reads and forcing the client to reassemble a whole message out of a bunch of fragments), but rather to queue requests from the client to the datanode and process them serially, differing from the current implementation in that rather than use an exclusive socket for each file, only one socket is in use between the client and a particular datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
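To show the pread pattern Colin refers to, here is a minimal sketch against the public FileSystem and FSDataInputStream API; the path, offset, and buffer size are arbitrary examples, not values from HBase or from this issue.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/tmp/hfile-block"); // illustrative path

    byte[] buf = new byte[4096];
    try (FSDataInputStream in = fs.open(file)) {
      // Positional read: fetch 4 KB starting at offset 65536 without moving the
      // stream position. Each pread sets up a block reader for the request and the
      // underlying socket goes back to the client-side cache afterwards, so a
      // long-lived open file does not pin a socket per file.
      int n = in.read(65536L, buf, 0, buf.length);
      System.out.println("read " + n + " bytes");
    }
  }
}
{code}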
[jira] [Resolved] (HDFS-405) Several unit tests failing on Windows frequently
[ https://issues.apache.org/jira/browse/HDFS-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-405. --- Resolution: Fixed Closing as stale. Several unit tests failing on Windows frequently Key: HDFS-405 URL: https://issues.apache.org/jira/browse/HDFS-405 Project: Hadoop HDFS Issue Type: Test Environment: Windows Reporter: Ramya Sunil Priority: Minor This issue is similar to HADOOP-5114. A huge number of unit tests are failing on Windows on branch 18 consistently. 0.21 is showing the maximum number of failures. Failures on other branches are a subset of failures observed in 0.21. Below is the list of failures observed on 0.21.
* java.io.IOException: Job failed!
** TestJobName - testComplexNameWithRegex
** TestJobStatusPersistency - testNonPersistency, testPersistency
** TestJobSysDirWithDFS - testWithDFS
** TestKillCompletedJob - testKillCompJob
** TestMiniMRClasspath - testClassPath, testExternalWritable
** TestMiniMRDFSCaching - testWithDFS
** TestMiniMRDFSSort - testMapReduceSort, testMapReduceSortWithJvmReuse
** TestMiniMRLocalFS - testWithLocal
** TestMiniMRWithDFS - testWithDFS, testWithDFSWithDefaultPort
** TestMiniMRWithDFSWithDistinctUsers - testDistinctUsers
** TestMultipleLevelCaching - testMultiLevelCaching
** TestQueueManager - testAllEnabledACLForJobSubmission, testEnabledACLForNonDefaultQueue, testUserEnabledACLForJobSubmission, testGroupsEnabledACLForJobSubmission
** TestRackAwareTaskPlacement - testTaskPlacement
** TestReduceFetch - testReduceFromDisk, testReduceFromPartialMem, testReduceFromMem
** TestSpecialCharactersInOutputPath - testJobWithDFS
** TestTTMemoryReporting - testDefaultMemoryValues, testConfiguredMemoryValues
** TestTrackerBlacklistAcrossJobs - testBlacklistAcrossJobs
** TestUserDefinedCounters - testMapReduceJob
** TestDBJob - testRun
** TestServiceLevelAuthorization - testServiceLevelAuthorization
** TestNoDefaultsJobConf - testNoDefaults
** TestBadRecords - testBadMapRed
** TestClusterMRNotification - testMR
** TestClusterMapReduceTestCase - testMapReduce, testMapReduceRestarting
** TestCommandLineJobSubmission - testJobShell
** TestCompressedEmptyMapOutputs - testMapReduceSortWithCompressedEmptyMapOutputs
** TestCustomOutputCommitter - testCommitter
** TestJavaSerialization - testMapReduceJob, testWriteToSequencefile
** TestJobClient - testGetCounter, testJobList, testChangingJobPriority
** TestJobName - testComplexName
* java.lang.IllegalArgumentException: Pathname /path from Cpath is not a valid DFS filename.
** TestJobQueueInformation - testJobQueues
** TestJobInProgress - testRunningTaskCount
** TestJobTrackerRestart - testJobTrackerRestart
* Timeout
** TestKillSubProcesses - testJobKill
** TestMiniMRMapRedDebugScript - testMapDebugScript
** TestControlledMapReduceJob - testControlledMapReduceJob
** TestJobInProgressListener - testJobQueueChanges
** TestJobKillAndFail - testJobFailAndKill
* junit.framework.AssertionFailedError
** TestMRServerPorts - testJobTrackerPorts, testTaskTrackerPorts
** TestMiniMRTaskTempDir - testTaskTempDir
** TestTaskFail - testWithDFS
** TestTaskLimits - testTaskLimits
** TestMapReduceLocal - testWithLocal
** TestCLI - testAll
** TestHarFileSystem - testArchives
** TestTrash - testTrash, testNonDefaultFS
** TestHDFSServerPorts - testNameNodePorts, testDataNodePorts, testSecondaryNodePorts
** TestHDFSTrash - testNonDefaultFS
** TestFileOutputFormat - testCustomFile
* org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.authorize.AuthorizationException: java.security.AccessControlException: access denied ConnectionPermission(org.apache.hadoop.security.authorize.RefreshAuthorizationPolicyProtocol)
** TestServiceLevelAuthorization - testRefresh
* junit.framework.ComparisonFailure
** TestDistCh - testDistCh
* java.io.FileNotFoundException
** TestCopyFiles - testMapCount
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-402) Display the server version in dfsadmin -report
[ https://issues.apache.org/jira/browse/HDFS-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069176#comment-14069176 ] Hadoop QA commented on HDFS-402: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12531921/hdfs-402.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7410//console This message is automatically generated. Display the server version in dfsadmin -report -- Key: HDFS-402 URL: https://issues.apache.org/jira/browse/HDFS-402 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jakob Homan Assignee: Uma Maheswara Rao G Priority: Minor Labels: newbie Attachments: HDFS-402.patch, HDFS-402.patch, HDFS-402.patch, hdfs-402.txt As part of HADOOP-5094, it was requested to include the server version in the dfsadmin -report, to avoid the need to screen scrape to get this information: bq. Please do provide the server version, so there is a quick and non-taxing way of determine what is the current running version on the namenode. Currently there is nothing in the dfs client protocol to query this information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-465) in branch-1, libhdfs makes jni lib calls after setting errno in some places
[ https://issues.apache.org/jira/browse/HDFS-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069184#comment-14069184 ] Colin Patrick McCabe commented on HDFS-465: --- Yeah. This was fixed in HDFS-3579. This JIRA was just a placeholder in case people wanted to fix it in branch-1 and earlier. in branch-1, libhdfs makes jni lib calls after setting errno in some places --- Key: HDFS-465 URL: https://issues.apache.org/jira/browse/HDFS-465 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.1.2 Reporter: Pete Wyckoff Assignee: Pete Wyckoff Attachments: HADOOP-4636.txt errno can be affected by other library calls, so it should always be set right before the return statement and never before making other library calls. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-332) hadoop fs -put should return different code for different failures
[ https://issues.apache.org/jira/browse/HDFS-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-332. --- Resolution: Won't Fix Returning -1 for failure at the shell is pretty normal. Won't fix. hadoop fs -put should return different code for different failures -- Key: HDFS-332 URL: https://issues.apache.org/jira/browse/HDFS-332 Project: Hadoop HDFS Issue Type: Improvement Reporter: Runping Qi Assignee: Ravi Phulari hadoop fs -put may fail for different reasons, such as the source file not existing, the destination file already existing, permission being denied, or exceptions during writing. However, it returns the same code (-1), making it impossible to tell the actual cause of the failure. -- This message was sent by Atlassian JIRA (v6.2#6252)
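Editor's note: for illustration, distinct failure causes could be mapped to distinct shell exit codes along the following lines. This is a hypothetical sketch, not the FsShell implementation; the constants and helper method are invented, and only the exception types are existing Hadoop/JDK classes.

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;

import org.apache.hadoop.security.AccessControlException;

/**
 * Illustrative sketch only: mapping distinct failure causes of a shell
 * copy to distinct exit codes. Constants and method are hypothetical.
 */
public class PutExitCodes {
  static final int EXIT_SOURCE_MISSING = 2;
  static final int EXIT_DEST_EXISTS = 3;
  static final int EXIT_PERMISSION_DENIED = 4;
  static final int EXIT_IO_ERROR = 5;

  static int toExitCode(IOException e) {
    if (e instanceof FileNotFoundException) {
      return EXIT_SOURCE_MISSING;          // source file does not exist
    } else if (e instanceof FileAlreadyExistsException) {
      return EXIT_DEST_EXISTS;             // destination already exists
    } else if (e instanceof AccessControlException) {
      return EXIT_PERMISSION_DENIED;       // permission denied
    }
    return EXIT_IO_ERROR;                  // generic write/transfer failure
  }
}
{code}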
[jira] [Commented] (HDFS-97) DFS should detect slow links(nodes) and avoid them
[ https://issues.apache.org/jira/browse/HDFS-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069192#comment-14069192 ] Colin Patrick McCabe commented on HDFS-97: -- This is also related to hedged reads. Even if we blacklist a datanode after a timeout / latency spike has happened, the damage is done. Hedged reads can avoid the latency spike in the first place. DFS should detect slow links(nodes) and avoid them -- Key: HDFS-97 URL: https://issues.apache.org/jira/browse/HDFS-97 Project: Hadoop HDFS Issue Type: Bug Reporter: Runping Qi The current DFS does not detect slow links (nodes). Thus, when a node or its network link is slow, it may affect the overall system performance significantly. Specifically, when a map job needs to read data from such a node, it may progress 10X slower. And when a DFS data node pipeline consists of such a node, the write performance degrades significantly. This may lead to some long tails for map/reduce jobs. We have experienced such behaviors quite often. -- This message was sent by Atlassian JIRA (v6.2#6252)
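Editor's note: the hedged-read idea mentioned above is simply to race a backup read against a slow primary instead of waiting for a timeout and then blacklisting the node. Below is a minimal sketch of the pattern only, assuming two replica read functions are supplied by the caller; it is not the HDFS implementation (the real feature is configured through dfs.client.hedged.read.threshold.millis and dfs.client.hedged.read.threadpool.size).

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/**
 * Illustrative sketch only: if the primary replica does not answer within
 * the threshold, start a second read against another replica and take
 * whichever finishes first.
 */
public class HedgedReadSketch {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  public byte[] read(Supplier<byte[]> primaryReplica,
                     Supplier<byte[]> backupReplica,
                     long thresholdMillis) throws Exception {
    CompletableFuture<byte[]> primary =
        CompletableFuture.supplyAsync(primaryReplica, pool);
    try {
      // Fast path: the primary replica answers within the threshold.
      return primary.get(thresholdMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException slowPrimary) {
      // Hedge: race a second read against the slow one; a real client
      // would also cancel or ignore the loser.
      CompletableFuture<byte[]> backup =
          CompletableFuture.supplyAsync(backupReplica, pool);
      return (byte[]) CompletableFuture.anyOf(primary, backup).get();
    }
  }
}
{code}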
[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met
[ https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069226#comment-14069226 ] Yongjun Zhang commented on HDFS-6706: - Hi [~brandonli], Thanks for reporting and addressing the issue. I have some questions. The original report seems to indicate that the reported error message doesn't point to the real reason for the failure. My questions are: 1. In the case reported initially, the real problem was said to be that zkfc cannot start up because ha.zookeeper.quorum is not met. With your last update, can we say the real problem is a misconfiguration? 2. What kind of misconfiguration caused the symptom? 3. When misconfigured, the user will still see the reported error message. Should the error message state that the symptom may be caused by such a misconfiguration? Thanks. ZKFailoverController failed to recognize the quorum is not met -- Key: HDFS-6706 URL: https://issues.apache.org/jira/browse/HDFS-6706 Project: Hadoop HDFS Issue Type: Bug Reporter: Brandon Li Assignee: Brandon Li Thanks Kenny Zhang for finding this problem. The zkfc cannot start up because ha.zookeeper.quorum is not met, and zkfc -format doesn't log the real problem. The user then sees the following error messages instead of the real issue when starting zkfc: 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. Parent znode does not exist. Run with -formatZK flag to initialize ZooKeeper. 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received create error from Zookeeper. code:NONODE for path /hadoop-ha/prodcluster/ActiveStandbyElectorLock 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 txntype:-1 reqpath:n/a Error Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock To reproduce the problem: 1. use an HDFS cluster with automatic HA enabled and set ha.zookeeper.quorum to 3 servers. 2. start two zookeeper servers. 3. do hdfs zkfc -format, and then hdfs zkfc -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6696) Name node cannot start if the path of a file under construction contains .snapshot
[ https://issues.apache.org/jira/browse/HDFS-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069235#comment-14069235 ] Mit Desai commented on HDFS-6696: - [~andrew.wang], we were trying to upgrade from 0.21.11 to 2.4.0. Name node cannot start if the path of a file under construction contains .snapshot Key: HDFS-6696 URL: https://issues.apache.org/jira/browse/HDFS-6696 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Andrew Wang Priority: Blocker Using {{-renameReserved}} to rename .snapshot in a pre-hdfs-snapshot-feature fsimage during upgrade only works if there is nothing under construction under the renamed directory. I am not sure whether it takes care of edits containing .snapshot properly. The workaround is to identify these directories and rename them, then do {{saveNamespace}} before performing the upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6706) ZKFailoverController failed to recognize the quorum is not met
[ https://issues.apache.org/jira/browse/HDFS-6706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069247#comment-14069247 ] Brandon Li commented on HDFS-6706: -- [~yzhangal], I should have added more explanation. In this case, the 3 ZooKeeper servers were not correctly configured as an ensemble; basically, none of them was in an ensemble. However, all of them were listed in ha.zookeeper.quorum in core-site.xml. When zkfc was started, it talked to a different ZK server, which was not the previously formatted one. ZKFailoverController failed to recognize the quorum is not met -- Key: HDFS-6706 URL: https://issues.apache.org/jira/browse/HDFS-6706 Project: Hadoop HDFS Issue Type: Bug Reporter: Brandon Li Assignee: Brandon Li Thanks Kenny Zhang for finding this problem. The zkfc cannot start up because ha.zookeeper.quorum is not met, and zkfc -format doesn't log the real problem. The user then sees the following error messages instead of the real issue when starting zkfc: 2014-07-01 17:08:17,528 FATAL ha.ZKFailoverController (ZKFailoverController.java:doRun(213)) - Unable to start failover controller. Parent znode does not exist. Run with -formatZK flag to initialize ZooKeeper. 2014-07-01 16:00:48,678 FATAL ha.ZKFailoverController (ZKFailoverController.java:fatalError(365)) - Fatal error occurred:Received create error from Zookeeper. code:NONODE for path /hadoop-ha/prodcluster/ActiveStandbyElectorLock 2014-07-01 17:24:44,202 - INFO ProcessThread(sid:2 cport:-1)::PrepRequestProcessor@627 - Got user-level KeeperException when processing sessionid:0x346f36191250005 type:create cxid:0x2 zxid:0xf0033 txntype:-1 reqpath:n/a Error Path:/hadoop-ha/prodcluster/ActiveStandbyElectorLock Error:KeeperErrorCode = NodeExists for /hadoop-ha/prodcluster/ActiveStandbyElectorLock To reproduce the problem: 1. use an HDFS cluster with automatic HA enabled and set ha.zookeeper.quorum to 3 servers. 2. start two zookeeper servers. 3. do hdfs zkfc -format, and then hdfs zkfc -- This message was sent by Atlassian JIRA (v6.2#6252)
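Editor's note: since the root cause was that the hosts listed in ha.zookeeper.quorum did not actually form a working ensemble, a simple pre-flight check can at least catch dead or unreachable entries before running hdfs zkfc -formatZK. Below is a minimal sketch using ZooKeeper's standard ruok four-letter command; the host names are placeholders, and the check cannot prove that the servers share a single ensemble, only that each one is alive.

{code}
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch only: send "ruok" to every host:port in a
 * ha.zookeeper.quorum-style list and report which ones answer "imok".
 */
public class ZkQuorumCheck {
  public static void main(String[] args) throws Exception {
    // Hypothetical default hosts; normally pass the quorum string as args[0].
    String quorum = args.length > 0 ? args[0] : "zk1:2181,zk2:2181,zk3:2181";
    for (String hostPort : quorum.split(",")) {
      String[] parts = hostPort.trim().split(":");
      InetSocketAddress addr =
          new InetSocketAddress(parts[0], Integer.parseInt(parts[1]));
      try (Socket s = new Socket()) {
        s.connect(addr, 3000);
        OutputStream out = s.getOutputStream();
        out.write("ruok".getBytes(StandardCharsets.US_ASCII));
        out.flush();
        s.shutdownOutput();
        InputStream in = s.getInputStream();
        byte[] buf = new byte[4];
        int n = in.read(buf);
        String reply = n > 0 ? new String(buf, 0, n, StandardCharsets.US_ASCII) : "";
        System.out.println(hostPort + " -> "
            + ("imok".equals(reply) ? "OK" : "NOT OK: " + reply));
      } catch (Exception e) {
        System.out.println(hostPort + " -> unreachable: " + e.getMessage());
      }
    }
  }
}
{code}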
[jira] [Commented] (HDFS-333) A State Machine for name-node blocks.
[ https://issues.apache.org/jira/browse/HDFS-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069249#comment-14069249 ] Konstantin Shvachko commented on HDFS-333: -- It is still valid. It would formalize and simplify block life cycle management. A State Machine for name-node blocks. - Key: HDFS-333 URL: https://issues.apache.org/jira/browse/HDFS-333 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko Blocks on the name-node can belong to different collections like the blocksMap, under-replicated, over-replicated lists, etc. It is getting more and more complicated to keep the lists consistent. It would be good to formalize the movement of the blocks between the collections using a state machine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-253) Method to retrieve all quotas active on HDFS
[ https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069252#comment-14069252 ] Allen Wittenauer commented on HDFS-253: --- One of my favorite JIRAs. Still open! I think I'll add the newbie tag. Method to retrieve all quotas active on HDFS Key: HDFS-253 URL: https://issues.apache.org/jira/browse/HDFS-253 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: Marco Nicosia Labels: newbie Currently the only way to view quota information on an HDFS is via dfs -count -q, which is fine when an admin is examining a specific directory for quota status. It would also be good to do full HDFS quota audits, by pulling all HDFS quotas currently set on the system. This is especially important when trying to do capacity management (OK, how much quota have we allotted so far?). I think the only way to do this now is via lsr | count -q, which is pretty cumbersome. -- This message was sent by Atlassian JIRA (v6.2#6252)
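Editor's note: for scripting such a quota audit without screen-scraping, the public FileSystem API already exposes quota values through ContentSummary. A minimal sketch follows, assuming unset quotas are reported as negative values; it still issues one call per directory, so it is no cheaper than lsr | count -q, just easier to post-process.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch only: walk the namespace and print every directory
 * that appears to have a name or space quota set.
 */
public class QuotaAudit {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    walk(fs, new Path(args.length > 0 ? args[0] : "/"));
  }

  static void walk(FileSystem fs, Path dir) throws Exception {
    ContentSummary cs = fs.getContentSummary(dir);
    // Assumption: a directory without quotas reports negative quota values.
    if (cs.getQuota() >= 0 || cs.getSpaceQuota() >= 0) {
      System.out.println(dir + " nameQuota=" + cs.getQuota()
          + " spaceQuota=" + cs.getSpaceQuota());
    }
    for (FileStatus st : fs.listStatus(dir)) {
      if (st.isDirectory()) {
        walk(fs, st.getPath());
      }
    }
  }
}
{code}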
[jira] [Updated] (HDFS-253) Method to retrieve all quotas active on HDFS
[ https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-253: -- Component/s: namenode Method to retrieve all quotas active on HDFS Key: HDFS-253 URL: https://issues.apache.org/jira/browse/HDFS-253 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: Marco Nicosia Labels: newbie Currently the only way to view quota information on an HDFS is via dfs -count -q, which is fine when an admin is examining a specific directory for quota status. It would also be good to do full HDFS quota audits, by pulling all HDFS quotas currently set on the system. This is especially important when trying to do capacity management (OK, how much quota have we allotted so far?). I think the only way to do this now is via lsr | count -q, which is pretty cumbersome. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-253) Method to retrieve all quotas active on HDFS
[ https://issues.apache.org/jira/browse/HDFS-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-253: -- Labels: newbie (was: ) Method to retrieve all quotas active on HDFS Key: HDFS-253 URL: https://issues.apache.org/jira/browse/HDFS-253 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Reporter: Marco Nicosia Labels: newbie Currently the only way to view quota information on an HDFS is via dfs -count -q, which is fine when an admin is examining a specific directory for quota status. It would also be good to do full HDFS quota audits, by pulling all HDFS quotas currently set on the system. This is especially important when trying to do capacity management (OK, how much quota have we allotted so far?). I think the only way to do this now is via lsr | count -q, which is pretty cumbersome. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6422: --- Attachment: HDFS-6422.007.patch bq. logAuditEvent(false, "getXAttr", src); -- logAuditEvent(false, "getXAttrs", src); Fixed. {code} } else { + throw new IOException("No matching attributes found"); {code} Changed to "No matching attributes found for remove operation". bq. And this condition makes me think about the retryCache. I hope it is handled here; let me check. For example, the first call may succeed internally but be restarted/disconnected; in that case the idempotent API will be retried from the client, so the next call may fail as the attribute was already removed. Do you think we need to mark this as AtMostOnce? Good catch. You're right that my changes require removeXAttr to become AtMostOnce. I've changed the code to reflect that. bq. I think the exception message below can be refined to something like "Some/all attributes do not match to get"? I've changed this to "At least one of the attributes provided was not found." TestDFSShell.java: bq. From the code below, we don't need out.toString as we did not assert anything. Removed. bq. We need to shut down the mini cluster as well. Done. FSXAttrBaseTest.java: bq. Please handle only specific exceptions. If it throws an unexpected exception, let it propagate; we need not assert and rethrow. All of this is due to WebHDFS throwing a different exception from the regular path. WebHDFS throws a RemoteException which wraps a HadoopIllegalArgumentException. In other words, the WebHDFS client does not unwrap the exception. You'll see in the diff that I've changed the exception handling to catch both RemoteException and HadoopIllegalArgumentException. In the former case, I check that the underlying exception is a HIAE. XattrNameParam.java: {code} private static Domain DOMAIN = new Domain(NAME, + Pattern.compile(".*")); {code} bq. I understand that we are trying to eliminate the client-side validation since we would not have the flexibility to add more namespaces in the future. But that pattern can be the same as Namespace, right? So, how about validating the pattern? Please check with Andrew as well; I have no strong feeling on that, it is just a suggestion. I understand your concern. The problem is that WebHDFS would then be doing client-side checking and the exception would be generated and thrown from two different places. We wanted to unify all of the xattr Namespace checking into one place on the server side so that there would be only one place where the exception is generated. I talked to Andrew and he's OK with leaving it like it is in the patch. getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Blocker Attachments: HDFS-6422.005.patch, HDFS-6422.006.patch, HDFS-6422.007.patch, HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch, HDFS-6474.4.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
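Editor's note: for readers following the WebHDFS-versus-direct-path discussion above, the direct path throws HadoopIllegalArgumentException while WebHDFS surfaces a RemoteException that names the same class. Below is a hypothetical test-helper sketch, not the code in the patch, showing one way to accept either form.

{code}
import java.io.IOException;

import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.ipc.RemoteException;

/**
 * Illustrative sketch only: accept the exception either unwrapped (direct
 * DFS path) or wrapped in a RemoteException (WebHDFS path).
 */
public final class XAttrExceptionAsserts {
  /** Hypothetical functional interface for a call that may throw IOException. */
  interface XAttrCall { void run() throws IOException; }

  static void expectIllegalArgument(XAttrCall call) throws IOException {
    try {
      call.run();
      throw new AssertionError("expected an exception, but the call succeeded");
    } catch (HadoopIllegalArgumentException expected) {
      // Direct DFS path: the exception arrives unwrapped.
    } catch (RemoteException re) {
      // WebHDFS path: the server-side exception class is carried by name.
      if (!HadoopIllegalArgumentException.class.getName().equals(re.getClassName())) {
        throw re;  // some other remote failure: let the test fail loudly
      }
    }
  }
}
{code}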
[jira] [Resolved] (HDFS-136) Not able to run randomwriter/sort on hdfs if all the nodes of same rack are killed.
[ https://issues.apache.org/jira/browse/HDFS-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-136. --- Resolution: Incomplete Probably stale. Not able to run randomwriter/sort on hdfs if all the nodes of same rack are killed. --- Key: HDFS-136 URL: https://issues.apache.org/jira/browse/HDFS-136 Project: Hadoop HDFS Issue Type: Bug Reporter: Suman Sehgal Not able to run randomwriter if all the datanodes of any one of the racks are killed. (replication factor : 3) Randomwriter job gets failed and following error message is displayed in log: java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2398) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2354) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1744) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1927) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6715) webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode
Arpit Gupta created HDFS-6715: - Summary: webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode Key: HDFS-6715 URL: https://issues.apache.org/jira/browse/HDFS-6715 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.2.0 Reporter: Arpit Gupta Noticed in our HA testing when we run MR job with webhdfs file system we some times run into {code} 2014-04-17 05:08:06,346 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1397710493213_0001_r_08_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 2014-04-17 05:08:10,205 ERROR [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job java.io.IOException: Namenode is in startup mode at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6715) webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode
[ https://issues.apache.org/jira/browse/HDFS-6715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6715: --- Component/s: webhdfs webhdfs wont fail over when it gets java.io.IOException: Namenode is in startup mode Key: HDFS-6715 URL: https://issues.apache.org/jira/browse/HDFS-6715 Project: Hadoop HDFS Issue Type: Bug Components: ha, webhdfs Affects Versions: 2.2.0 Reporter: Arpit Gupta Noticed in our HA testing when we run MR job with webhdfs file system we some times run into {code} 2014-04-17 05:08:06,346 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1397710493213_0001_r_08_0: Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 2014-04-17 05:08:10,205 ERROR [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job java.io.IOException: Namenode is in startup mode at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:525) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly
[ https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069320#comment-14069320 ] Devaraj Das commented on HDFS-6680: --- I see.. good catch. Looks good to me. BlockPlacementPolicyDefault does not choose favored nodes correctly --- Key: HDFS-6680 URL: https://issues.apache.org/jira/browse/HDFS-6680 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6680_20140714.patch, h6680_20140716.patch In one of the chooseTarget(..) methods, it passes each of the favoredNodes to chooseLocalNode(..). It expects chooseLocalNode to return null if the local node is not a good target. Unfortunately, chooseLocalNode falls back to chooseLocalRack instead of returning null. -- This message was sent by Atlassian JIRA (v6.2#6252)
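Editor's note: a simplified, hypothetical sketch of the contract being discussed above. The favored-node loop only works if the local-node chooser returns null for an unusable favored node instead of silently falling back to another node on the same rack; the types below are stand-ins, not the real BlockPlacementPolicyDefault code.

{code}
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only, with hypothetical stand-in types. */
class FavoredNodeSketch {
  /** Stand-in for a datanode descriptor. */
  static class Node {
    String name;
    boolean goodTarget;
  }

  /**
   * Collect the favored nodes that are actually usable. If chooseLocalNode
   * fell back to some other node on the same rack instead of returning
   * null, the caller could not tell that a favored node was rejected.
   */
  static List<Node> chooseFromFavored(List<Node> favoredNodes) {
    List<Node> chosen = new ArrayList<>();
    for (Node favored : favoredNodes) {
      Node target = chooseLocalNode(favored);   // must be null if unusable
      if (target != null) {
        chosen.add(target);
      }
      // else: fall through and let the normal placement policy pick a node
    }
    return chosen;
  }

  /** Only accept the favored node itself; never fall back to its rack. */
  static Node chooseLocalNode(Node favored) {
    return favored.goodTarget ? favored : null;
  }
}
{code}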
[jira] [Commented] (HDFS-279) Generation stamp value should be validated when creating a Block
[ https://issues.apache.org/jira/browse/HDFS-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069331#comment-14069331 ] Allen Wittenauer commented on HDFS-279: --- Stale issue? Generation stamp value should be validated when creating a Block Key: HDFS-279 URL: https://issues.apache.org/jira/browse/HDFS-279 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze In hdfs, generation stamps GenerationStamp.FIRST_VALID_STAMP are reserved values but not valid generation stamps. Incorrect uses of the reserved values may cause unexpected behavior. We should validate generation stamp when creating a Block. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-88) Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite
[ https://issues.apache.org/jira/browse/HDFS-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-88. -- Resolution: Incomplete I'm going to close this as stale. I suspect this issue has gone away with the two fixes referenced. Hung on hdfs: writeChunk, DFSClient.java:2126, DataStreamer socketWrite --- Key: HDFS-88 URL: https://issues.apache.org/jira/browse/HDFS-88 Project: Hadoop HDFS Issue Type: Bug Reporter: stack We've seen this hang rare enough but when it happens it locks up the application. We've seen it at least in 0.18.x and 0.19.x (we don't have much experience with 0.20.x hdfs yet). Here we're doing a sequencefile#append {code} IPC Server handler 9 on 60020 daemon prio=10 tid=0x7fef1c3f0400 nid=0x7470 waiting for monitor entry [0x42d18000..0x42d189f0] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2486) - waiting to lock 0x7fef38ecc138 (a java.util.LinkedList) - locked 0x7fef38ecbdb8 (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132) - locked 0x7fef38ecbdb8 (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121) - locked 0x7fef38ecbdb8 (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream) at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112) at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86) - locked 0x7fef38ecbdb8 (a org.apache.hadoop.dfs.DFSClient$DFSOutputStream) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47) at java.io.DataOutputStream.write(DataOutputStream.java:107) - locked 0x7fef38e09fc0 (a org.apache.hadoop.fs.FSDataOutputStream) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1016) - locked 0x7fef38e09f30 (a org.apache.hadoop.io.SequenceFile$Writer) at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:980) - locked 0x7fef38e09f30 (a org.apache.hadoop.io.SequenceFile$Writer) at org.apache.hadoop.hbase.regionserver.HLog.doWrite(HLog.java:461) at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:421) - locked 0x7fef29ad9588 (a java.lang.Integer) at org.apache.hadoop.hbase.regionserver.HRegion.update(HRegion.java:1676) at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1439) at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1378) at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdates(HRegionServer.java:1184) at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:622) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888) {code} The DataStreamer that is supposed to servicing the above writeChunk is stuck here: {code} DataStreamer for file /hbase/log_72.34.249.212_1225407466779_60020/hlog.dat.1227075571390 block blk_-7436808403424765554_553837 daemon prio=10 tid=0x01c84c00 nid=0x7125 in Object.wait() [0x409b3000..0x409b3d70] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.hadoop.ipc.Client.call(Client.java:709) - locked 
0x7fef39520bb8 (a org.apache.hadoop.ipc.Client$Call) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343) at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288) at org.apache.hadoop.dfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:139) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2185) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735) at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889) - locked 0x7fef38ecc138 (a
[jira] [Resolved] (HDFS-205) HDFS Tmpreaper
[ https://issues.apache.org/jira/browse/HDFS-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-205. --- Resolution: Duplicate Closing this as a dupe of HDFS-6382. HDFS Tmpreaper -- Key: HDFS-205 URL: https://issues.apache.org/jira/browse/HDFS-205 Project: Hadoop HDFS Issue Type: New Feature Environment: CentOs 4/5, Java 1.5, Hadoop 0.17.3 Reporter: Michael Andrews Priority: Minor Attachments: DateDelta.java, TmpReaper.java Java implementation of tmpreaper utility for HDFS. Helps when you expect processes to die before they can clean up. I have perl unit tests that can be ported over to java or groovy if the hadoop team is interested in this utility. One issue is that the unit tests set the modification time of test files, which is unsupported in HDFS (as far as I can tell). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6716) Update usage of KeyProviderCryptoExtension APIs on NameNode
[ https://issues.apache.org/jira/browse/HDFS-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069355#comment-14069355 ] Andrew Wang commented on HDFS-6716: --- I'll add that this is somewhat urgent, as I simply mocked out the new API calls so tests are currently broken. Update usage of KeyProviderCryptoExtension APIs on NameNode --- Key: HDFS-6716 URL: https://issues.apache.org/jira/browse/HDFS-6716 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Andrew Wang Assignee: Andrew Wang Some recent changes have landed to the KeyProviderCryptoExtension APIs, need to update the usage in HDFS to reflect this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6716) Update usage of KeyProviderCryptoExtension APIs on NameNode
Andrew Wang created HDFS-6716: - Summary: Update usage of KeyProviderCryptoExtension APIs on NameNode Key: HDFS-6716 URL: https://issues.apache.org/jira/browse/HDFS-6716 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Andrew Wang Assignee: Andrew Wang Some recent changes have landed to the KeyProviderCryptoExtension APIs, need to update the usage in HDFS to reflect this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-351) Could FSEditLog report problems more elegantly than with System.exit(-1)
[ https://issues.apache.org/jira/browse/HDFS-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-351. --- Resolution: Incomplete Edits logging got majorly reworked. Closing as stale. Could FSEditLog report problems more elegantly than with System.exit(-1) Key: HDFS-351 URL: https://issues.apache.org/jira/browse/HDFS-351 Project: Hadoop HDFS Issue Type: Improvement Reporter: Steve Loughran Priority: Minor When FSEditLog encounters problems, it prints something and then exits. For any in-JVM deployments of FSEditLog it would be better for these problems to be raised in some other way (such as throwing an exception) rather than taking down the whole JVM. That could be in JUnit tests, or inside other applications. Test runners and the like can intercept those System.exit() calls with their own SecurityManager, often turning the System.exit() operation into an exception there and then. If FSEditLog did that itself, it might be easier to stay in control. The current approach has some benefits, as it can exit regardless of which thread has encountered problems, but it is tricky to test. -- This message was sent by Atlassian JIRA (v6.2#6252)
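Editor's note: the SecurityManager trick mentioned in the report looks roughly like the following. A minimal sketch of trapping System.exit() in a test harness; the class names are invented for illustration.

{code}
/**
 * Illustrative sketch only: a SecurityManager that converts System.exit()
 * into an exception, the technique the report above mentions for keeping
 * exit-happy code testable inside a JVM.
 */
public class NoExitSecurityManager extends SecurityManager {
  public static class ExitTrappedException extends RuntimeException {
    public final int status;
    public ExitTrappedException(int status) {
      super("System.exit(" + status + ") trapped");
      this.status = status;
    }
  }

  @Override
  public void checkExit(int status) {
    // Throwing here prevents the JVM from exiting.
    throw new ExitTrappedException(status);
  }

  @Override
  public void checkPermission(java.security.Permission perm) {
    // Allow everything else so normal test code keeps working.
  }

  public static void main(String[] args) {
    System.setSecurityManager(new NoExitSecurityManager());
    try {
      System.exit(-1);               // simulates the failure path under test
    } catch (ExitTrappedException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
{code}

Note that the SecurityManager mechanism has since been deprecated in newer JDKs; the sketch reflects the Java-7-era approach the report is describing.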
[jira] [Resolved] (HDFS-197) du fails on Cygwin
[ https://issues.apache.org/jira/browse/HDFS-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-197. --- Resolution: Fixed MS has fixed Windows support. Closing as stale. du fails on Cygwin Key: HDFS-197 URL: https://issues.apache.org/jira/browse/HDFS-197 Project: Hadoop HDFS Issue Type: Bug Environment: Windows + Cygwin Reporter: Kohsuke Kawaguchi Attachments: HADOOP-5486 When I try to run a datanode on Windows, I get the following exception: {noformat} java.io.IOException: Expecting a line not the end of stream at org.apache.hadoop.fs.DU.parseExecResult(DU.java:181) at org.apache.hadoop.util.Shell.runCommand(Shell.java:179) at org.apache.hadoop.util.Shell.run(Shell.java:134) at org.apache.hadoop.fs.DU.<init>(DU.java:53) at org.apache.hadoop.fs.DU.<init>(DU.java:63) at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:325) at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:681) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:291) at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:205) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1238) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1193) {noformat} This is because Hadoop execs du -sk C:\tmp\hadoop-SYSTEM\dfs\data with a Windows path representation, which cygwin du doesn't understand. {noformat} C:\hudson>du -sk C:\tmp\hadoop-SYSTEM\dfs\data du -sk C:\tmp\hadoop-SYSTEM\dfs\data du: cannot access `C:\\tmp\\hadoop-SYSTEM\\dfs\\data': No such file or directory {noformat} For this to work correctly, Hadoop would have to run cygpath first to get a Unix path representation, and then call DU. Also, I had to use the debugger to get this information. Shell.runCommand should catch IOException from parseExecResult and add the buffered stderr to simplify the error diagnostics. -- This message was sent by Atlassian JIRA (v6.2#6252)
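Editor's note: the cygpath conversion suggested in the report could be wrapped along these lines before invoking du. A minimal sketch, assuming cygpath is on the PATH; the class and method names are invented for illustration.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * Illustrative sketch only: convert a Windows path to its Cygwin form with
 * "cygpath -u" before handing it to a Unix tool such as du.
 */
public class CygpathConvert {
  static String toUnixPath(String windowsPath) throws Exception {
    Process p = new ProcessBuilder("cygpath", "-u", windowsPath).start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      String unixPath = r.readLine();
      if (p.waitFor() != 0 || unixPath == null) {
        throw new IllegalStateException("cygpath failed for " + windowsPath);
      }
      return unixPath;
    }
  }

  public static void main(String[] args) throws Exception {
    // e.g. C:\tmp\hadoop-SYSTEM\dfs\data -> /cygdrive/c/tmp/hadoop-SYSTEM/dfs/data
    System.out.println(toUnixPath(args[0]));
  }
}
{code}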
[jira] [Commented] (HDFS-346) Version file in name-node image directory should include role field.
[ https://issues.apache.org/jira/browse/HDFS-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069384#comment-14069384 ] Allen Wittenauer commented on HDFS-346: --- Is this still valid? Actually, what would be the expected outcome of a node seeing a different role on startup? Version file in name-node image directory should include role field. -- Key: HDFS-346 URL: https://issues.apache.org/jira/browse/HDFS-346 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko It would be useful to have name-node role field in the {{VERSION}} file in name-node's image and edits directories so that one could see what type of node created the image. Role was introduced by HADOOP-4539 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-308) Improve TransferFsImage
[ https://issues.apache.org/jira/browse/HDFS-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14069385#comment-14069385 ] Allen Wittenauer commented on HDFS-308: --- I seem to recall this got reworked, but I'm not sure if these particular issues have been dealt with and/or are still relevant. Improve TransferFsImage --- Key: HDFS-308 URL: https://issues.apache.org/jira/browse/HDFS-308 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko Assignee: Jakob Homan {{TransferFsImage}} transfers name-node image and edits files during the checkpoint process. # {{TransferFsImage}} should *always* pass and verify CheckpointSignature. Now we send it only when the image is uploaded back to the name-node. # {{getFileClient()}} should use {{Collection<File>}} rather than {{File[]}} as the third parameter. # Rather than sending port and address separately ({{"port=" + port + "machine=" + addr}}), it should send the entire address at once. -- This message was sent by Atlassian JIRA (v6.2#6252)