[jira] [Commented] (HDFS-2668) Incorrect assertion in BlockManager when block report arrives shortly after invalidation decision
[ https://issues.apache.org/jira/browse/HDFS-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225086#comment-13225086 ] Uma Maheswara Rao G commented on HDFS-2668:
-------------------------------------------

I see that this particular assertion has already been commented out in the code, with a reference to this issue:
{code}
/* TODO: following assertion is incorrect, see HDFS-2668
assert storedBlock.findDatanode(dn) < 0 : "Block " + block
    + " in recentInvalidatesSet should not appear in DN " + dn;
*/
{code}
I think we can remove that commented-out assertion completely, right?

> Incorrect assertion in BlockManager when block report arrives shortly after
> invalidation decision
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-2668
>                 URL: https://issues.apache.org/jira/browse/HDFS-2668
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>         Attachments: TestToReproduceHDFS-2668.patch
>
> I haven't written a test case to verify this yet, but I believe the following
> assertion is incorrect:
> {code}
> // Ignore replicas already scheduled to be removed from the DN
> if (invalidateBlocks.contains(dn.getStorageID(), block)) {
>   assert storedBlock.findDatanode(dn) < 0 : "Block " + block
>       + " in recentInvalidatesSet should not appear in DN " + dn;
> {code}
> The problem is that, when a block is invalidated due to over-replication, it
> is not immediately removed from the block map. So, if a block report arrives
> just after a block has been marked as invalidated, but before the block is
> actually deleted, I think this assertion will trigger incorrectly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
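The race Todd describes can be shown with a toy model (a minimal sketch, not the actual BlockManager code; the class and method names here are illustrative). Invalidation only queues the replica for deletion and leaves it in the block map, so a block report that arrives before the DataNode actually deletes the replica still finds it there, which is exactly the state the old assertion forbade:

```java
import java.util.HashSet;
import java.util.Set;

public class InvalidationRace {
    // "block@dn" strings stand in for the block map and the invalidate set.
    static Set<String> blockMap = new HashSet<>();
    static Set<String> invalidateSet = new HashSet<>();

    // Invalidation due to over-replication: schedules the deletion
    // but does NOT remove the replica from the block map.
    static void markOverReplicated(String block, String dn) {
        invalidateSet.add(block + "@" + dn);
    }

    static boolean blockReportFindsReplica(String block, String dn) {
        return blockMap.contains(block + "@" + dn);
    }

    public static void main(String[] args) {
        blockMap.add("blk_1@dn1");
        markOverReplicated("blk_1", "dn1");
        // A block report races in before the DN deletes the replica:
        boolean inMap = blockReportFindsReplica("blk_1", "dn1");
        boolean invalidated = invalidateSet.contains("blk_1@dn1");
        // Both are true at once, so the old assertion would have fired.
        System.out.println(inMap && invalidated);  // prints "true"
    }
}
```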
[jira] [Updated] (HDFS-3059) ssl-server.xml causes NullPointer
[ https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evert Lammerts updated HDFS-3059:
---------------------------------

    Attachment: HDFS-3059.patch.2

Second try, with better feedback

> ssl-server.xml causes NullPointer
> ---------------------------------
>
>                 Key: HDFS-3059
>                 URL: https://issues.apache.org/jira/browse/HDFS-3059
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, security
>    Affects Versions: 0.20.205.0, 1.0.0
>         Environment: in core-site.xml:
> {code:xml}
> <property>
>   <name>hadoop.security.authentication</name>
>   <value>kerberos</value>
> </property>
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
> </property>
> {code}
> in hdfs-site.xml:
> {code:xml}
> <property>
>   <name>dfs.https.server.keystore.resource</name>
>   <value>/etc/hadoop/conf/ssl-server.xml</value>
> </property>
> <property>
>   <name>dfs.https.enable</name>
>   <value>true</value>
> </property>
> ...other security props
> {code}
>            Reporter: Evert Lammerts
>            Priority: Minor
>         Attachments: HDFS-3059.patch, HDFS-3059.patch.2
>
> If ssl is enabled (dfs.https.enable) but ssl-server.xml is not available, a DN will crash during startup while setting up an SSL socket with a NullPointerException:
> {noformat}
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: useKerb = false, useCerts = true
> jetty.ssl.password : jetty.ssl.keypassword :
> 12/03/07 17:08:36 INFO mortbay.log: jetty-6.1.26.cloudera.1
> 12/03/07 17:08:36 INFO mortbay.log: Started selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:36 DEBUG security.Krb5AndCertsSslSocketConnector: Creating new KrbServerSocket for: 0.0.0.0
> 12/03/07 17:08:36 WARN mortbay.log: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Krb5AndCertsSslSocketConnector@0.0.0.0:50475: java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 WARN mortbay.log: failed Server@604788d5: java.io.IOException: !JsseListener: java.lang.NullPointerException
> 12/03/07 17:08:36 INFO mortbay.log: Stopped Krb5AndCertsSslSocketConnector@0.0.0.0:50475
> 12/03/07 17:08:36 INFO mortbay.log: Stopped selectchannelconnec...@p-worker35.alley.sara.nl:1006
> 12/03/07 17:08:37 INFO datanode.DataNode: Waiting for threadgroup to exit, active threads is 0
> {noformat}
> The same happens if I set an absolute path to an existing dfs.https.server.keystore.resource - in this case the file cannot be found, but not even a WARN is given.
> Since we know the resource named by dfs.https.server.keystore.resource needs to have 4 properties specified (ssl.server.truststore.location, ssl.server.keystore.location, ssl.server.keystore.password, and ssl.server.keystore.keypassword), we should check whether they are set and throw an IOException if they are not.
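The fail-fast check the description asks for could look roughly like the following (a minimal sketch: a plain Map stands in for the resolved ssl-server.xml Configuration, and the method name and message wording are illustrative, not from the attached patch):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SslConfigCheck {
    // The four properties ssl-server.xml must define for a secure DN.
    static final String[] REQUIRED = {
        "ssl.server.truststore.location",
        "ssl.server.keystore.location",
        "ssl.server.keystore.password",
        "ssl.server.keystore.keypassword"
    };

    // Throw an IOException naming every missing key, instead of letting
    // Jetty fail later with a bare NullPointerException.
    static void checkSslConfig(Map<String, String> conf) throws IOException {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (conf.get(key) == null) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            throw new IOException(
                "ssl-server.xml is missing required properties: " + missing);
        }
    }

    public static void main(String[] args) {
        try {
            checkSslConfig(new HashMap<String, String>());
        } catch (IOException e) {
            System.out.println(e.getMessage());  // names all four missing keys
        }
    }
}
```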
[jira] [Updated] (HDFS-3059) ssl-server.xml causes NullPointer
[ https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evert Lammerts updated HDFS-3059:
---------------------------------

    Attachment:     (was: HDFS-3059.patch)
[jira] [Updated] (HDFS-3059) ssl-server.xml causes NullPointer
[ https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Evert Lammerts updated HDFS-3059:
---------------------------------

    Attachment: HDFS-3059.patch

Second try, with better feedback
[jira] [Commented] (HDFS-3059) ssl-server.xml causes NullPointer
[ https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225094#comment-13225094 ] Evert Lammerts commented on HDFS-3059:
--------------------------------------

Added better feedback; this should work. The first patch was written for tag 0.20.205 - I'm on CDH3u3 and was assuming 0.20.205 is the version it is based on. I have only tested it against CDH using the source RPMs, since I don't know how to set up a development environment for a DN in an environment with Kerberos enabled. Any tips on that are welcome; this was not a very comfortable way of debugging: editing the source, building it, copying the core jar to the datanode and namenode, and starting both - a good way to get RSI ;-)
[jira] [Commented] (HDFS-3059) ssl-server.xml causes NullPointer
[ https://issues.apache.org/jira/browse/HDFS-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225095#comment-13225095 ] Evert Lammerts commented on HDFS-3059:
--------------------------------------

This one's against trunk, btw.
[jira] [Commented] (HDFS-3063) NameNode should validate all coming file path
[ https://issues.apache.org/jira/browse/HDFS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225302#comment-13225302 ] Daryn Sharp commented on HDFS-3063:
-----------------------------------

Instead of adding the check in N-many places, is there a better choke point for a single check? From a maintenance perspective, someone is going to forget to add the check.

> NameNode should validate all coming file path
> ---------------------------------------------
>
>                 Key: HDFS-3063
>                 URL: https://issues.apache.org/jira/browse/HDFS-3063
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.20.205.0
>            Reporter: Denny Ye
>            Priority: Minor
>              Labels: namenode
>         Attachments: HDFS-3063.patch
>
> The NameNode provides an RPC service not only for the DFS client but also for user-defined programs. A common case we keep running into is that a user passes a file path prefixed with the HDFS protocol ("hdfs://{namenode}:{port}/{folder}/{file}"). The NameNode cannot map node metadata to such a path and always throws an NPE. On the client side, we only see the NullPointerException, with no hint as to which step caused it.
> The NameNode should therefore validate that every incoming file path has the expected format.
> One exception I hit:
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:334)
>     at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:329)
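The single choke point Daryn suggests could be as simple as one static helper that every RPC entry point calls on its path argument (a sketch only; the class name, method name, and error text are hypothetical, not from the attached patch):

```java
public class PathValidator {
    // Reject scheme-prefixed or relative paths up front with a clear
    // message, instead of letting them reach INode.getPathComponents
    // and surface as a bare NullPointerException.
    static void validatePath(String src) {
        if (src == null || !src.startsWith("/")) {
            throw new IllegalArgumentException(
                "Invalid path name: " + src
                + " (paths must be absolute and scheme-free, e.g. /folder/file)");
        }
    }
}
```

Every RPC handler would then begin with `PathValidator.validatePath(src)`, so a forgotten per-call check can no longer reintroduce the NPE.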
[jira] [Commented] (HDFS-2303) jsvc needs to be recompilable
[ https://issues.apache.org/jira/browse/HDFS-2303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225327#comment-13225327 ] Eli Collins commented on HDFS-2303:
-----------------------------------

Mingjie,

- Have you verified you can set JSVC_HOME in hadoop-env.sh? It looks like you're setting JSVC_HOME before hadoop-env.sh is sourced (by hadoop-config.sh), so this wouldn't work. Please test both with jsvc installed in /usr and with JSVC_HOME set explicitly in hadoop-env.sh, if you haven't already.
- We shouldn't assume JSVC_HOME has a bin directory, as the jsvc download is just a directory containing jsvc. I'd make the error message something like: "JSVC_HOME is not set, so jsvc cannot be found. Jsvc is required to run secure datanodes. Please download jsvc for your platform from http://archive.apache.org/dist/commons/daemon/binaries and set JSVC_HOME to the directory containing the jsvc binary."
- Let's add a commented-out line in hadoop-env.sh like the following:
{code}
# The jsvc implementation to use
#export JSVC_HOME=${JSVC_HOME}
{code}

Otherwise looks great. Thanks,
Eli

> jsvc needs to be recompilable
> -----------------------------
>
>                 Key: HDFS-2303
>                 URL: https://issues.apache.org/jira/browse/HDFS-2303
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: build, scripts
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.24.0, 0.23.2
>
>         Attachments: HDFS-2303-2.patch.txt, HDFS-2303-3-trunk.patch, HDFS-2303-4-trunk.patch, HDFS-2303.patch.txt
>
> It would be nice to recompile jsvc as part of the native profile. This has a number of benefits, including the ability to regenerate all binary artifacts. Most of all, however, it will provide a way to generate jsvc on Linux distributions that don't have a matching libc.
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050:
---------------------------------------

    Attachment: HDFS-3050.004.patch

implement solution #4

> refactor OEV to share more code with the NameNode
> -------------------------------------------------
>
>                 Key: HDFS-3050
>                 URL: https://issues.apache.org/jira/browse/HDFS-3050
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-3050.004.patch
>
> Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing logic found in the NameNode. This duplicated code creates a maintenance burden for us.
> OEV should be refactored to simply use the normal EditLog parsing code, rather than rolling its own.
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050:
---------------------------------------

    Attachment:     (was: HDFS-3050.patch2)
[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225331#comment-13225331 ] Colin Patrick McCabe commented on HDFS-3050:
--------------------------------------------

Hi Nicholas,

I implemented solution #4 from the ones we talked about. I definitely agree with you that in the long term it may make more sense to use the visitor pattern to deal with these FSEditLogOp classes (this is solution #2). However, that's a bigger change and probably something we should plan out later.

Anyway, as always, thanks for taking the time to look at this.

cheers,
C.
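For readers unfamiliar with "solution #2": the visitor pattern mentioned above would let OEV rendering and NameNode replay share one traversal of the op classes instead of duplicating opcode parsing. A purely illustrative sketch (the class and method names below are hypothetical, not the real FSEditLogOp API):

```java
// Each consumer of edit-log ops implements one visitor.
interface OpVisitor {
    void visitMkdir(String path);
    void visitDelete(String path);
}

abstract class EditLogOp {
    abstract void accept(OpVisitor v);  // double dispatch
}

class MkdirOp extends EditLogOp {
    final String path;
    MkdirOp(String path) { this.path = path; }
    void accept(OpVisitor v) { v.visitMkdir(path); }
}

class DeleteOp extends EditLogOp {
    final String path;
    DeleteOp(String path) { this.path = path; }
    void accept(OpVisitor v) { v.visitDelete(path); }
}

// An OEV-style visitor merely renders ops; a NameNode replay visitor
// would apply them to the namespace instead. Parsing stays in one place.
class XmlRenderVisitor implements OpVisitor {
    final StringBuilder out = new StringBuilder();
    public void visitMkdir(String p)  { out.append("<mkdir path=\"").append(p).append("\"/>"); }
    public void visitDelete(String p) { out.append("<delete path=\"").append(p).append("\"/>"); }
}
```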
[jira] [Updated] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-2038:
-----------------------------

    Attachment: hdfs-2038.branch-0.23.2.supplemental.patch

Sigh. It's already merged in 23.2. The attached patch applies to the current 23.2 (with the trunk patch merged) and fixes the test cases that are failing due to the differences in error messages.

> Update test to handle relative paths with globs
> -----------------------------------------------
>
>                 Key: HDFS-2038
>                 URL: https://issues.apache.org/jira/browse/HDFS-2038
>             Project: Hadoop HDFS
>          Issue Type: Test
>          Components: test
>    Affects Versions: 0.23.0
>            Reporter: Daryn Sharp
>            Assignee: Kihwal Lee
>            Priority: Critical
>             Fix For: 0.24.0
>
>         Attachments: HDFS-2038-2.patch, HDFS-2038.patch, disable-TestHDFSCLI.patch, hdfs-2038.branch-0.23.2.supplemental.patch, hdfs-2038.patch.txt
>
> This provides the test updates for FsShell to retain relativity for paths with globs.
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225344#comment-13225344 ] Kihwal Lee commented on HDFS-2038:
----------------------------------

The divergence of 0.23.2 is of no concern, as the PB changes won't go in there.
[jira] [Created] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
Allow datanodes to start with non-privileged ports for testing.
---------------------------------------------------------------

                 Key: HDFS-3064
                 URL: https://issues.apache.org/jira/browse/HDFS-3064
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Jitendra Nath Pandey
            Assignee: Jitendra Nath Pandey

HADOOP-8078 allows enabling security in unit tests. However, datanodes still can't be started, because they require privileged ports. We should allow datanodes to come up on non-privileged ports ONLY for testing. This part of the code will be removed anyway when HDFS-2856 is committed.
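The gating logic the issue describes could be a single test-only switch around the existing privileged-port requirement (a sketch under stated assumptions: the method, the threshold constant, and the notion of a testing flag are illustrative, not from the eventual patch):

```java
public class PortCheck {
    // Ports <= 1023 require root to bind on Unix-like systems.
    static final int PRIVILEGED_MAX = 1023;

    // A secure DataNode must normally bind a privileged port; the
    // test-only flag relaxes that so MiniDFSCluster-style tests can
    // start secure DNs on ephemeral ports.
    static boolean portAllowed(int port, boolean securityEnabled,
                               boolean allowUnprivilegedForTesting) {
        if (!securityEnabled) {
            return true;
        }
        return port <= PRIVILEGED_MAX || allowUnprivilegedForTesting;
    }
}
```

The flag would default to off, so production secure clusters keep the strict check and only tests opt out.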
[jira] [Updated] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
[ https://issues.apache.org/jira/browse/HDFS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-3064:
---------------------------------------

    Attachment: HDFS-3064.trunk.patch

Here is a preliminary patch; I will include a test too.
[jira] [Commented] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
[ https://issues.apache.org/jira/browse/HDFS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225351#comment-13225351 ] Todd Lipcon commented on HDFS-3064:
-----------------------------------

FWIW, I suggested this a while back and people seemed to have concerns: https://issues.apache.org/jira/browse/HDFS-1150?focusedCommentId=12894436&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12894436
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225371#comment-13225371 ] Uma Maheswara Rao G commented on HDFS-2976:
-------------------------------------------

I just committed this patch. Thanks, Aaron, for the review!

> Remove unnecessary method (tokenRefetchNeeded) in DFSClient
> -----------------------------------------------------------
>
>                 Key: HDFS-2976
>                 URL: https://issues.apache.org/jira/browse/HDFS-2976
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.24.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>            Priority: Trivial
>         Attachments: HDFS-2976.patch
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225377#comment-13225377 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Hdfs-trunk-Commit #1927 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1927/]) HDFS-2976. Remove unnecessary method (tokenRefetchNeeded) in DFSClient. (Contributed by Uma Maheswara Rao G) (Revision 1298495) Result = FAILURE umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298495 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
[ https://issues.apache.org/jira/browse/HDFS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225380#comment-13225380 ] Jitendra Nath Pandey commented on HDFS-3064: I went through the discussion only superficially, but as I understood it, the referenced jira was a port from 20, hence the reservations about including this feature. I think it is worth having now, because we do have a way of enabling security in unit tests and the privileged-port requirement is a blocker to bringing up datanodes. > Allow datanodes to start with non-privileged ports for testing. > --- > > Key: HDFS-3064 > URL: https://issues.apache.org/jira/browse/HDFS-3064 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS-3064.trunk.patch > > > HADOOP-8078 allows enabling security in unit tests. However, datanodes still > can't be started because they require privileged ports. We should allow > datanodes to come up on non-privileged ports ONLY for testing. This part of > the code will be removed anyway, when HDFS-2856 is committed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225381#comment-13225381 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Common-trunk-Commit #1852 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1852/]) HDFS-2976. Remove unnecessary method (tokenRefetchNeeded) in DFSClient. (Contributed by Uma Maheswara Rao G) (Revision 1298495) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298495 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225383#comment-13225383 ] Aaron T. Myers commented on HDFS-2976: -- I think something may have gone wrong with this commit. Instead of removing the method from DFSClient, it's now included twice. Because of this, trunk does not compile. > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default
[ https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3044: --- Attachment: HDFS-3044.001.patch * Move fsck operations into a class * Allow both -move and -delete to be specified, to both salvage the accessible bits of corrupted files and delete the remainder. If only -move is specified, the operation is non-destructive. > fsck move should be non-destructive by default > -- > > Key: HDFS-3044 > URL: https://issues.apache.org/jira/browse/HDFS-3044 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Eli Collins >Assignee: Colin Patrick McCabe > Attachments: HDFS-3044.001.patch > > > The fsck move behavior in the code and originally articulated in HADOOP-101 > is: > {quote}Current failure modes for DFS involve blocks that are completely > missing. The only way to "fix" them would be to recover chains of blocks and > put them into lost+found{quote} > A directory is created with the file name, the blocks that are accessible are > created as individual files in this directory, then the original file is > removed. > I suspect the rationale for this behavior was that you can't use files that > are missing locations, and copying the blocks as files at least makes part of > the files accessible. However this behavior can also result in permanent > data loss. E.g.: > - Some datanodes don't come up (e.g. due to HW issues) and check in on cluster > startup; files with blocks whose replicas are all on this set of datanodes > are marked corrupt > - Admin does fsck move, which deletes the "corrupt" files, saving whatever > blocks were available > - The HW issues with the datanodes are resolved; they are started and join the > cluster. The NN tells them to delete their blocks for the corrupt files since > the files were deleted.
> I think we should: > - Make fsck move non-destructive by default (e.g. just do a move into > lost+found) > - Make the destructive behavior optional (e.g. "--destructive" so admins think > about what they're doing) > - Provide better sanity checks and warnings, e.g. if you're running fsck and > not all the slaves have checked in (if using dfs.hosts), fsck should > print a warning indicating this, which an admin should have to override if they > want to do something destructive -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
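The -move/-delete semantics proposed in the patch above can be modeled with a small sketch. This is only an illustration of the proposed flag behavior (the function and action names are hypothetical), not the actual NamenodeFsck implementation:

```python
# Illustrative model of the proposed fsck flag semantics:
# -move salvages the reachable blocks of a corrupt file into /lost+found,
# -delete removes the corrupt file; combining both salvages, then deletes.

def plan_fsck(corrupt_files, move=False, delete=False):
    """Return the ordered (action, path) steps fsck would take."""
    plan = []
    for path in corrupt_files:
        if move:
            # non-destructive: copy accessible blocks into /lost+found
            plan.append(("copy-to-lost+found", path))
        if delete:
            # destructive: remove the original corrupt file
            plan.append(("delete", path))
    return plan

# With only -move, no delete step is ever planned; with both flags,
# salvage happens before the remainder is removed.
print(plan_fsck(["/data/f1"], move=True))
print(plan_fsck(["/data/f1"], move=True, delete=True))
```

Under this model, the pre-patch behavior corresponds to always running with both flags, which is exactly what makes the old default destructive.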
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225391#comment-13225391 ] Uma Maheswara Rao G commented on HDFS-2976: --- Yes, Aaron, I noticed it and corrected it in the next commit. Sorry for the wrong update. > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225405#comment-13225405 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Hdfs-trunk-Commit #1928 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1928/]) HDFS-2976 removed the unused imports that were missed in previous commit. (Revision 1298508) HDFS-2976 corrected the previous wrong commit for this issue. (Revision 1298507) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298508 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298507 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225406#comment-13225406 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Common-trunk-Commit #1853 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1853/]) HDFS-2976 removed the unused imports that were missed in previous commit. (Revision 1298508) HDFS-2976 corrected the previous wrong commit for this issue. (Revision 1298507) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298508 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298507 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3065) HA: Newly active NameNode does not recognize decommissioning DataNode
HA: Newly active NameNode does not recognize decommissioning DataNode - Key: HDFS-3065 URL: https://issues.apache.org/jira/browse/HDFS-3065 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: HA branch (HDFS-1623) Reporter: Stephen Chu I'm working on a cluster where, originally, styx01 hosts the active NameNode and styx02 hosts the standby NameNode. In both styx01's and styx02's exclude files, I added the DataNode on styx03. I then ran _hdfs dfsadmin -refreshNodes_ and verified on the styx01 NN web UI that the DN on styx03 was decommissioning. After waiting a few minutes, I checked the standby NN web UI (while the DN was decommissioning) and didn't see that the DN was marked as decommissioning. I executed manual failover, making styx02 NN active and styx01 NN standby. I checked the newly active NN web UI, and the DN was still not marked as decommissioning, even after a few minutes. However, the newly standby NN's web UI still showed the DN as decommissioning. I added another DN to the exclude file, and executed _hdfs dfsadmin -refreshNodes_, but the styx02 NN web UI still did not update with the decommissioning nodes. I failed back over to make styx01 NN active and styx02 NN standby. I checked the styx01 NN web UI and saw that it correctly marked 2 DNs as decommissioning. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225409#comment-13225409 ] Aaron T. Myers commented on HDFS-2976: -- Great, things seem to be back to working now. Thanks for catching it so quickly, Uma. > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3045) fsck move should bail on a file if it can't create a block file
[ https://issues.apache.org/jira/browse/HDFS-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3045: --- Attachment: HDFS-3045.001.patch * fsck move should bail on a file if it can't create a block file Note: This is based on the patch I uploaded for HDFS-3044. So perhaps it would be best to review/apply that one first. However, I wanted to put this here before I forgot. > fsck move should bail on a file if it can't create a block file > --- > > Key: HDFS-3045 > URL: https://issues.apache.org/jira/browse/HDFS-3045 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Reporter: Eli Collins >Assignee: Colin Patrick McCabe > Attachments: HDFS-3045.001.patch > > > NamenodeFsck#lostFoundMove, when it fails to create a file for a block > continues on to the next block (There's a comment "perhaps we should bail out > here..." but it doesn't). It should instead fail the move for that particular > file (unwind the directory creation and not delete the original file). > Otherwise a transient failure speaking to the NN means this block is lost > forever. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225418#comment-13225418 ] Tsz Wo (Nicholas), SZE commented on HDFS-2038: -- With hdfs-2038.branch-0.23.2.supplemental.patch, there are still two failing tests. However, it looks like the actual outputs are correct and the expected outputs are incorrect, i.e. the tests themselves are incorrect. See below. Trunk has a similar problem, except that there both the expected and actual outputs are incorrect, so the tests passed. {noformat} //0.23.2 2012-03-08 11:10:24,774 INFO ...(156)) - --- 2012-03-08 11:10:24,774 INFO ...(157)) - Test ID: [54] 2012-03-08 11:10:24,775 INFO ...(158)) -Test Description: [mv: file (absolute path) to file (relative path)] 2012-03-08 11:10:24,775 INFO ...(159)) - 2012-03-08 11:10:24,775 INFO ...(163)) - Test Commands: [-fs hdfs://localhost:62624 -touchz /file1] 2012-03-08 11:10:24,775 INFO ...(163)) - Test Commands: [-fs hdfs://localhost:62624 -mv /file1 file2] 2012-03-08 11:10:24,776 INFO ...(167)) - 2012-03-08 11:10:24,776 INFO ...(170)) -Cleanup Commands: [-fs hdfs://localhost:62624 -rm -r /file1] 2012-03-08 11:10:24,776 INFO ...(174)) - 2012-03-08 11:10:24,776 INFO ...(178)) - Comparator: [RegexpComparator] 2012-03-08 11:10:24,779 INFO ...(180)) - Comparision result: [fail] 2012-03-08 11:10:24,779 INFO ...(182)) - Expected output: [^mv: `file2': No such file or directory] 2012-03-08 11:10:24,779 INFO ...(184)) - Actual output: [mv: `/file1': Input/output error ] 2012-03-08 11:10:24,779 INFO ...(187)) - 2012-03-08 11:10:24,780 INFO ...(156)) - --- 2012-03-08 11:10:24,780 INFO ...(157)) - Test ID: [89] 2012-03-08 11:10:24,780 INFO ...(158)) -Test Description: [cp: copying non existent file (relative path)] 2012-03-08 11:10:24,780 INFO ...(159)) - 2012-03-08 11:10:24,780 INFO ...(163)) - Test Commands: [-fs hdfs://localhost:62624 -cp touchz test] 2012-03-08 11:10:24,781 INFO ...(163)) - Test Commands: [-fs hdfs://localhost:62624 -cp file1 file2] 2012-03-08 11:10:24,781 INFO ...(167)) - 2012-03-08 11:10:24,781 INFO ...(170)) -Cleanup Commands: [-fs hdfs://localhost:62624 -rm -r /user] 2012-03-08 11:10:24,781 INFO ...(174)) - 2012-03-08 11:10:24,781 INFO ...(178)) - Comparator: [RegexpComparator] 2012-03-08 11:10:24,781 INFO ...(180)) - Comparision result: [fail] 2012-03-08 11:10:24,782 INFO ...(182)) - Expected output: [^cp: `file2': No such file or directory] 2012-03-08 11:10:24,782 INFO ...(184)) - Actual output: [cp: `file1': No such file or directory ] {noformat}
> Update test to handle relative paths with globs > --- > > Key: HDFS-2038 > URL: https://issues.apache.org/jira/browse/HDFS-2038 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 0.23.0 >Reporter: Daryn Sharp >Assignee: Kihwal Lee >Priority: Critical > Fix For: 0.24.0 > > Attachments: HDFS-2038-2.patch, HDFS-2038.patch, > disable-TestHDFSCLI.patch, hdfs-2038.branch-0.23.2.supplemental.patch, > hdfs-2038.patch.txt > > > This is providing the test updates for FsShell to retain relativity for paths > with globs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
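The RegexpComparator failures in the log above come down to an expected regular expression not matching any line of the actual shell output. A minimal sketch of that comparison (the function name here is hypothetical; the real comparator lives in Hadoop's CLI test harness):

```python
import re

def regexp_matches(expected_pattern, actual_output):
    """True if any line of the actual output matches the expected regex."""
    return any(re.search(expected_pattern, line)
               for line in actual_output.splitlines())

# Test 54's expectation vs. what the 0.23.2 run actually printed:
expected = r"^mv: `file2': No such file or directory"
actual = "mv: `/file1': Input/output error"
print(regexp_matches(expected, actual))  # the comparison fails
```

This is why the discussion focuses on which side is wrong: the pattern (the test's expectation) or the message the command actually emitted.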
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225424#comment-13225424 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1861 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1861/]) HDFS-2976. Remove unnecessary method (tokenRefetchNeeded) in DFSClient. (Contributed by Uma Maheswara Rao G) (Revision 1298495) Result = ABORTED umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298495 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2976: -- Resolution: Fixed Fix Version/s: 0.24.0 Target Version/s: 0.24.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Fix For: 0.24.0 > > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3065) HA: Newly active NameNode does not recognize decommissioning DataNode
[ https://issues.apache.org/jira/browse/HDFS-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225438#comment-13225438 ] Todd Lipcon commented on HDFS-3065: --- I think the solution here is that "refreshNodes" should be special-cased to send the RPC to all NNs in the cluster, instead of just the active one. Alternatively, we can improve the dfsadmin docs to indicate that you have to explicitly refresh on all NNs. > HA: Newly active NameNode does not recognize decommissioning DataNode > - > > Key: HDFS-3065 > URL: https://issues.apache.org/jira/browse/HDFS-3065 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: HA branch (HDFS-1623) >Reporter: Stephen Chu > > I'm working on a cluster where, originally, styx01 hosts the active NameNode > and styx02 hosts the standby NameNode. > In both styx01's and styx02's exclude file, I added the DataNode on styx03.I > then ran _hdfs dfsadmin -refreshNodes_ and verified on styx01 NN web UI that > the DN on styx03 was decommissioning. After waiting a few minutes, I checked > the standby NN web UI (while the DN was decommissioning) and didn't see that > the DN was marked as decommissioning. > I executed manual failover, making styx02 NN active and styx01 NN standby. I > checked the newly active NN web UI, and the DN was still not marked as > decommissioning, even after a few minutes. However, the newly standby NN's > web UI still showed the DN as decommissioning. > I added another DN to the exclude file, and executed _hdfs dfsadmin > -refreshNodes_, but the styx02 NN web UI still did not update with the > decommissioning nodes. > I failed back over to make styx01 NN active and styx02 NN standby. I checked > the styx01 NN web UI and saw that it correctly marked 2 DNs as > decommissioning. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
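Todd's first suggestion above amounts to fanning the refresh out to every configured NameNode rather than only the active one, so the standby's exclude-list state never drifts. A sketch of that idea (all names here are hypothetical; the real dfsadmin RPC plumbing differs):

```python
def refresh_all_namenodes(namenode_addrs, send_refresh):
    """Send refreshNodes to every NN, collecting per-NN outcomes so a
    failure against one NN does not stop the refresh of the others."""
    results = {}
    for addr in namenode_addrs:
        try:
            send_refresh(addr)
            results[addr] = "refreshed"
        except IOError as e:
            results[addr] = "failed: %s" % e
    return results

# Example: one NN is unreachable, the other refreshes fine.
def fake_rpc(addr):
    if addr == "styx02:8020":
        raise IOError("connection refused")

print(refresh_all_namenodes(["styx01:8020", "styx02:8020"], fake_rpc))
```

Collecting per-NN outcomes instead of failing fast also gives the admin a clear report of which NameNodes still need a manual refresh.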
[jira] [Commented] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
[ https://issues.apache.org/jira/browse/HDFS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225442#comment-13225442 ] Todd Lipcon commented on HDFS-3064: --- Sure, I agree with you whole-heartedly :) But Jakob in particular seemed to be against it. I'll let him speak up if he still feels this way. > Allow datanodes to start with non-privileged ports for testing. > --- > > Key: HDFS-3064 > URL: https://issues.apache.org/jira/browse/HDFS-3064 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS-3064.trunk.patch > > > HADOOP-8078 allows enabling security in unit tests. However, datanodes still > can't be started because they require privileged ports. We should allow > datanodes to come up on non-privileged ports ONLY for testing. This part of > the code will be removed anyway, when HDFS-2856 is committed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2976) Remove unnecessary method (tokenRefetchNeeded) in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225455#comment-13225455 ] Hudson commented on HDFS-2976: -- Integrated in Hadoop-Mapreduce-trunk-Commit #1862 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1862/]) HDFS-2976 removed the unused imports that were missed in previous commit. (Revision 1298508) HDFS-2976 corrected the previous wrong commit for this issue. (Revision 1298507) Result = ABORTED umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298508 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1298507 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java > Remove unnecessary method (tokenRefetchNeeded) in DFSClient > --- > > Key: HDFS-2976 > URL: https://issues.apache.org/jira/browse/HDFS-2976 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Trivial > Fix For: 0.24.0 > > Attachments: HDFS-2976.patch > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2288) Replicas awaiting recovery should return a full visible length
[ https://issues.apache.org/jira/browse/HDFS-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225461#comment-13225461 ] Tsz Wo (Nicholas), SZE commented on HDFS-2288: -- > Nicholas: given the above, do you think this patch is correct? Hi Todd, first, sorry that I forgot to answer this earlier. I think the patch is incorrect. The data in an RWR should not be visible, since the RWR may be invalid. I agree with Konstantin that the reader should wait for recovery. Yes, it will take minutes. Why does it take so long? It is because the writer only writes to a single replica in the first place. > Replicas awaiting recovery should return a full visible length > -- > > Key: HDFS-2288 > URL: https://issues.apache.org/jira/browse/HDFS-2288 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 0.23.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Critical > Fix For: 0.24.0 > > Attachments: hdfs-2288.txt > > > Currently, if the client calls getReplicaVisibleLength for a RWR, it returns > a visible length of 0. This causes one of HBase's tests to fail, and I > believe it's incorrect behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225467#comment-13225467 ] Daryn Sharp commented on HDFS-2038: --- Agreed that test 89 appears to expect the wrong output, but test 54 looks very odd. It doesn't look like it should fail at all, let alone with an I/O error. > Update test to handle relative paths with globs > --- > > Key: HDFS-2038 > URL: https://issues.apache.org/jira/browse/HDFS-2038 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 0.23.0 >Reporter: Daryn Sharp >Assignee: Kihwal Lee >Priority: Critical > Fix For: 0.24.0 > > Attachments: HDFS-2038-2.patch, HDFS-2038.patch, > disable-TestHDFSCLI.patch, hdfs-2038.branch-0.23.2.supplemental.patch, > hdfs-2038.patch.txt > > > This is providing the test updates for FsShell to retain relativity for paths > with globs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225480#comment-13225480 ] Kihwal Lee commented on HDFS-2038: -- That's strange. It passes on the current 0.23.2 with the supplemental patch for me. I reran multiple times. As for the error messages, they are like that since the path validation is done on the destination first. In this case, the parent directory of the destination doesn't exist. I think the error message can be more descriptive. I will discuss with Daryn and let him file a Jira for the improvement. In any case, would you retry the patch? > Update test to handle relative paths with globs > --- > > Key: HDFS-2038 > URL: https://issues.apache.org/jira/browse/HDFS-2038 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 0.23.0 >Reporter: Daryn Sharp >Assignee: Kihwal Lee >Priority: Critical > Fix For: 0.24.0 > > Attachments: HDFS-2038-2.patch, HDFS-2038.patch, > disable-TestHDFSCLI.patch, hdfs-2038.branch-0.23.2.supplemental.patch, > hdfs-2038.patch.txt > > > This is providing the test updates for FsShell to retain relativity for paths > with globs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225500#comment-13225500 ] Tsz Wo (Nicholas), SZE commented on HDFS-2038: -- I have just cleaned and re-run the test, and I still got the same output.
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225504#comment-13225504 ] Tsz Wo (Nicholas), SZE commented on HDFS-2038: -- Daryn, what did you get when you ran the test on 0.23.2?
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225542#comment-13225542 ] Kihwal Lee commented on HDFS-2038: -- I even blew away .m2 and did everything fresh. It still passes. Are you building and running the test from the root or under the hadoop-hdfs project? If you are doing it under the project directory, the build might be picking up an old artifact, probably from old branch-0.23 builds.
[jira] [Commented] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225554#comment-13225554 ] Tsz Wo (Nicholas), SZE commented on HDFS-2038: -- Hi Kihwal, you are right that it makes a difference if I run the test from the project root. It passes now. I have learned something from you. Thanks!
[jira] [Updated] (HDFS-2038) Update test to handle relative paths with globs
[ https://issues.apache.org/jira/browse/HDFS-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2038: - Resolution: Fixed Fix Version/s: 0.23.3 0.23.2 Status: Resolved (was: Patch Available) I have committed this to trunk, 0.23 and 0.23.2. Thanks, Kihwal and Daryn!
[jira] [Commented] (HDFS-3064) Allow datanodes to start with non-privileged ports for testing.
[ https://issues.apache.org/jira/browse/HDFS-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225577#comment-13225577 ] Allen Wittenauer commented on HDFS-3064: This looks like a band-aid over a bigger problem. > Allow datanodes to start with non-privileged ports for testing. > --- > > Key: HDFS-3064 > URL: https://issues.apache.org/jira/browse/HDFS-3064 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS-3064.trunk.patch > > > HADOOP-8078 allows enabling security in unit tests. However, datanodes still > can't be started because they require privileged ports. We should allow > datanodes to come up on non-privileged ports ONLY for testing. This part of > the code will be removed anyway, when HDFS-2856 is committed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
cap space usage of default log4j rolling policy (hdfs specific changes) --- Key: HDFS-3066 URL: https://issues.apache.org/jira/browse/HDFS-3066 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Patrick Hunt Assignee: Patrick Hunt see HADOOP-8149 for background on this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
[ https://issues.apache.org/jira/browse/HDFS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated HDFS-3066: --- Attachment: HDFS-3066.patch Updated patch containing only the hdfs changes. I re-verified by running from a tarball with DEBUG turned on and the maxfilesize set to just 10k.
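For context, capping log4j's disk usage typically means switching the default appender to a size-based rolling policy. The sketch below uses stock log4j 1.2 RollingFileAppender property names; the appender name (RFA) and the concrete size values are illustrative only and are not necessarily those chosen in the committed HADOOP-8149/HDFS-3066 patches.

```properties
# Size-capped rolling: total disk usage is bounded by
# MaxFileSize * (MaxBackupIndex + 1).
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```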
[jira] [Created] (HDFS-3067) Null pointer in DFSInputStream.readBuffer if read is repeated on singly-replicated corrupted block
Null pointer in DFSInputStream.readBuffer if read is repeated on singly-replicated corrupted block -- Key: HDFS-3067 URL: https://issues.apache.org/jira/browse/HDFS-3067 Project: Hadoop HDFS Issue Type: Bug Reporter: Henry Robinson Assignee: Henry Robinson With a singly-replicated block that's corrupted, issuing a read against it twice in succession (e.g. if ChecksumException is caught by the client) gives a NullPointerException. Here's the body of a test that reproduces the problem: {code} final short REPL_FACTOR = 1; final long FILE_LENGTH = 512L; cluster.waitActive(); FileSystem fs = cluster.getFileSystem(); Path path = new Path("/corrupted"); DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L); DFSTestUtil.waitReplication(fs, path, REPL_FACTOR); ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path); int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block); assertEquals("All replicas not corrupted", REPL_FACTOR, blockFilesCorrupted); InetSocketAddress nnAddr = new InetSocketAddress("localhost", cluster.getNameNodePort()); DFSClient client = new DFSClient(nnAddr, conf); DFSInputStream dis = client.open(path.toString()); byte[] arr = new byte[(int)FILE_LENGTH]; boolean sawException = false; try { dis.read(arr, 0, (int)FILE_LENGTH); } catch (ChecksumException ex) { sawException = true; } assertTrue(sawException); sawException = false; try { dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here } catch (ChecksumException ex) { sawException = true; } {code} The stack: {code} java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492) at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545) [snip test stack] {code} and the problem is that currentNode is null. It's left at null after the first read, which fails, and then is never refreshed because the condition in read that protects blockSeekTo is only triggered if the current position is outside the block's range. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3066) cap space usage of default log4j rolling policy (hdfs specific changes)
[ https://issues.apache.org/jira/browse/HDFS-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated HDFS-3066: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3067) Null pointer in DFSInputStream.readBuffer if read is repeated on singly-replicated corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3067: - Attachment: HDFS-3607.patch Patch + test. The problem was that the failing first read leaves currentNode == null when it can't find any non-corrupt replicas. However, the next read doesn't check whether currentNode == null; it only checks whether the current position is valid when deciding whether to open another block reader and update currentNode. Adding this condition fixes the bug.
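The guard described above can be sketched in isolation. This is a simplified standalone model, not Hadoop source: the class is hypothetical, and the fields only mirror DFSInputStream's currentNode/pos/blockEnd names.

```java
// Simplified model of the reseek decision in DFSInputStream.read().
public class ReseekGuard {
    static String currentNode = null; // nulled out after the failed first read
    static long pos = 0;              // position is still inside the block
    static long blockEnd = 511;

    // Pre-patch condition: reseek only when the position leaves the block.
    static boolean mustReseekOld() { return pos > blockEnd; }

    // Post-patch condition (per the fix description): also reseek when no
    // datanode is currently selected.
    static boolean mustReseekNew() { return pos > blockEnd || currentNode == null; }

    public static void main(String[] args) {
        // old: false means blockSeekTo is skipped and the NPE follows.
        System.out.println("old: " + mustReseekOld());
        // new: true means a fresh replica is sought instead.
        System.out.println("new: " + mustReseekNew());
    }
}
```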
[jira] [Updated] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-3067: - Summary: NPE in DFSInputStream.readBuffer if read is repeated on corrupted block (was: Null pointer in DFSInputStream.readBuffer if read is repeated on singly-replicated corrupted block)
[jira] [Created] (HDFS-3068) RemoteBlockReader2 fails when using SocksSocketFactory
RemoteBlockReader2 fails when using SocksSocketFactory --- Key: HDFS-3068 URL: https://issues.apache.org/jira/browse/HDFS-3068 Project: Hadoop HDFS Issue Type: Bug Components: hdfs client Affects Versions: 0.23.1 Reporter: Tom White When hadoop.rpc.socket.factory.class.default is set to org.apache.hadoop.net.SocksSocketFactory, HDFS file reads fail with errors like {noformat} Socket Socket[addr=/10.12.185.132,port=50010,localport=55216] does not have an associated Channel. {noformat} The workaround is to set dfs.client.use.legacy.blockreader=true to use the old implementation of RemoteBlockReader. RemoteBlockReader should not be removed until this bug is fixed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
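The "does not have an associated Channel" error can be reproduced without HDFS at all: a socket built through the plain java.net.Socket constructors (which is how a SOCKS-proxied socket is created) has no NIO channel, whereas one obtained from SocketChannel.open() does. RemoteBlockReader2 relies on the latter. The snippet below is a self-contained illustration, not HDFS code; no connection is actually made.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.nio.channels.SocketChannel;

public class ChannelCheck {
    public static void main(String[] args) throws Exception {
        // Constructor-created socket (as with a SOCKS proxy): no channel.
        Socket proxied = new Socket(new Proxy(Proxy.Type.SOCKS,
                new InetSocketAddress("localhost", 1080)));
        System.out.println("proxied channel: " + proxied.getChannel());

        // NIO-created socket: the channel RemoteBlockReader2 expects is present.
        SocketChannel ch = SocketChannel.open();
        System.out.println("nio channel: " + (ch.socket().getChannel() != null));

        ch.close();
        proxied.close();
    }
}
```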
[jira] [Updated] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-2834: - Attachment: HDFS-2834.9.patch New patch addressing review comments. Note - I've left out the libhdfs work and plan to address that in a separate ticket since I think the changes we talked about might need to be reflected in the existing read path, making this patch more cumbersome. > ByteBuffer-based read API for DFSInputStream > > > Key: HDFS-2834 > URL: https://issues.apache.org/jira/browse/HDFS-2834 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Henry Robinson >Assignee: Henry Robinson > Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, > HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, > HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, HDFS-2834.patch, > hdfs-2834-libhdfs-benchmark.png > > > The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated > {{byte[]}}. Although for many clients this is desired behaviour, in certain > situations, such as native-reads through libhdfs, this imposes an extra copy > penalty since the {{byte[]}} needs to be copied out again into a natively > readable memory area. > For these cases, it would be preferable to allow the client to supply its own > buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
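The caller-supplied-buffer idea behind this ticket can be shown with plain java.nio (this is not the proposed DFSInputStream API, just the general pattern): the client allocates a ByteBuffer, possibly direct and hence readable from native code, and the read fills it in place with no final copy into a JVM byte[].

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ByteBufferReadSketch {
    public static void main(String[] args) throws Exception {
        byte[] data = "hello hdfs".getBytes("US-ASCII");
        ReadableByteChannel src = Channels.newChannel(new ByteArrayInputStream(data));

        // The caller owns the destination; a direct buffer's memory can be
        // handed to native consumers (e.g. libhdfs) without a second copy.
        ByteBuffer dst = ByteBuffer.allocateDirect(data.length);
        int n = src.read(dst); // fills the caller-supplied buffer in place
        dst.flip();

        byte[] out = new byte[n];
        dst.get(out);
        System.out.println(new String(out, "US-ASCII"));
        src.close();
    }
}
```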
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225740#comment-13225740 ] jirapos...@reviews.apache.org commented on HDFS-2834: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ --- (Updated 2012-03-09 00:47:24.765130) Review request for hadoop-hdfs and Todd Lipcon. Summary --- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834. http://issues.apache.org/jira/browse/HDFS-2834 Diffs - hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 Diff: https://reviews.apache.org/r/4212/diff Testing --- Thanks, Henry
[jira] [Created] (HDFS-3069) If an edits file has more edits in it than expected by its name, should trigger an error
If an edits file has more edits in it than expected by its name, should trigger an error Key: HDFS-3069 URL: https://issues.apache.org/jira/browse/HDFS-3069 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Todd Lipcon Assignee: Todd Lipcon In testing what happens in HA split brain scenarios, I ended up with an edits log that was named edits_47-47 but actually had two edits in it (#47 and #48). The edits loading process should detect this situation and barf. Otherwise, the problem shows up later during loading or even on the next restart, and is tough to fix. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
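The proposed check is straightforward to sketch. This is a hypothetical standalone illustration, not the real FSEditLog loading code: it only assumes the edits_<firstTxId>-<lastTxId> naming convention described above.

```java
// Hypothetical validation: the name edits_47-47 promises exactly one
// transaction, so finding two (#47 and #48) should be an error.
public class EditLogNameCheck {
    static void check(String fileName, long txCountInFile) {
        String[] range = fileName.substring("edits_".length()).split("-");
        long first = Long.parseLong(range[0]);
        long last = Long.parseLong(range[1]);
        long expected = last - first + 1;
        if (txCountInFile != expected) {
            throw new IllegalStateException(fileName + " claims " + expected
                + " transaction(s) but contains " + txCountInFile);
        }
    }

    public static void main(String[] args) {
        check("edits_47-47", 1); // consistent: passes silently
        try {
            check("edits_47-47", 2); // the split-brain case from this report
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```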
[jira] [Updated] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3067: - Target Version/s: 0.24.0 Affects Version/s: 0.24.0 Status: Patch Available (was: Open) Marking PA for Henry so test-patch runs.
[jira] [Created] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
hdfs balancer doesn't balance blocks between datanodes -- Key: HDFS-3070 URL: https://issues.apache.org/jira/browse/HDFS-3070 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 0.24.0 Reporter: Stephen Chu Attachments: unbalanced_nodes.png I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, both have over 3% disk usage. Attached is a screenshot of the Live Nodes web UI. On styx01, I ran the _hdfs balancer_ command with a threshold of 1% and don't see the blocks being balanced across all 4 datanodes (all blocks on styx01 and styx02 stay put). HA is currently enabled. [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1 active [schu@styx01 ~]$ hdfs balancer -threshold 1 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = [] 12/03/08 10:10:32 INFO balancer.Balancer: p = Balancer.Parameters[BalancingPolicy.Node, threshold=1.0] Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved Balancing took 95.0 milliseconds [schu@styx01 ~]$ I believe with a threshold of 1% the balancer should trigger blocks being moved across DataNodes, right? I am also curious about the "namenodes = []" in the above output. [schu@styx01 ~]$ hadoop version Hadoop 0.24.0-SNAPSHOT Subversion git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common -r f6a577d697bbcd04ffbc568167c97b79479ff319 Compiled by schu on Thu Mar 8 15:32:50 PST 2012 From source with checksum ec971a6e7316f7fbf471b617905856b8 From http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html: The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
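The balancedness definition quoted from the Balancer javadoc can be checked numerically. This is an illustrative standalone sketch with made-up node sizes, not Balancer code: with two nodes at 3% usage, two empty nodes, and a 1% threshold, the cluster is not balanced, so the balancer would indeed be expected to move blocks.

```java
public class BalanceCheck {
    // Balanced iff every node's utilization (%) is within `threshold` of the
    // cluster-wide utilization (%).
    static boolean isBalanced(long[] used, long[] capacity, double threshold) {
        long totalUsed = 0, totalCap = 0;
        for (int i = 0; i < used.length; i++) {
            totalUsed += used[i];
            totalCap += capacity[i];
        }
        double clusterUtil = 100.0 * totalUsed / totalCap;
        for (int i = 0; i < used.length; i++) {
            double nodeUtil = 100.0 * used[i] / capacity[i];
            if (Math.abs(nodeUtil - clusterUtil) > threshold) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical 4-node cluster: two nodes at 3% usage, two empty.
        long[] used = {3, 3, 0, 0};
        long[] cap  = {100, 100, 100, 100};
        // Cluster utilization is 1.5%; |3 - 1.5| > 1, so not balanced.
        System.out.println(isBalanced(used, cap, 1.0));
    }
}
```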
[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-3070: -- Attachment: unbalanced_nodes.png > hdfs balancer doesn't balance blocks between datanodes > -- > > Key: HDFS-3070 > URL: https://issues.apache.org/jira/browse/HDFS-3070 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 0.24.0 >Reporter: Stephen Chu > Attachments: unbalanced_nodes.png > > > I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, > both have over 3% disk usage. > Attached is a screenshot of the Live Nodes web UI. > On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see > the blocks being balanced across all 4 datanodes (all blocks on styx01 and > styx02 stay put). > HA is currently enabled. > [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1 > active > [schu@styx01 ~]$ hdfs balancer -threshold 1 > 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0 > 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = [] > 12/03/08 10:10:32 INFO balancer.Balancer: p = > Balancer.Parameters[BalancingPolicy.Node, threshold=1.0] > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move > Bytes Being Moved > Balancing took 95.0 milliseconds > [schu@styx01 ~]$ > I believe with a threshold of 1% the balancer should trigger blocks being > moved across DataNodes, right? I am curious about the "namenode = []" from > the above output. 
> [schu@styx01 ~]$ hadoop version > Hadoop 0.24.0-SNAPSHOT > Subversion > git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common > -r f6a577d697bbcd04ffbc568167c97b79479ff319 > Compiled by schu on Thu Mar 8 15:32:50 PST 2012 > From source with checksum ec971a6e7316f7fbf471b617905856b8 > From > http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html: > The threshold parameter is a fraction in the range of (0%, 100%) with a > default value of 10%. The threshold sets a target for whether the cluster is > balanced. A cluster is balanced if for each datanode, the utilization of the > node (ratio of used space at the node to total capacity of the node) differs > from the utilization of the cluster (ratio of used space in the cluster to total > capacity of the cluster) by no more than the threshold value. The smaller the > threshold, the more balanced a cluster will become. It takes more time to run > the balancer for small threshold values. Also for a very small threshold the > cluster may not be able to reach the balanced state when applications write > and delete files concurrently.
[jira] [Updated] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3067: - Component/s: hdfs client > NPE in DFSInputStream.readBuffer if read is repeated on corrupted block > --- > > Key: HDFS-3067 > URL: https://issues.apache.org/jira/browse/HDFS-3067 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Henry Robinson >Assignee: Henry Robinson > Attachments: HDFS-3607.patch > > > With a singly-replicated block that's corrupted, issuing a read against it > twice in succession (e.g. if ChecksumException is caught by the client) gives > a NullPointerException. > Here's the body of a test that reproduces the problem: > {code} > final short REPL_FACTOR = 1; > final long FILE_LENGTH = 512L; > cluster.waitActive(); > FileSystem fs = cluster.getFileSystem(); > Path path = new Path("/corrupted"); > DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L); > DFSTestUtil.waitReplication(fs, path, REPL_FACTOR); > ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path); > int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block); > assertEquals("All replicas not corrupted", REPL_FACTOR, > blockFilesCorrupted); > InetSocketAddress nnAddr = > new InetSocketAddress("localhost", cluster.getNameNodePort()); > DFSClient client = new DFSClient(nnAddr, conf); > DFSInputStream dis = client.open(path.toString()); > byte[] arr = new byte[(int)FILE_LENGTH]; > boolean sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); > } catch (ChecksumException ex) { > sawException = true; > } > > assertTrue(sawException); > sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here > } catch (ChecksumException ex) { > sawException = true; > } > {code} > The stack: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492) > at 
org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545) > [snip test stack] > {code} > and the problem is that currentNode is null. It's left at null after the > first read, which fails, and then is never refreshed because the condition in > read that protects blockSeekTo is only triggered if the current position is > outside the block's range.
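The failure mode described above can be reduced to the seek guard itself. The sketch below is illustrative (simplified, invented names, not the actual DFSInputStream code or the attached patch): a null currentNode left behind by a failed read must also trigger a re-seek, not only a position outside the block's range.

```java
// Illustrative reduction of the DFSInputStream.read guard discussed above.
// Names are simplified stand-ins, not the actual Hadoop code.
public class SeekGuard {
    // The buggy guard re-seeks only when the position has left the block.
    static boolean needsSeekBuggy(long pos, long blockEnd, Object currentNode) {
        return pos > blockEnd;
    }

    // A corrected guard also re-seeks when the datanode reference was
    // cleared by a previous failed read (e.g. after a ChecksumException).
    static boolean needsSeekFixed(long pos, long blockEnd, Object currentNode) {
        return pos > blockEnd || currentNode == null;
    }

    public static void main(String[] args) {
        // After the first read fails, pos is still inside the block but
        // currentNode is null: the buggy guard skips blockSeekTo, and the
        // later dereference of currentNode throws the NPE from the report.
        System.out.println(needsSeekBuggy(100, 511, null));  // false -> NPE later
        System.out.println(needsSeekFixed(100, 511, null));  // true  -> re-seek
    }
}
```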
[jira] [Updated] (HDFS-3070) hdfs balancer doesn't balance blocks between datanodes
[ https://issues.apache.org/jira/browse/HDFS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Chu updated HDFS-3070: -- Attachment: unbalanced_nodes_inservice.png Whoops, the first screenshot shows that 2 nodes are decommissioned. After recommissioning them and attempting to run hdfs balancer, the nodes still don't become balanced and the balancer claims to complete in ~100 ms. > hdfs balancer doesn't balance blocks between datanodes > -- > > Key: HDFS-3070 > URL: https://issues.apache.org/jira/browse/HDFS-3070 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer >Affects Versions: 0.24.0 >Reporter: Stephen Chu > Attachments: unbalanced_nodes.png, unbalanced_nodes_inservice.png > > > I TeraGenerated data into DataNodes styx01 and styx02. Looking at the web UI, > both have over 3% disk usage. > Attached is a screenshot of the Live Nodes web UI. > On styx01, I run the _hdfs balancer_ command with threshold 1% and don't see > the blocks being balanced across all 4 datanodes (all blocks on styx01 and > styx02 stay put). > HA is currently enabled. > [schu@styx01 ~]$ hdfs haadmin -getServiceState nn1 > active > [schu@styx01 ~]$ hdfs balancer -threshold 1 > 12/03/08 10:10:32 INFO balancer.Balancer: Using a threshold of 1.0 > 12/03/08 10:10:32 INFO balancer.Balancer: namenodes = [] > 12/03/08 10:10:32 INFO balancer.Balancer: p = > Balancer.Parameters[BalancingPolicy.Node, threshold=1.0] > Time Stamp Iteration# Bytes Already Moved Bytes Left To Move > Bytes Being Moved > Balancing took 95.0 milliseconds > [schu@styx01 ~]$ > I believe with a threshold of 1% the balancer should trigger blocks being > moved across DataNodes, right? I am curious about the "namenodes = []" from > the above output. 
> [schu@styx01 ~]$ hadoop version > Hadoop 0.24.0-SNAPSHOT > Subversion > git://styx01.sf.cloudera.com/home/schu/hadoop-common/hadoop-common-project/hadoop-common > -r f6a577d697bbcd04ffbc568167c97b79479ff319 > Compiled by schu on Thu Mar 8 15:32:50 PST 2012 > From source with checksum ec971a6e7316f7fbf471b617905856b8 > From > http://hadoop.apache.org/hdfs/docs/r0.21.0/api/org/apache/hadoop/hdfs/server/balancer/Balancer.html: > The threshold parameter is a fraction in the range of (0%, 100%) with a > default value of 10%. The threshold sets a target for whether the cluster is > balanced. A cluster is balanced if for each datanode, the utilization of the > node (ratio of used space at the node to total capacity of the node) differs > from the utilization of the cluster (ratio of used space in the cluster to total > capacity of the cluster) by no more than the threshold value. The smaller the > threshold, the more balanced a cluster will become. It takes more time to run > the balancer for small threshold values. Also for a very small threshold the > cluster may not be able to reach the balanced state when applications write > and delete files concurrently.
[jira] [Created] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active
haadmin failover command does not provide enough detail for when target NN is not ready to be active Key: HDFS-3071 URL: https://issues.apache.org/jira/browse/HDFS-3071 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger When running the failover command, you can get an error message like the following: {quote} $ hdfs --config $(pwd) haadmin -failover namenode2 namenode1 Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active {quote} Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON.Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover.
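A minimal sketch of the requested improvement: carry the not-ready reason in the error surfaced to the haadmin user. The exception class and message format here are hypothetical, invented for illustration, and not the actual HAServiceProtocol API:

```java
// Hypothetical sketch: propagate the reason a NN refuses to become active
// (the NotReadyException class and its message format are invented).
public class FailoverError {
    static class NotReadyException extends Exception {
        NotReadyException(String target, String reason) {
            // Include the underlying reason (e.g. safe-mode status) instead
            // of only "X is not ready to become active".
            super(target + " is not ready to become active: " + reason);
        }
    }

    public static void main(String[] args) {
        NotReadyException e = new NotReadyException(
            "xxx.yyy/1.2.3.4:8020",
            "Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.");
        // The user now sees the cause at failover time, without having to
        // dig through the target namenode's logs.
        System.out.println(e.getMessage());
    }
}
```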
[jira] [Commented] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
[ https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225787#comment-13225787 ] Hadoop QA commented on HDFS-3067: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12517646/HDFS-3607.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/1975//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1975//console This message is automatically generated. > NPE in DFSInputStream.readBuffer if read is repeated on corrupted block > --- > > Key: HDFS-3067 > URL: https://issues.apache.org/jira/browse/HDFS-3067 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.24.0 >Reporter: Henry Robinson >Assignee: Henry Robinson > Attachments: HDFS-3607.patch > > > With a singly-replicated block that's corrupted, issuing a read against it > twice in succession (e.g. if ChecksumException is caught by the client) gives > a NullPointerException. 
> Here's the body of a test that reproduces the problem: > {code} > final short REPL_FACTOR = 1; > final long FILE_LENGTH = 512L; > cluster.waitActive(); > FileSystem fs = cluster.getFileSystem(); > Path path = new Path("/corrupted"); > DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L); > DFSTestUtil.waitReplication(fs, path, REPL_FACTOR); > ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path); > int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block); > assertEquals("All replicas not corrupted", REPL_FACTOR, > blockFilesCorrupted); > InetSocketAddress nnAddr = > new InetSocketAddress("localhost", cluster.getNameNodePort()); > DFSClient client = new DFSClient(nnAddr, conf); > DFSInputStream dis = client.open(path.toString()); > byte[] arr = new byte[(int)FILE_LENGTH]; > boolean sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); > } catch (ChecksumException ex) { > sawException = true; > } > > assertTrue(sawException); > sawException = false; > try { > dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here > } catch (ChecksumException ex) { > sawException = true; > } > {code} > The stack: > {code} > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492) > at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545) > [snip test stack] > {code} > and the problem is that currentNode is null. It's left at null after the > first read, which fails, and then is never refreshed because the condition in > read that protects blockSeekTo is only triggered if the current position is > outside the block's range.
[jira] [Created] (HDFS-3072) haadmin should have configurable timeouts for failover commands
haadmin should have configurable timeouts for failover commands --- Key: HDFS-3072 URL: https://issues.apache.org/jira/browse/HDFS-3072 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger The HAAdmin failover code should time out reasonably aggressively and go on to the fencing strategies if it's dealing with a mostly dead active namenode. Currently it uses what's probably the default, which is to say no timeout whatsoever. {code} /** * Return a proxy to the specified target service. */ protected HAServiceProtocol getProtocol(String serviceId) throws IOException { String serviceAddr = getServiceAddr(serviceId); InetSocketAddress addr = NetUtils.createSocketAddr(serviceAddr); return (HAServiceProtocol)RPC.getProxy( HAServiceProtocol.class, HAServiceProtocol.versionID, addr, getConf()); } {code}
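A minimal sketch of the configurable-timeout idea. The configuration key name below is hypothetical (not an existing Hadoop property), and a plain Map stands in for Hadoop's Configuration; the point is only that getProtocol would read a bounded timeout and hand it to the RPC proxy instead of waiting indefinitely:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a configurable haadmin RPC timeout. The key name
// "dfs.ha.admin.rpc-timeout.ms" is hypothetical, and a Map stands in
// for org.apache.hadoop.conf.Configuration.
public class HaAdminTimeout {
    static final String TIMEOUT_KEY = "dfs.ha.admin.rpc-timeout.ms";
    // Bounded default so a mostly-dead NN trips fencing rather than
    // hanging the failover forever.
    static final int DEFAULT_TIMEOUT_MS = 20000;

    static int getRpcTimeout(Map<String, String> conf) {
        String v = conf.get(TIMEOUT_KEY);
        return v == null ? DEFAULT_TIMEOUT_MS : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getRpcTimeout(conf));   // 20000 (default)
        conf.put(TIMEOUT_KEY, "5000");
        System.out.println(getRpcTimeout(conf));   // 5000
    }
}
```

The value read here would then be passed into the proxy creation in getProtocol, so a failover against an unresponsive active namenode fails fast and proceeds to fencing.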
[jira] [Commented] (HDFS-3063) NameNode should validate all coming file path
[ https://issues.apache.org/jira/browse/HDFS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13225850#comment-13225850 ] Denny Ye commented on HDFS-3063: hi Daryn, if we use a common place for path validation, do we need a hook on each method that a client can invoke? That change would look like the SafeMode validation. In my opinion, a possible place is the RPC Server, before the reflection invocation, but that's terrible for RPC independence > NameNode should validate all coming file path > - > > Key: HDFS-3063 > URL: https://issues.apache.org/jira/browse/HDFS-3063 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.20.205.0 >Reporter: Denny Ye >Priority: Minor > Labels: namenode > Attachments: HDFS-3063.patch > > > NameNode provides RPC service for not only the DFS client but also user-defined > programs. A common case we often meet is that a user passes a file path > prefixed with the HDFS protocol ("hdfs://{namenode}:{port}/{folder}/{file}"). > NameNode cannot map node meta-data with this path and always throws an NPE. In > the user client, we only see the NullPointerException, with no other hint of > which step it occurred at. > Also, NameNode should validate that all incoming file paths have the regular format. > One exception I met: > Exception in thread "main" org.apache.hadoop.ipc.RemoteException: > java.io.IOException: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:334) > at > org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:329)
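A sketch of the kind of validation being discussed: reject scheme-qualified paths at the NameNode boundary with a clear error instead of letting them NPE inside INode.getPathComponents. The class and method names are illustrative, not actual NameNode code:

```java
// Illustrative path check for the validation discussed in HDFS-3063
// (not the actual NameNode code or the attached patch).
public class PathCheck {
    static boolean isValidNameNodePath(String path) {
        // The NameNode expects scheme-less absolute paths like
        // "/folder/file"; "hdfs://host:port/folder/file" would otherwise
        // fail deep inside INode.getPathComponents with an opaque NPE.
        return path != null && path.startsWith("/") && !path.contains("://");
    }

    public static void main(String[] args) {
        System.out.println(isValidNameNodePath("/user/denny/file"));      // true
        System.out.println(isValidNameNodePath("hdfs://nn:8020/user/f")); // false
    }
}
```

A check like this, placed at a single choke point before request dispatch, would let the server return a descriptive error to the client instead of a bare NullPointerException.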