[jira] [Updated] (HDFS-8226) Non-HA rollback compatibility broken
[ https://issues.apache.org/jira/browse/HDFS-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-8226: - Attachment: HDFS-8226.2.patch Updated the patch with the latest code. Please review. Non-HA rollback compatibility broken Key: HDFS-8226 URL: https://issues.apache.org/jira/browse/HDFS-8226 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Blocker Attachments: HDFS-8226.1.patch, HDFS-8226.2.patch In HA, rollback is performed with "hdfs namenode -rollback", which prompts the user for confirmation (implemented as part of HDFS-5138). For non-HA, the documented way to roll back is "start-dfs.sh -rollback", but then NameNode startup hangs: since it tries to start the namenode in daemon mode, it cannot prompt the user for confirmation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7601) Operations (e.g. balance) failed due to deficient configuration parsing
[ https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doris Gu updated HDFS-7601: --- Attachment: 7601-test.patch Operations (e.g. balance) failed due to deficient configuration parsing -- Key: HDFS-7601 URL: https://issues.apache.org/jira/browse/HDFS-7601 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.3.0, 2.6.0 Reporter: Doris Gu Assignee: Doris Gu Priority: Minor Attachments: 7601-test.patch Some operations, for example balance, parse the configuration (from core-site.xml, hdfs-site.xml) to get the nameservice URIs to connect to. The current method treats URIs that end with or without a trailing / as two different URIs, so subsequent operations may hit errors. bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different URIs although they are actually the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
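The trailing-slash mismatch described above can be illustrated with a small, self-contained Java sketch. The `normalize` helper is hypothetical, not the actual HDFS code; it just shows the fix direction of collapsing an empty path and a root path to one canonical URI:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class NameServiceUris {
    // Hypothetical helper: strip an empty or root path so that
    // hdfs://haCluster and hdfs://haCluster/ collapse to one URI.
    static URI normalize(URI uri) {
        String path = uri.getPath();
        if (path == null || path.isEmpty() || "/".equals(path)) {
            return URI.create(uri.getScheme() + "://" + uri.getAuthority());
        }
        return uri;
    }

    public static void main(String[] args) {
        Set<URI> raw = new HashSet<>();
        raw.add(URI.create("hdfs://haCluster"));
        raw.add(URI.create("hdfs://haCluster/"));
        System.out.println(raw.size());        // 2: treated as different

        Set<URI> normalized = new HashSet<>();
        normalized.add(normalize(URI.create("hdfs://haCluster")));
        normalized.add(normalize(URI.create("hdfs://haCluster/")));
        System.out.println(normalized.size()); // 1: recognized as the same
    }
}
```

Without normalization, `URI.equals` sees the two forms as distinct because the path component differs ("" vs. "/"), which is exactly the failure the report describes.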
[jira] [Commented] (HDFS-7601) Operations (e.g. balance) failed due to deficient configuration parsing
[ https://issues.apache.org/jira/browse/HDFS-7601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513640#comment-14513640 ] Hadoop QA commented on HDFS-7601: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728328/7601-test.patch | | Optional Tests | javac unit | | git revision | trunk / 618ba70 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10408/console | This message was automatically generated. Operations (e.g. balance) failed due to deficient configuration parsing -- Key: HDFS-7601 URL: https://issues.apache.org/jira/browse/HDFS-7601 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.3.0, 2.6.0 Reporter: Doris Gu Assignee: Doris Gu Priority: Minor Attachments: 7601-test.patch Some operations, for example balance, parse the configuration (from core-site.xml, hdfs-site.xml) to get the nameservice URIs to connect to. The current method treats URIs that end with or without a trailing / as two different URIs, so subsequent operations may hit errors. bq. [hdfs://haCluster, hdfs://haCluster/] are considered to be two different URIs although they are actually the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8108) Fsck should provide the info on mandatory option to be used along with -blocks , -locations and -racks
[ https://issues.apache.org/jira/browse/HDFS-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513795#comment-14513795 ] J.Andreina commented on HDFS-8108: -- This issue is for updating the usage description of fsck, so no testcase is needed. Fsck should provide the info on mandatory option to be used along with -blocks , -locations and -racks Key: HDFS-8108 URL: https://issues.apache.org/jira/browse/HDFS-8108 Project: Hadoop HDFS Issue Type: Improvement Reporter: J.Andreina Assignee: J.Andreina Priority: Trivial Attachments: HDFS-8108.1.patch, HDFS-8108.2.patch The fsck usage information should state which options are mandatory along with -blocks, -locations and -racks, to be in sync with the documentation. For example, to get information on: 1. Blocks (-blocks), option -files should also be used. 2. Rack information (-racks), options -files and -blocks should also be used.
{noformat}
./hdfs fsck -files -blocks
./hdfs fsck -files -blocks -racks
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Attachment: HDFS-8204.003.patch Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed, and may cause 2 replicas to end up on the same node after running the balancer.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0, SSD) and (DN1, DISK). The replica in (DN0, SSD) should not be moved to (DN1, SSD) after running the Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running the balancer, because the Datanode rejects the transfer. {color} We see a lot of ERRORs when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:722)
{code}
The Balancer runs 5~20 iterations in the test before it exits, which is inefficient. The Balancer should not *schedule* the move in the first place, even though it would fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
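The check the report asks for can be sketched in a few lines of self-contained Java. The `StorageGroup` class and `nodeAlreadyHasReplica` helper are illustrative stand-ins, not the actual Balancer code; the point is that the suitability check must compare at the datanode level, not the storage-group level:

```java
import java.util.Arrays;
import java.util.List;

public class BalancerCheckSketch {
    // Minimal stand-in for a storage group: a datanode plus a storage type.
    static class StorageGroup {
        final String datanode;
        final String storageType;
        StorageGroup(String datanode, String storageType) {
            this.datanode = datanode;
            this.storageType = storageType;
        }
    }

    // A move target is unsuitable if the target *datanode* already holds a
    // replica, regardless of which storage type that replica sits on.
    static boolean nodeAlreadyHasReplica(List<StorageGroup> locations, StorageGroup target) {
        for (StorageGroup loc : locations) {
            if (loc.datanode.equals(target.datanode)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<StorageGroup> replicas = Arrays.asList(
            new StorageGroup("DN0", "SSD"), new StorageGroup("DN1", "DISK"));
        // Moving the (DN0, SSD) replica to (DN1, SSD) must not be scheduled:
        // DN1 already holds a replica on DISK.
        System.out.println(nodeAlreadyHasReplica(replicas, new StorageGroup("DN1", "SSD")));
    }
}
```

A storage-group-level comparison would miss this case, because (DN1, DISK) and (DN1, SSD) are different storage groups even though they share a node, which is exactly the flaw described in the example above.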
[jira] [Updated] (HDFS-7949) WebImageViewer needs to support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7949: --- Attachment: (was: HDFS-7949-007) WebImageViewer needs to support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch, HDFS-7949-002.patch, HDFS-7949-003.patch, HDFS-7949-004.patch, HDFS-7949-005.patch, HDFS-7949-006.patch The file size calculation in WebImageViewer should be changed when the blocks of the file are striped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8266) Erasure Coding: Test of snapshot/.trash with EC files
GAO Rui created HDFS-8266: - Summary: Erasure Coding: Test of snapshot/.trash with EC files Key: HDFS-8266 URL: https://issues.apache.org/jira/browse/HDFS-8266 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8242) Erasure Coding: XML based end-to-end test for ECCli commands
[ https://issues.apache.org/jira/browse/HDFS-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8242: --- Attachment: HDFS-8242-003.patch Erasure Coding: XML based end-to-end test for ECCli commands Key: HDFS-8242 URL: https://issues.apache.org/jira/browse/HDFS-8242 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8242-001.patch, HDFS-8242-002.patch, HDFS-8242-003.patch This JIRA is to add test cases with the CLI test f/w for the commands present in {{ECCli}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8267) Erasure Coding: Test of Namenode with EC files
[ https://issues.apache.org/jira/browse/HDFS-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513911#comment-14513911 ] Rakesh R commented on HDFS-8267: Thanks [~demongaorui] for creating this task. I've written a few HA-related EC zone test cases, but couldn't complete them as HDFS-7859 is open. I'm happy to take up this task, shall I? I see a couple of unit testing tasks raised recently. IMHO it's good to group all these EC-related testing tasks together. One way is to create a separate umbrella task and move everything under it. What is others' opinion? Erasure Coding: Test of Namenode with EC files -- Key: HDFS-8267 URL: https://issues.apache.org/jira/browse/HDFS-8267 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui Labels: EC, test 1. Namenode startup with EC: 1.1. Safemode 1.2. BlockReport 2. Namenode HA with EC: 2.1. Fsimage and editlog test 2.2. Hot restart and recovery of Active NameNode after failure 2.3. Hot restart and recovery of Standby NameNode after failure 2.4. Restart and recovery when both Active and Standby NameNodes fail at the same time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5574: Attachment: HDFS-5574.007.patch Thanks for the review, Akira. Updated the patch to fix compile and stylecheck warnings. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.007.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temp buffer to read data into, which is not necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
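The buffer-copy pattern being removed can be sketched with plain JDK types. Both methods below are illustrative (they are not the BlockReader code): the first shows the wasteful skip-by-reading approach, the second the direction of the fix, advancing a position counter with no scratch buffer. The `pos`/`length` fields stand in for a block reader's internal state:

```java
import java.io.IOException;
import java.io.InputStream;

public class SkipSketch {
    // The wasteful pattern: skipping by reading into a scratch buffer
    // and throwing the bytes away.
    static long skipByReading(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (read < 0) {
                break;
            }
            remaining -= read;
        }
        return n - remaining;
    }

    // The improvement direction: advance an internal position directly,
    // with no copy. 'pos' and 'length' stand in for reader state.
    static long pos = 0;
    static long length = 1024;
    static long skipByPosition(long n) {
        long skipped = Math.min(n, length - pos);
        pos += skipped;
        return skipped;
    }

    public static void main(String[] args) {
        System.out.println(skipByPosition(100));  // 100
        System.out.println(skipByPosition(2000)); // 924: clamped at block end
    }
}
```

The position-based variant does constant work per call regardless of how many bytes are skipped, while the read-based one touches every skipped byte.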
[jira] [Commented] (HDFS-8242) Erasure Coding: XML based end-to-end test for ECCli commands
[ https://issues.apache.org/jira/browse/HDFS-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513852#comment-14513852 ] Rakesh R commented on HDFS-8242: Thanks a lot [~zhz] and [~vinayrpet] for the review comments. I've tried to include more cases now. bq. For each and every test which modifies the system, such as createZone, we can have a cleanup command. Added a cleanup command. *Below are the test cases, please see:*
1: help: help for erasure coding command
2: help: createZone command
3: help: getZoneInfo command
4: help: listSchemas command
5: createZone : create a zone to encode files
6: createZone : default schema
7: getZoneInfo : get information about the EC zone at specified path
8: getZoneInfo : get EC zone at specified file path
9: listSchemas : get the list of ECSchemas supported
10: createZone : illegal parameters - path is missing
11: createZone : illegal parameters - schema name is missing
12: createZone : illegal parameters - too many arguments
13: createZone : illegal parameters - invalidschema
14: createZone : illegal parameters - no such file
15: getZoneInfo : illegal parameters - path is missing
16: getZoneInfo : illegal parameters - too many arguments
17: getZoneInfo : illegal parameters - no such file
18: listSchemas : illegal parameters - too many parameters
Also, made one minor improvement to the comma-separated {{ecschemaname}} appending in the {{ECCommand.java}} class. Attached a patch addressing the comments; could you please review it again? Thanks! Erasure Coding: XML based end-to-end test for ECCli commands Key: HDFS-8242 URL: https://issues.apache.org/jira/browse/HDFS-8242 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8242-001.patch, HDFS-8242-002.patch, HDFS-8242-003.patch This JIRA is to add test cases with the CLI test f/w for the commands present in {{ECCli}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513833#comment-14513833 ] Hadoop QA commented on HDFS-8204: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 5m 24s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 2s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 167m 11s | Tests passed in hadoop-hdfs. 
| | | | 213m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728324/HDFS-8204.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 618ba70 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10407/console | This message was automatically generated. Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch The Balancer moves blocks between Datanodes (Ver. < 2.6). The Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed, and may cause 2 replicas to end up on the same node after running the balancer.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0, SSD) and (DN1, DISK). The replica in (DN0, SSD) should not be moved to (DN1, SSD) after running the Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running the balancer, because the Datanode rejects the transfer. {color} We see a lot of ERRORs when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
    at java.lang.Thread.run(Thread.java:722)
{code}
The Balancer runs 5~20 iterations in the test before it exits, which is inefficient. The Balancer should not *schedule*
[jira] [Updated] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8271: Affects Version/s: 2.7.1 NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8205) CommandFormat#parse() should not parse option as value of option
[ https://issues.apache.org/jira/browse/HDFS-8205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514824#comment-14514824 ] Hudson commented on HDFS-8205: -- FAILURE: Integrated in Hadoop-trunk-Commit #7687 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7687/]) HDFS-8205. CommandFormat#parse() should not parse option as value of option. (Contributed by Peter Shi and Xiaoyu Yao) (arp: rev 0d5b0143cc003e132ce454415e35d55d46311416) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CommandFormat.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestCount.java HDFS-8205. Fix CHANGES.txt (arp: rev 6bae5962cd70ac33fe599c50fb2a906830e5d4b2) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt CommandFormat#parse() should not parse option as value of option Key: HDFS-8205 URL: https://issues.apache.org/jira/browse/HDFS-8205 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Peter Shi Assignee: Peter Shi Priority: Blocker Fix For: 2.8.0 Attachments: HDFS-8205.01.patch, HDFS-8205.02.patch, HDFS-8205.patch
{code}
./hadoop fs -count -q -t -h -v /
       QUOTA  REM_QUOTA  SPACE_QUOTA  REM_SPACE_QUOTA  DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
15/04/21 15:20:19 INFO hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
9223372036854775807  9223372036854775763  none  inf  31  13  1230  /
{code}
This blocks query quota by storage type and clear quota by storage type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
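The parsing rule the fix implies can be sketched in self-contained Java. The names here (`parse`, `optsWithValue`) are illustrative, not the actual CommandFormat code; the key point is that an option which may take a value must not consume a following token that is itself an option:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OptionParseSketch {
    // Parse flags; options in optsWithValue may take a value, but only if the
    // next token does not itself start with "-".
    static Map<String, String> parse(List<String> args, Set<String> optsWithValue) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.size(); i++) {
            String arg = args.get(i);
            if (!arg.startsWith("-")) {
                continue; // positional argument such as the path
            }
            String value = null;
            if (optsWithValue.contains(arg) && i + 1 < args.size()
                    && !args.get(i + 1).startsWith("-")) {
                value = args.get(++i);
            }
            parsed.put(arg, value);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // "fs -count -q -t -h -v /": -t may take a storage type, but here the
        // next token is the option -h, so -t must be left without a value.
        Map<String, String> parsed = parse(
            Arrays.asList("-q", "-t", "-h", "-v", "/"),
            Collections.singleton("-t"));
        System.out.println(parsed.containsKey("-h")); // -h survives as an option
        System.out.println(parsed.get("-t"));         // null: no value consumed
    }
}
```

A naive parser that unconditionally consumes the next token after `-t` would swallow `-h`, which is the bug class this JIRA fixes.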
[jira] [Updated] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-8214: --- Attachment: HDFS-8214.003.patch [~andrew.wang], Thanks for the review. The .003 patch makes your suggested changes. I also added a 0 check to dfs-dust.js. Perhaps it should just return instead of "unknown". Secondary NN Web UI shows wrong date for Last Checkpoint Key: HDFS-8214 URL: https://issues.apache.org/jira/browse/HDFS-8214 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch, HDFS-8214.003.patch SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally just after the epoch, to be displayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
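The failure mode is easy to reproduce with plain JDK calls: Hadoop's Time.monotonicNow() is backed by System.nanoTime(), whose origin is arbitrary (often system boot), so interpreting it as milliseconds since the epoch typically renders a date in early 1970:

```java
import java.util.Date;

public class MonotonicVsWall {
    public static void main(String[] args) {
        // Monotonic time has an arbitrary origin; only differences are meaningful.
        long monotonicMillis = System.nanoTime() / 1_000_000L;
        // Wall-clock time is milliseconds since the epoch and is safe to format.
        long wallMillis = System.currentTimeMillis();

        // The first line typically shows a date shortly after Jan 1 1970;
        // the second shows the actual current date.
        System.out.println("monotonic rendered as date: " + new Date(monotonicMillis));
        System.out.println("wall clock rendered as date: " + new Date(wallMillis));
    }
}
```

Monotonic time is the right tool for measuring checkpoint intervals; wall-clock time is the right tool for displaying when the last checkpoint happened, which is the distinction the patch restores.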
[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8269: - Status: Patch Available (was: Open) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
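The fix direction can be sketched as follows. `resolveForEditLog` and the inode-id map are hypothetical stand-ins for the namesystem internals; the idea is simply to resolve the `/.reserved/.inodes/<id>` alias to the inode's real path before recording the operation, so replaying the log never sees the unresolved reserved path:

```java
import java.util.HashMap;
import java.util.Map;

public class ReservedPathSketch {
    static final String RESERVED_INODES_PREFIX = "/.reserved/.inodes/";

    // inodePaths stands in for the namesystem's inode-id-to-path lookup.
    static String resolveForEditLog(String path, Map<Long, String> inodePaths) {
        if (path.startsWith(RESERVED_INODES_PREFIX)) {
            long inodeId = Long.parseLong(path.substring(RESERVED_INODES_PREFIX.length()));
            String resolved = inodePaths.get(inodeId);
            if (resolved != null) {
                return resolved; // record the real path, not the reserved alias
            }
        }
        return path;
    }

    public static void main(String[] args) {
        Map<Long, String> inodePaths = new HashMap<>();
        inodePaths.put(18230L, "/user/test/file1");
        // The edit log entry should carry the resolved path.
        System.out.println(resolveForEditLog("/.reserved/.inodes/18230", inodePaths));
    }
}
```

With the resolved path recorded, the OP_TIMES entry above would reference a real file path, avoiding the NPE during edit log replay.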
[jira] [Commented] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515019#comment-14515019 ] Hadoop QA commented on HDFS-7559: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 7s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 4m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 30s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 1m 19s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 182m 56s | Tests failed in hadoop-hdfs. 
| | | | 206m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | Timed out tests | org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728465/HDFS-7559.004.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 7f07c4d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10415/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10415/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10415/console | This message was automatically generated. Create unit test to automatically compare HDFS related classes and hdfs-default.xml --- Key: HDFS-7559 URL: https://issues.apache.org/jira/browse/HDFS-7559 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, HDFS-7559.003.patch, HDFS-7559.004.patch Create a unit test that will automatically compare the fields in the various HDFS related classes and hdfs-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
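The comparison this test performs can be sketched with reflection. `DemoKeys` and the property set below are illustrative; the real test would scan a class like DFSConfigKeys and parse hdfs-default.xml, but the mechanism is the same:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConfigXmlCheck {
    // Collect every static String constant named *_KEY from the keys class
    // and report those with no matching property in the XML defaults.
    static List<String> missingFromXml(Class<?> keysClass, Set<String> xmlProperties) {
        List<String> missing = new ArrayList<>();
        for (Field f : keysClass.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())
                    && f.getName().endsWith("_KEY")
                    && f.getType() == String.class) {
                try {
                    String property = (String) f.get(null);
                    if (!xmlProperties.contains(property)) {
                        missing.add(property);
                    }
                } catch (IllegalAccessException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        return missing;
    }

    // Stand-in keys class for demonstration only.
    static class DemoKeys {
        public static final String DFS_REPLICATION_KEY = "dfs.replication";
        public static final String DFS_BLOCKSIZE_KEY = "dfs.blocksize";
    }

    public static void main(String[] args) {
        Set<String> xml = new HashSet<>(Collections.singletonList("dfs.replication"));
        System.out.println(missingFromXml(DemoKeys.class, xml)); // [dfs.blocksize]
    }
}
```

The reverse direction (properties present in the XML but absent from the class) would be checked symmetrically, giving the two-way comparison the issue describes.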
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515054#comment-14515054 ] Nate Edel commented on HDFS-8078: - Thanks! The base case of just appending ipAddr blindly was the existing behavior, and it's not super. As far as I can tell, the existing code path won't pass in a null, but will in some cases pass in an empty string -- the comment addresses that. I can update the patch to handle null specially (as it stands, it will append the string "null" -- forcing that results in an UnknownHostException when it tries to open the URI) -- neither option is super, and this change might mask some bugs (UnknownHostException on null being potentially clearer than a failed connection back to localhost.) ipAddr.contains(":") is a very basic check that it might be an IPv6 address; we should never have an ip:port pair at that point. I then call getByName() as the safest way to validate that it's a legal IPv6 address and convert it to an Inet6Address (the latter allows us to normalize it if it's in a compressed format - e.g. [::1] to [0:0:0:0:0:0:0:1] - or there is other valid but undesirable formatting weirdness.) getByName will never do a DNS lookup; it uses the same heuristic of "is there a colon" to assume things are IPv6 literals. I don't see any current cases which will pass the ipAddr.contains(":") test but not be an IPv6 address, but if through a bug we got an ip:port combination passed in to DataNodeID as the ipAddr string, it would generate an UnknownHostException, as it would fail to parse as an IPv6 literal without doing a DNS lookup (e.g. java.net.UnknownHostException: 127.0.0.1:9000: invalid IPv6 address). The last behavior might be JDK-dependent, but appears to be the case for recent Oracle releases on both 1.7 and 1.8 (and on a quick look at the code for InetAddress.java this is unchanged in OpenJDK.)
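The behavior described in this comment can be sketched with a small, self-contained helper. `toHostPort` is hypothetical, not the DataNodeID code: a literal containing a colon is treated as IPv6, normalized via getByName (which does no DNS lookup for literals), and wrapped in brackets before the port is appended:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Ipv6Addr {
    // Hypothetical sketch of the described fix: bracket IPv6 literals so the
    // resulting host:port string parses as a valid URI authority.
    static String toHostPort(String ipAddr, int port) {
        if (ipAddr.contains(":")) {
            try {
                // getByName on a literal does no DNS lookup; it also normalizes
                // compressed forms such as ::1 to 0:0:0:0:0:0:0:1.
                InetAddress addr = InetAddress.getByName(ipAddr);
                return "[" + addr.getHostAddress() + "]:" + port;
            } catch (UnknownHostException e) {
                // e.g. an "ip:port" string fails to parse as an IPv6 literal.
                throw new IllegalArgumentException("invalid IPv6 literal: " + ipAddr, e);
            }
        }
        return ipAddr + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(toHostPort("127.0.0.1", 50010)); // 127.0.0.1:50010
        System.out.println(toHostPort("::1", 50010));       // [0:0:0:0:0:0:0:1]:50010
    }
}
```

The bracketed form is what `NetUtils.createSocketAddr` needs, since a Java URI authority requires IPv6 literals in the `[addr]:port` shape.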
Minor patch change coming later this afternoon to address the nit and the case of null being passed in. HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.4.patch 1st exception, on put:
15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
    at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + : + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.)
Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last
[jira] [Updated] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8271: Description: NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) was: NameNode works properly on IPv4 or IPv6 single stack (assuming scripts have been changed to disable preferIPv4Stack). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming in the latter case that scripts have been changed to disable preferIPv4Stack, and dependent on the client/data node fix in HDFS-8078). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses being set.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515030#comment-14515030 ] Allen Wittenauer commented on HDFS-7859: test-patch.sh reads the name of the patch, not any of the JIRA metadata. So if the patch is named something generic, it thinks it is trunk. See HowToContribute for the official rules, but as you can see from the name of the patch above, it knows about a few different methods to name them. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514890#comment-14514890 ] Zhe Zhang commented on HDFS-7859: - [~aw] I quickly went through HDFS-7285 sub tasks. If you'd like you can try with HDFS-8236. I actually tried with HDFS-8033 earlier but it still tried to apply the patch against trunk. Maybe it's because I didn't set target version to HDFS-7285 _when submitting patch_. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
[ https://issues.apache.org/jira/browse/HDFS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8271: Tags: (was: ipv6) Labels: ipv6 (was: ) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 NameNode works properly on IPv4 or IPv6 single stack (assuming scripts have been changed to disable preferIPv4Stack). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514859#comment-14514859 ] Chris Nauroth commented on HDFS-8232: - Thank you, Anu. +1 for patch v002 pending Jenkins run. Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metrics2 sink interface, none of the counters declared under Datanode:FSDataSetBean are visible. They are visible if you use JMX or if you query http://host:port/jmx. Expected behavior is that they be part of the sink interface and accessible in the putMetrics callback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8269: - Attachment: HDFS-8269.000.patch getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, therefore it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
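The root cause is that the client-supplied {{/.reserved/.inodes/&lt;id&gt;}} string reaches the edit log verbatim. A minimal sketch of detecting such paths (a hypothetical helper for illustration, not the actual patch; the real fix resolves the inode id back to the canonical path before logging):

```java
/** Hypothetical sketch: recognize /.reserved/.inodes/<id> paths so they can
 *  be resolved to a canonical path before an OP_TIMES record is logged. */
public class ReservedPaths {
    static final String PREFIX = "/.reserved/.inodes/";

    /** Returns the embedded inode id, or -1 if this is not a reserved-inode path. */
    public static long inodeIdOf(String path) {
        if (path == null || !path.startsWith(PREFIX)) {
            return -1;
        }
        String rest = path.substring(PREFIX.length());
        String first = rest.split("/", 2)[0];  // component right after the prefix
        try {
            return Long.parseLong(first);
        } catch (NumberFormatException e) {
            return -1;  // not a numeric inode id
        }
    }
}
```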
[jira] [Created] (HDFS-8271) NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled
Nate Edel created HDFS-8271: --- Summary: NameNode should bind on both IPv6 and IPv4 if running on dual-stack machine and IPv6 enabled Key: HDFS-8271 URL: https://issues.apache.org/jira/browse/HDFS-8271 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Nate Edel Assignee: Nate Edel NameNode works properly on IPv4 or IPv6 single stack (assuming scripts have been changed to disable preferIPv4Stack). On dual-stack machines, NameNode listens only on IPv4 (even ignoring preferIPv6Addresses.) Our initial use case for IPv6 is IPv6-only clusters, but ideally we'd support binding to both the IPv4 and IPv6 machine addresses so that we can support heterogeneous clusters (some dual-stack and some IPv6-only machines.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8241) Remove unused Namenode startup option FINALIZE
[ https://issues.apache.org/jira/browse/HDFS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514898#comment-14514898 ] Konstantin Shvachko commented on HDFS-8241: --- Historically, the initial implementation of upgrade (HADOOP-702) did not have the finalize startup option, only the admin command. {{StartupOption.FINALIZE}} was requested later (HADOOP-1604) because people wanted to be able to start new software even if they forgot to {{-finalizeUpgrade}} before shutting down the cluster. Remove unused Namenode startup option FINALIZE - Key: HDFS-8241 URL: https://issues.apache.org/jira/browse/HDFS-8241 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-8241.patch Command : hdfs namenode -finalize 15/04/24 22:26:23 INFO namenode.NameNode: createNameNode [-finalize] *Use of the argument 'FINALIZE' is no longer supported.* To finalize an upgrade, start the NN and then run `hdfs dfsadmin -finalizeUpgrade' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8248) Store INodeId instead of the INodeFile object in BlockInfoContiguous
[ https://issues.apache.org/jira/browse/HDFS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515051#comment-14515051 ] Hadoop QA commented on HDFS-8248: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10413/ may provide some hints. Store INodeId instead of the INodeFile object in BlockInfoContiguous Key: HDFS-8248 URL: https://issues.apache.org/jira/browse/HDFS-8248 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8248.000.patch, HDFS-8248.001.patch Currently the namespace and the block manager are tightly coupled together. There are two couplings in terms of implementation: 1. The {{BlockInfoContiguous}} stores a reference of the {{INodeFile}} that owns the block, so that the block manager can look up the corresponding file when replicating blocks, recovering from pipeline failures, etc. 1. The {{INodeFile}} stores {{BlockInfoContiguous}} objects that the file owns. Decoupling the namespace and the block manager allows the BM to be separated out from the Java heap or even as a standalone process. This jira proposes to remove the first coupling by storing the id of the inode instead of the object reference of {{INodeFile}} in the {{BlockInfoContiguous}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8204: Attachment: (was: HDFS-8204.003.patch) Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed, and may cause 2 replicas to end up on the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
at java.lang.Thread.run(Thread.java:722)
{code}
The Balancer runs 5~20 iterations in the test before it exits. That's inefficient. Balancer should not *schedule* it in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
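The requested fix reduces to one predicate: before scheduling a move, the Balancer should ask whether the block already has a replica on the target *datanode*, not merely on the target storage group. A language-level sketch under that assumption (plain string pairs stand in for StorageGroups; this is not the HDFS-8204 patch itself):

```java
import java.util.List;

/** Sketch: a move target is invalid if any existing replica shares its datanode. */
public class SameNodeCheck {
    /** Each replica is a (datanodeId, storageType) pair. */
    public static boolean hasReplicaOnNode(List<String[]> replicas, String targetDn) {
        for (String[] loc : replicas) {
            if (loc[0].equals(targetDn)) {
                return true;  // compare the node, not the storage group
            }
        }
        return false;
    }
}
```

With the ONE_SSD example above, a move of the (DN0,SSD) replica to (DN1,SSD) would be rejected because DN1 already holds a replica on DISK.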
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513638#comment-14513638 ] Hadoop QA commented on HDFS-8204: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 40s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 59s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 170m 9s | Tests passed in hadoop-hdfs. 
| | | | 214m 45s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728301/HDFS-8204.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a2459b | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10404/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10404/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10404/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10404/console | This message was automatically generated. Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch Balancer moves blocks between Datanodes (Ver. < 2.6). Balancer moves blocks between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function
{code}
class DBlock extends Locations<StorageGroup>
DBlock.isLocatedOn(StorageGroup loc)
{code}
-is flawed, and may cause 2 replicas to end up on the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). Replica in (DN0,SSD) should not be moved to (DN1,SSD) after running Balancer. Otherwise DN1 has 2 replicas.
-- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test.
{code}
2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created.
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:186)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250)
at java.lang.Thread.run(Thread.java:722)
{code}
[jira] [Commented] (HDFS-7687) Change fsck to support EC files
[ https://issues.apache.org/jira/browse/HDFS-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513637#comment-14513637 ] Takanobu Asanuma commented on HDFS-7687: This ticket depends on some commits in trunk now, such as HDFS-7993, HDFS-8215, and so on. I will submit patches after the commits are merged into HDFS-7285. Change fsck to support EC files --- Key: HDFS-7687 URL: https://issues.apache.org/jira/browse/HDFS-7687 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Takanobu Asanuma We need to change fsck so that it can detect under-replicated and corrupted EC files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8262) Erasure Coding: Test of datanode decommission which EC blocks are stored
GAO Rui created HDFS-8262: - Summary: Erasure Coding: Test of datanode decommission which EC blocks are stored Key: HDFS-8262 URL: https://issues.apache.org/jira/browse/HDFS-8262 Project: Hadoop HDFS Issue Type: Test Reporter: GAO Rui -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7949) WebImageViewer need support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7949: --- Attachment: HDFS-7949-007.patch WebImageViewer need support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch, HDFS-7949-002.patch, HDFS-7949-003.patch, HDFS-7949-004.patch, HDFS-7949-005.patch, HDFS-7949-006.patch, HDFS-7949-007.patch The file size calculation should be changed when the blocks of the file are striped in WebImageViewer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7949) WebImageViewer need support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513649#comment-14513649 ] Rakesh R commented on HDFS-7949: Thanks again [~zhz] for giving more details. Now, I've modified the utility method as {{StripedBlockUtil#spaceConsumedByStripedBlock}}. Also, I've added a few more tests to cover different file sizes. Kindly review the changes! WebImageViewer need support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch, HDFS-7949-002.patch, HDFS-7949-003.patch, HDFS-7949-004.patch, HDFS-7949-005.patch, HDFS-7949-006.patch, HDFS-7949-007.patch The file size calculation should be changed when the blocks of the file are striped in WebImageViewer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
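For intuition on what a {{spaceConsumedByStripedBlock}}-style utility computes, here is a simplified model (assumed semantics for illustration, not the actual HDFS implementation): raw space consumed = data bytes plus parity bytes, where each parity column holds one full cell per full stripe plus a cell as long as the longest data cell of the final partial stripe.

```java
/** Simplified model of striped-block space accounting (an assumption, not HDFS code). */
public class StripedSize {
    public static long spaceConsumed(long dataBytes, int dataBlks, int parityBlks, long cellSize) {
        long stripeSize = dataBlks * cellSize;   // data bytes in one full stripe
        long fullStripes = dataBytes / stripeSize;
        long remainder = dataBytes % stripeSize;
        // Parity column length: one cell per full stripe, plus (for a partial
        // stripe) a cell matching its longest data cell.
        long parityLen = fullStripes * cellSize
                + (remainder == 0 ? 0 : Math.min(remainder, cellSize));
        return dataBytes + parityLen * parityBlks;
    }
}
```

E.g. with 2 data blocks, 1 parity block, and a cell size of 4, a 10-byte block consumes 10 data bytes plus a 6-byte parity column.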
[jira] [Created] (HDFS-8263) Erasure Coding: Test of fsck for EC files
GAO Rui created HDFS-8263: - Summary: Erasure Coding: Test of fsck for EC files Key: HDFS-8263 URL: https://issues.apache.org/jira/browse/HDFS-8263 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8226) Non-HA rollback compatibility broken
[ https://issues.apache.org/jira/browse/HDFS-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513598#comment-14513598 ] Hadoop QA commented on HDFS-8226: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 2m 59s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 2m 54s | Site still builds. | | {color:blue}0{color} | shellcheck | 2m 54s | Shellcheck was not available. | | | | 6m 28s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728323/HDFS-8226.2.patch | | Optional Tests | shellcheck site | | git revision | trunk / 618ba70 | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10406/console | This message was automatically generated. Non-HA rollback compatibility broken Key: HDFS-8226 URL: https://issues.apache.org/jira/browse/HDFS-8226 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: J.Andreina Assignee: J.Andreina Priority: Blocker Attachments: HDFS-8226.1.patch, HDFS-8226.2.patch In HA while performing rollback , we use “hdfs namenode –rollback” which would prompt user for confirmation. 
(Implemented as part of HDFS-5138.) For Non-HA, as per the doc, if we perform rollback using "start-dfs.sh -rollback", then namenode startup hangs (as it tries to start the namenode in daemon mode, and hence will not be able to prompt the user for confirmation). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8264) Erasure Coding: Test of version update
GAO Rui created HDFS-8264: - Summary: Erasure Coding: Test of version update Key: HDFS-8264 URL: https://issues.apache.org/jira/browse/HDFS-8264 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui When implementing version update, the fsimage, the edits log, and conflicting Block IDs should be taken care of. This jira tests these issues during the version update process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8265) Erasure Coding: Test of Quota calculation for EC files
GAO Rui created HDFS-8265: - Summary: Erasure Coding: Test of Quota calculation for EC files Key: HDFS-8265 URL: https://issues.apache.org/jira/browse/HDFS-8265 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513697#comment-14513697 ] Akira AJISAKA commented on HDFS-5574: - Hi [~decster], would you replace {{DFS_CLIENT_READ_SHORTCIRCUIT_KEY}} with {{HdfsClientConfigKeys.Read.ShortCircuit.KEY}} in TestDFSInputStream? That will fix the javac warning. For the checkstyle warnings, would you make {{FSInputChecker#readAndDiscard}} final and add {{@return}}, {{@param}}, and {{@throws}} tags in the javadoc?
{code}
<error line="222" severity="error" message="Expected an @return tag." source="com.puppycrawl.tools.checkstyle.checks.javadoc.JavadocMethodCheck"/>
<error line="222" column="3" severity="error" message="Method &apos;readAndDiscard&apos; is not designed for extension - needs to be abstract, final or empty." source="com.puppycrawl.tools.checkstyle.checks.design.DesignForExtensionCheck"/>
<error line="222" column="45" severity="error" message="Parameter len should be final." source="com.puppycrawl.tools.checkstyle.checks.FinalParametersCheck"/>
<error line="222" column="49" severity="error" message="Expected @param tag for &apos;len&apos;." source="com.puppycrawl.tools.checkstyle.checks.javadoc.JavadocMethodCheck"/>
<error line="222" column="61" severity="error" message="Expected @throws tag for &apos;IOException&apos;." source="com.puppycrawl.tools.checkstyle.checks.javadoc.JavadocMethodCheck"/>
{code}
Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temp buffer to read data into; this is not necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
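The checkstyle complaints quoted in that comment translate into a concrete shape for the method: make it final, make the parameter final, and give the javadoc @param/@return/@throws tags. A standalone sketch of that shape (a hypothetical stand-in class, not FSInputChecker itself, with a placeholder body):

```java
import java.io.IOException;

/** Hypothetical stand-in showing the checkstyle-clean method shape. */
public class InputCheckerSketch {
    /**
     * Reads up to {@code len} bytes from the stream and discards them.
     *
     * @param len maximum number of bytes to read and discard
     * @return the number of bytes actually discarded
     * @throws IOException if the underlying stream fails
     */
    public final int readAndDiscard(final int len) throws IOException {
        return len;  // placeholder body; the real method reads from the stream
    }
}
```

Declaring the method final (or making the class final) satisfies the DesignForExtension check without changing behavior.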
[jira] [Commented] (HDFS-8242) Erasure Coding: XML based end-to-end test for ECCli commands
[ https://issues.apache.org/jira/browse/HDFS-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513579#comment-14513579 ] Vinayakumar B commented on HDFS-8242: - Nice work [~rakeshr]. bq. Should we test creating a zone, create a file in the zone, and get EC info from that file (not that dir)? Currently, a CLI command has not been added to get the ECInfo for a file, but the {{getZoneInfo}} command with the file as the path gets the corresponding zone dir's schema, which is nothing but the file's schema itself. Maybe we can have a test for this. bq. What's the purpose of the following line? expCmd = expCmd.replaceAll("#LF#", System.getProperty("line.separator")); This is to replace the line separator in the expected output with the system's line separator. This is useful for multi-line output validations, which may otherwise fail depending on the environment. Following is an example of expected output taken from testAclCLI.xml, in which #LF# will be replaced with CRLF/LF.
{code}
<expected-output># file: /dir1#LF## owner: USERNAME#LF## group: supergroup#LF#user::rwx#LF#user:charlie:r-x#LF#group::r-x#LF#group:admin:rwx#LF#mask::rwx#LF#other::r-x#LF##LF## file: /dir1/dir2#LF## owner: USERNAME#LF## group: supergroup#LF#user::rwx#LF#user:user1:r-x#LF#group::r-x#LF#group:users:rwx#LF#mask::rwx#LF#other::r-x#LF##LF#</expected-output>
{code}
Nits: 1. For every test which modifies the system, such as {{createZone}}, we can have a cleanup command as well which reverts the state to the previous one, because all tests run on the same cluster without any restart of the cluster. For example, in the current patch the second {{createZone}} command fails with "zone already exists", yet the test still passes as there is no validation for this. 2. One more test for {{createZone}} without specifying the schema can be added, and the schema can be verified later; it should have the default schema.
Erasure Coding: XML based end-to-end test for ECCli commands Key: HDFS-8242 URL: https://issues.apache.org/jira/browse/HDFS-8242 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8242-001.patch, HDFS-8242-002.patch This JIRA is to add test cases with the CLI test f/w for the commands present in {{ECCli}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
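The #LF# substitution discussed in the comment above is a one-liner; a minimal standalone version (assuming the same token and the same replaceAll call the comment quotes):

```java
/** Sketch of the CLI test framework's line-separator expansion. */
public class LfToken {
    /** Replaces every "#LF#" token with the platform line separator. */
    public static String expand(String expected) {
        // System.getProperty("line.separator") is "\n" on Unix, "\r\n" on
        // Windows, so the expected output matches on either platform.
        return expected.replaceAll("#LF#", System.getProperty("line.separator"));
    }
}
```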
[jira] [Commented] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513594#comment-14513594 ] Yi Liu commented on HDFS-8015: -- Hi Bo, for the striped block recovery in HDFS-7348, we don't need a block writer; it's convenient to send packets directly. I will update an initial patch in that JIRA later today. Erasure Coding: local and remote block writer for coding work in DataNode - Key: HDFS-8015 URL: https://issues.apache.org/jira/browse/HDFS-8015 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Li Bo As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure coding, to perform encoding or decoding, we need to be able to write data blocks locally or remotely. This is to come up with a block writer facility on the DataNode side. Better to think about the similar work done on the client side, so that in the future it's possible to unify both. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8241) Remove unused Namenode startup option FINALIZE
[ https://issues.apache.org/jira/browse/HDFS-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513639#comment-14513639 ] Hadoop QA commented on HDFS-8241: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 7m 50s | There were no new checkstyle issues. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 170m 46s | Tests passed in hadoop-hdfs. 
| | | | 219m 4s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728302/HDFS-8241.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1a2459b | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10403/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10403/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10403/console | This message was automatically generated. Remove unused Namenode startup option FINALIZE - Key: HDFS-8241 URL: https://issues.apache.org/jira/browse/HDFS-8241 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Attachments: HDFS-8241.patch Command : hdfs namenode -finalize 15/04/24 22:26:23 INFO namenode.NameNode: createNameNode [-finalize] *Use of the argument 'FINALIZE' is no longer supported.* To finalize an upgrade, start the NN and then run `hdfs dfsadmin -finalizeUpgrade' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7949) WebImageViewer need support file size calculation with striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7949: --- Attachment: HDFS-7949-007 WebImageViewer need support file size calculation with striped blocks - Key: HDFS-7949 URL: https://issues.apache.org/jira/browse/HDFS-7949 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Hui Zheng Assignee: Rakesh R Priority: Minor Attachments: HDFS-7949-001.patch, HDFS-7949-002.patch, HDFS-7949-003.patch, HDFS-7949-004.patch, HDFS-7949-005.patch, HDFS-7949-006.patch, HDFS-7949-007 The file size calculation should be changed when the blocks of the file are striped in WebImageViewer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8015: Attachment: HDFS-8015-000.patch Patch 000 is an initial version showing the implementation of the BlockWriter used on the DataNode side. Erasure Coding: local and remote block writer for coding work in DataNode - Key: HDFS-8015 URL: https://issues.apache.org/jira/browse/HDFS-8015 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Li Bo Attachments: HDFS-8015-000.patch As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure coding, to perform encoding or decoding we need to be able to write data blocks locally or remotely. This is to come up with a block writer facility on the DataNode side. It is worth considering the similar work done on the client side, so that in the future it's possible to unify the two. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8015) Erasure Coding: local and remote block writer for coding work in DataNode
[ https://issues.apache.org/jira/browse/HDFS-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513671#comment-14513671 ] Li Bo commented on HDFS-8015: - Hi Yi, I think there are several ways to handle writing a decoded block to a remote or local node. My idea is to first get a domain socket which is bound and listened on by {{DataNode#localDataXceiverServer}}, then write data via the output stream of this socket. The advantage is that we don't need to handle the details of block writing. The next step is to extend {{BlockReceiver}}. Currently it writes to the local disk and may also write to a remote node. We can add a switch to control its writing direction, i.e., local only, remote only, or remote + local. We can discuss this issue after your first patch is ready. Erasure Coding: local and remote block writer for coding work in DataNode - Key: HDFS-8015 URL: https://issues.apache.org/jira/browse/HDFS-8015 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Li Bo Attachments: HDFS-8015-000.patch As a task of HDFS-7344 ECWorker, in either striped or non-striped erasure coding, to perform encoding or decoding we need to be able to write data blocks locally or remotely. This is to come up with a block writer facility on the DataNode side. It is worth considering the similar work done on the client side, so that in the future it's possible to unify the two. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
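The write-direction switch proposed in the comment above could be modeled as a small enum. This is only an illustrative sketch; the names ({{WriteDirection}}, {{writesLocal}}, {{writesRemote}}) are hypothetical and not part of the actual BlockReceiver code.

```java
// Hypothetical sketch (not the actual BlockReceiver API): a switch
// controlling whether a received block is written to the local disk,
// mirrored to a remote node, or both.
enum WriteDirection {
    LOCAL_ONLY(true, false),
    REMOTE_ONLY(false, true),
    LOCAL_AND_REMOTE(true, true);

    private final boolean writesLocal;
    private final boolean writesRemote;

    WriteDirection(boolean writesLocal, boolean writesRemote) {
        this.writesLocal = writesLocal;
        this.writesRemote = writesRemote;
    }

    boolean writesLocal()  { return writesLocal; }
    boolean writesRemote() { return writesRemote; }
}
```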
[jira] [Created] (HDFS-8267) Erasure Coding: Test of Namenode with EC files
GAO Rui created HDFS-8267: - Summary: Erasure Coding: Test of Namenode with EC files Key: HDFS-8267 URL: https://issues.apache.org/jira/browse/HDFS-8267 Project: Hadoop HDFS Issue Type: Test Affects Versions: HDFS-7285 Reporter: GAO Rui 1. Namenode startup with EC: 1.1. Safemode 1.2. BlockReport 2. Namenode HA with EC: 2.1. Fsimage and editlog test 2.2. Hot restart and recovery of Active NameNode after failure 2.3. Hot restart and recovery of Standby NameNode after failure 2.4. Restart and recovery when both Active and Standby NameNodes fail at the same time -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8108) Fsck should provide the info on mandatory option to be used along with -blocks , -locations and -racks
[ https://issues.apache.org/jira/browse/HDFS-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513731#comment-14513731 ] Hadoop QA commented on HDFS-8108: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 9s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. | | {color:green}+1{color} | javac | 7m 44s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 35s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 9s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 23s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 167m 52s | Tests passed in hadoop-hdfs. 
| | | | 215m 32s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728311/HDFS-8108.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 618ba70 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/10405/artifact/patchprocess/whitespace.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10405/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10405/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10405/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10405/console | This message was automatically generated. Fsck should provide the info on mandatory option to be used along with -blocks , -locations and -racks Key: HDFS-8108 URL: https://issues.apache.org/jira/browse/HDFS-8108 Project: Hadoop HDFS Issue Type: Improvement Reporter: J.Andreina Assignee: J.Andreina Priority: Trivial Attachments: HDFS-8108.1.patch, HDFS-8108.2.patch Fsck usage information should provide the information on which options are mandatory to be passed along with -blocks , -locations and -racks to be in sync with documentation. For example : To get information on: 1. Blocks (-blocks), option -files should also be used. 2. Rack information (-racks), option -files and -blocks should also be used. {noformat} ./hdfs fsck -files -blocks ./hdfs fsck -files -blocks -racks {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8112) Enforce authorization policy to protect administration operations for EC zone and schemas
[ https://issues.apache.org/jira/browse/HDFS-8112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513943#comment-14513943 ] Rakesh R commented on HDFS-8112: As per the discussion with [~drankye], the idea of this task is to revisit all the EC command/API operations and refine this aspect once the whole feature is solid. There could be cases where some operations may reasonably be made available to non-superusers. Enforce authorization policy to protect administration operations for EC zone and schemas - Key: HDFS-8112 URL: https://issues.apache.org/jira/browse/HDFS-8112 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Rakesh R We should allow enforcing an authorization policy to protect administration operations for EC zones and schemas, as such operations would impact the whole system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513990#comment-14513990 ] Brahma Reddy Battula commented on HDFS-7397: [~cmccabe] and [~qwertymaniac] thanks for your inputs. Agreed on updating the description. Attached the patch; kindly review. The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
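As the description points out, the value of this key is a count of cached short-circuit streams, not a size in bytes. An hdfs-site.xml fragment making that explicit (the value 256, believed to be the default in hdfs-default.xml, is shown for illustration only):

```xml
<!-- The value is the number of cached short-circuit streams,
     NOT a size in MB, KB, or bytes. -->
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>256</value>
</property>
```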
[jira] [Commented] (HDFS-7674) Adding metrics for Erasure Coding
[ https://issues.apache.org/jira/browse/HDFS-7674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514081#comment-14514081 ] Kai Zheng commented on HDFS-7674: - Hi [~walter.k.su], You mentioned a good case. Do you want to work on this? If so please take it. I discussed offline with Wei, he may need more time to be ready with this. Thanks. Adding metrics for Erasure Coding - Key: HDFS-7674 URL: https://issues.apache.org/jira/browse/HDFS-7674 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Wei Zhou As the design (in HDFS-7285) indicates, erasure coding involves non-trivial impact and workload for NameNode, DataNode and client; it also allows configurable and pluggable erasure codec and schema with flexible tradeoff options (see HDFS-7337). To support necessary analysis and adjustment, we'd better have various meaningful metrics for the EC support, like encoding/decoding tasks, recovered blocks, read/transferred data size, computation time and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8268) Port conflict log for data node server is not sufficient
Mohammad Shahid Khan created HDFS-8268: -- Summary: Port conflict log for data node server is not sufficient Key: HDFS-8268 URL: https://issues.apache.org/jira/browse/HDFS-8268 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.0, 2.8.0 Environment: x86_64 x86_64 x86_64 GNU/Linux Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan Priority: Minor The DataNode fails to start due to a port conflict, but the log written when the port configured by dfs.datanode.http.address conflicts is not sufficient to identify the reason for the failure. The exception logged by the server is as below. *Actual:* 2015-04-27 16:48:53,960 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:437) at sun.nio.ch.Net.bind(Net.java:429) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125) at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475) at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021) at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455) at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440) at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:844) at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:194) at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:340) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) at java.lang.Thread.run(Thread.java:745) *_The above log does not contain the information of the conflicting port._* *Expected output:* java.net.BindException: Problem binding to [0.0.0.0:50075] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:721) at org.apache.hadoop.hdfs.server.datanode.web.DatanodeHttpServer.start(DatanodeHttpServer.java:160) at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:795) at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1142) at org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:439) at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2420) at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2298) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2349) at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2540) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2564) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:437) at sun.nio.ch.Net.bind(Net.java:429) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at 
io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125) at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:475) at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1021) at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:455) at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:440) at
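The expected output above is the message style produced by org.apache.hadoop.net.NetUtils.wrapException: the raw BindException is rethrown with the conflicting bind address included. A minimal self-contained sketch of that pattern — {{bindOrExplain}} is a hypothetical helper for illustration, not the actual DatanodeHttpServer code:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Illustrative pattern behind the expected output: catch the bare
// BindException and rethrow it with the conflicting address in the
// message, as NetUtils.wrapException does for Hadoop servers.
class BindWithAddress {
    static ServerSocket bindOrExplain(InetSocketAddress addr) throws IOException {
        ServerSocket ss = new ServerSocket();
        try {
            ss.bind(addr);
            return ss;
        } catch (BindException e) {
            ss.close();
            BindException wrapped = new BindException(
                "Problem binding to [" + addr + "] " + e.getMessage()
                + "; For more details see: http://wiki.apache.org/hadoop/BindException");
            wrapped.initCause(e);  // keep the original exception as the cause
            throw wrapped;
        }
    }
}
```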
[jira] [Updated] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7397: --- Attachment: HDFS-7397.patch The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7397: --- Status: Patch Available (was: Open) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514046#comment-14514046 ] Junping Du commented on HDFS-7613: -- Thanks guys for good discussions above. For support for multiple placement policies, there is an old JIRA HDFS-4894 filed for a long time. Shall we reuse that for discussion and work? Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Attachments: HDFS-7613.001.patch Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7859: --- Attachment: HDFS-7859-HDFS-7285.002.patch same patch, but uploaded with the branch name to make test-patch.sh kick off with that branch instead of trunk. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514233#comment-14514233 ] Allen Wittenauer commented on HDFS-7859: (now we just need a submit button. lol) Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515845#comment-14515845 ] Hadoop QA commented on HDFS-8232: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 27s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 38s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 183m 45s | Tests passed in hadoop-hdfs. 
| | | | 232m 28s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728471/hdfs-8232.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6bae596 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10416/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10416/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10416/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10416/console | This message was automatically generated. Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metric2 Sink interface none of the counters declared under Dataanode:FSDataSetBean are visible. They are visible if you use JMX or if you do http://host:port/jmx. Expected behavior is that they be part of Sink interface and accessible in the putMetrics call back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8235) Create DFSStripedInputStream in DFSClient#open
[ https://issues.apache.org/jira/browse/HDFS-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-8235: - Attachment: HDFS-8235.3.patch Create DFSStripedInputStream in DFSClient#open -- Key: HDFS-8235 URL: https://issues.apache.org/jira/browse/HDFS-8235 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-8235.1.patch, HDFS-8235.2.patch, HDFS-8235.3.patch Currently DFSClient#open can only create a DFSInputStream object. It should also support DFSStripedInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516053#comment-14516053 ] Colin Patrick McCabe commented on HDFS-7923: Thanks, [~clamb]. I like this approach. It avoids sending the block report until the NN requests it. So we don't have to throw away a whole block report to achieve backpressure. {code} public static final String DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_KEY = "dfs.namenode.max.concurrent.block.reports"; public static final int DFS_NAMENODE_MAX_CONCURRENT_BLOCK_REPORTS_DEFAULT = Integer.MAX_VALUE; {code} It seems like this should default to something less than the default number of RPC handler threads, not to MAX_INT. Given that dfs.namenode.handler.count = 10, it seems like this should be no more than 5 or 6, right? The main point here is to avoid having the NN handler threads completely choked with block reports, and that is defeated if the value is MAX_INT. I realize that you probably intended this to be configured. But it seems like we should have a reasonable default that works for most people. {code} --- hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto +++ hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto @@ -195,6 +195,7 @@ message HeartbeatRequestProto { optional uint64 cacheCapacity = 6 [ default = 0 ]; optional uint64 cacheUsed = 7 [default = 0 ]; optional VolumeFailureSummaryProto volumeFailureSummary = 8; + optional bool requestSendFullBlockReport = 9; } {code} Let's have a {{\[default = false\]}} here so that we don't have to add a bunch of clunky {{HasFoo}} checks. Unless there is something we'd like to do differently in the false and not present cases, but I can't think of what that would be. {code} + /* Number of block reports currently being processed. */ + private final AtomicInteger blockReportProcessingCount = new AtomicInteger(0); {code} I'm not sure an {{AtomicInteger}} makes sense here. 
We only modify this variable (write to it) when holding the FSN lock in write mode, right? And we only read from it when holding the FSN in read mode. So, there isn't any need to add atomic ops. {code} + boolean okToSendFullBlockReport = true; + if (requestSendFullBlockReport && + blockManager.getBlockReportProcessingCount() >= + maxConcurrentBlockReports) { + /* See if we should tell DN to back off for a bit. */ + final long lastBlockReportTime = blockManager.getDatanodeManager(). + getDatanode(nodeReg).getLastBlockReportTime(); + if (lastBlockReportTime > 0) { + /* We've received at least one block report. */ + final long msSinceLastBlockReport = now() - lastBlockReportTime; + if (msSinceLastBlockReport < maxBlockReportDeferralMsec) { + /* It hasn't been long enough to allow a BR to pass through. */ + okToSendFullBlockReport = false; + } + } + } + return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo, + okToSendFullBlockReport); {code} There is a TOCTOU (time of check, time of use) race condition here, right? 1000 datanodes come in and ask me whether it's ok to send an FBR. In each case, I check the number of ongoing FBRs, which is 0, and say yes. Then 1000 FBRs arrive all at once and the NN melts down. I think we need to track which datanodes we gave the green light to, and not decrement the counter until they either send that report, or some timeout expires. (We need the timeout in case datanodes go away after requesting permission-to-send.) The timeout can probably be as short as a few minutes. If you can't manage to send an FBR in a few minutes, there's more problems going on. {code} public static final String DFS_BLOCKREPORT_MAX_DEFER_MSEC_KEY = "dfs.blockreport.max.deferMsec"; public static final long DFS_BLOCKREPORT_MAX_DEFER_MSEC_DEFAULT = Long.MAX_VALUE; {code} Do we really need this config key? It seems like we added it because we wanted to avoid starvation (i.e. the case where a given DN never gets given the green light). 
But we are maintaining the last FBR time for each DN anyway. Surely we can just have a TreeMap or something and just tell the guys with the oldest {{lastSentTime}} to go. There aren't an infinite number of datanodes in the cluster, so eventually everyone will get the green light. I really would prefer not to have this tunable at all, since I think it's unnecessary. In any case, it's certainly doing us no good as MAX_U64. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project:
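The lease tracking Colin suggests can be sketched as follows. This is only an illustration of the idea (the class and method names are hypothetical, not the implementation that eventually landed for HDFS-7923): a slot is consumed the moment the NN grants permission, and freed when the report arrives or the lease times out, which closes the check-then-act window.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: track which DNs were given the green light for a
// full block report (FBR). A grant holds a slot until the report arrives
// or the lease expires, so 1000 simultaneous askers cannot all be told yes.
class FbrLeaseTracker {
    private final int maxConcurrent;
    private final long leaseTimeoutMs;
    private final Map<String, Long> granted = new HashMap<>(); // dnUuid -> grant time

    FbrLeaseTracker(int maxConcurrent, long leaseTimeoutMs) {
        this.maxConcurrent = maxConcurrent;
        this.leaseTimeoutMs = leaseTimeoutMs;
    }

    /** Called from the heartbeat handler; the grant is counted immediately. */
    synchronized boolean tryGrant(String dnUuid, long nowMs) {
        // Expire leases of DNs that went away after asking for permission.
        granted.values().removeIf(t -> nowMs - t > leaseTimeoutMs);
        if (granted.containsKey(dnUuid)) {
            return true;  // this DN already holds a lease
        }
        if (granted.size() >= maxConcurrent) {
            return false; // tell the DN to back off for now
        }
        granted.put(dnUuid, nowMs);
        return true;
    }

    /** Called when the full block report is received (or the DN is removed). */
    synchronized void release(String dnUuid) {
        granted.remove(dnUuid);
    }
}
```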
[jira] [Commented] (HDFS-8143) HDFS Mover tool should exit after some retry when failed to move blocks.
[ https://issues.apache.org/jira/browse/HDFS-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515172#comment-14515172 ] Tsz Wo Nicholas Sze commented on HDFS-8143: --- - If the mover exits due to failure, it should exit with an error state like the balancer. - moverFailedRetryCounter should count consecutive failures. If there is no failure in an iteration, it should reset the count. - Please add a test. HDFS Mover tool should exit after some retry when failed to move blocks. Key: HDFS-8143 URL: https://issues.apache.org/jira/browse/HDFS-8143 Project: Hadoop HDFS Issue Type: Bug Components: balancer mover Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Priority: Blocker Attachments: HDFS-8143.patch The Mover does not exit when it fails to move blocks. {code} hasRemaining |= Dispatcher.waitForMoveCompletion(storages.targets.values()); {code} {{Dispatcher.waitForMoveCompletion()}} will always return true if some block migrations failed, so hasRemaining never becomes false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
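The retry behavior requested above — count only consecutive failed iterations, reset on a clean iteration, and give up with an error exit code after a limit — can be sketched like this. The names are hypothetical, not the committed HDFS-8143 code:

```java
// Illustrative sketch of the suggested Mover retry policy: the counter
// tracks *consecutive* failed iterations and is reset whenever an
// iteration completes without failures.
class MoverRetryPolicy {
    private final int maxRetries;
    private int consecutiveFailures = 0;

    MoverRetryPolicy(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Returns true when the mover should give up and exit with an error. */
    boolean recordIteration(boolean hadFailures) {
        if (!hadFailures) {
            consecutiveFailures = 0; // a clean iteration resets the count
            return false;
        }
        return ++consecutiveFailures >= maxRetries;
    }
}
```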
[jira] [Updated] (HDFS-7588) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files
[ https://issues.apache.org/jira/browse/HDFS-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-7588: --- Status: Open (was: Patch Available) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files --- Key: HDFS-7588 URL: https://issues.apache.org/jira/browse/HDFS-7588 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7588.01.patch, HDFS-7588.02.patch The new HTML5 web browser is neat, however it lacks a few features that might make it more useful: 1. chown 2. chmod 3. Uploading files 4. mkdir -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7588) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files
[ https://issues.apache.org/jira/browse/HDFS-7588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515173#comment-14515173 ] Ravi Prakash commented on HDFS-7588: Hi Allen! I converted this JIRA into an umbrella issue to effect this change piecemeal. That patch no longer applies. Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs and uploading files --- Key: HDFS-7588 URL: https://issues.apache.org/jira/browse/HDFS-7588 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-7588.01.patch, HDFS-7588.02.patch The new HTML5 web browser is neat; however, it lacks a few features that might make it more useful: 1. chown 2. chmod 3. Uploading files 4. mkdir -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8235) Create DFSStripedInputStream in DFSClient#open
[ https://issues.apache.org/jira/browse/HDFS-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515833#comment-14515833 ] Jing Zhao commented on HDFS-8235: - The 002 patch looks good to me. Some minor comments: # We can add a Preconditions.checkArgument or assert in {{DFSStripedInputStream}} to make sure the ecinfo is not null # In the test, we should use {{FileSystem#open}} to *replace* all the {{new DFSStripedInputStream(...)}} calls. Thus we do not need to repeat the same tests in {{TestDFSStripedInputStream}}. # Also we can delete the constructor {{DFSStripedInputStream(DFSClient dfsClient, String src, boolean verifyChecksum)}}. Create DFSStripedInputStream in DFSClient#open -- Key: HDFS-8235 URL: https://issues.apache.org/jira/browse/HDFS-8235 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Kai Sasaki Attachments: HDFS-8235.1.patch, HDFS-8235.2.patch Currently DFSClient#open can only create a DFSInputStream object. It should also support DFSStripedInputStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8204) Mover/Balancer should not schedule two replicas to the same DN
[ https://issues.apache.org/jira/browse/HDFS-8204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515122#comment-14515122 ] Tsz Wo Nicholas Sze commented on HDFS-8204: --- {code} if (source.getDatanodeInfo().equals(targetDatanode)) { // the block is moved inside same DN return true; } {code} It seems that the DataTransferProtocol currently does not support moving within the same node since it will get ReplicaAlreadyExistsException when creating the new block. Do you agree? Mover/Balancer should not schedule two replicas to the same DN -- Key: HDFS-8204 URL: https://issues.apache.org/jira/browse/HDFS-8204 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8204.001.patch, HDFS-8204.002.patch, HDFS-8204.003.patch The Balancer moves blocks between Datanodes in older versions (Ver. < 2.6), and between StorageGroups (introduced by HDFS-6584) in the new version (Ver. >= 2.6). The function {code} class DBlock extends Locations<StorageGroup> DBlock.isLocatedOn(StorageGroup loc) {code} -is flawed, and may cause 2 replicas to end up on the same node after running balance.- For example: We have 2 nodes. Each node has two storages. We have (DN0, SSD), (DN0, DISK), (DN1, SSD), (DN1, DISK). We have a block with ONE_SSD storage policy. The block has 2 replicas. They are in (DN0,SSD) and (DN1,DISK). The replica in (DN0,SSD) should not be moved to (DN1,SSD) after running the Balancer. Otherwise DN1 has 2 replicas. -- UPDATE (Thanks [~szetszwo] for pointing it out): {color:red} This bug will *NOT* cause 2 replicas to end up on the same node after running balance, thanks to the Datanode rejecting it. {color} We see a lot of ERRORs when running the test. 
{code} 2015-04-27 10:08:15,809 ERROR datanode.DataNode (DataXceiver.java:run(277)) - host1.foo.com:59537:DataXceiver error processing REPLACE_BLOCK operation src: /127.0.0.1:52532 dst: /127.0.0.1:59537 org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-264794661-9.96.1.34-1430100451121:blk_1073741825_1001 already exists in state FINALIZED and thus cannot be created. at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1447) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:186) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.replaceBlock(DataXceiver.java:1158) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReplaceBlock(Receiver.java:229) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:722) {code} The Balancer runs 5~20 iterations in the test before it exits. It's inefficient. The Balancer should not *schedule* such a move in the first place, even though it'll fail anyway. In the test, it should exit after 5 iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
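The fix direction discussed above — refusing to schedule a move whose target datanode already holds any replica of the block, regardless of storage type — can be sketched as a membership check over datanodes rather than storage groups. The names here are illustrative, not the HDFS-8204 patch:

```java
import java.util.List;

class SameNodeCheckSketch {
    // A move is pointless (the target DN will reject it with
    // ReplicaAlreadyExistsException) if any existing replica already
    // lives on the target's datanode, whatever the storage type.
    static boolean isLocatedOnSameNode(List<String> replicaDatanodeUuids,
                                       String targetDatanodeUuid) {
        return replicaDatanodeUuids.contains(targetDatanodeUuid);
    }
}
```

In the ONE_SSD example above, a block on (DN0, SSD) and (DN1, DISK) would fail this check for any target storage on DN1, so the Balancer never schedules the doomed move.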
[jira] [Updated] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-8232: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have committed this to trunk and branch-2. Anu, thank you for contributing the patch. Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metrics2 Sink interface, none of the counters declared under Datanode:FSDataSetBean are visible. They are visible if you use JMX or if you do http://host:port/jmx. Expected behavior is that they be part of the Sink interface and accessible in the putMetrics callback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8232) Missing datanode counters when using Metrics2 sink interface
[ https://issues.apache.org/jira/browse/HDFS-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515982#comment-14515982 ] Hudson commented on HDFS-8232: -- FAILURE: Integrated in Hadoop-trunk-Commit #7691 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7691/]) HDFS-8232. Missing datanode counters when using Metrics2 sink interface. Contributed by Anu Engineer. (cnauroth: rev feb68cb5470dc3e6c16b6bc1549141613e360601) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/FSDatasetMBean.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetricHelper.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/extdataset/ExternalDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeFSDataSetSink.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Missing datanode counters when using Metrics2 sink interface Key: HDFS-8232 URL: https://issues.apache.org/jira/browse/HDFS-8232 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: hdfs-8232.001.patch, hdfs-8232.002.patch When using the Metrics2 Sink interface, none of the counters declared under Datanode:FSDataSetBean are visible. They are visible if you use JMX or if you do http://host:port/jmx. Expected behavior is that they be part of the Sink interface and accessible in the putMetrics callback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8273) logSync() is called inside of write lock for delete op
Jing Zhao created HDFS-8273: --- Summary: logSync() is called inside of write lock for delete op Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8272) Erasure Coding: simplify the retry logic in DFSStripedInputStream
Jing Zhao created HDFS-8272: --- Summary: Erasure Coding: simplify the retry logic in DFSStripedInputStream Key: HDFS-8272 URL: https://issues.apache.org/jira/browse/HDFS-8272 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Currently in DFSStripedInputStream the retry logic is still the same as in DFSInputStream. More specifically, every failed read will try to search for another source node, and an exception is thrown when no new source node can be identified. This logic is not appropriate for the EC input stream and can be simplified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7786) Handle slow writers for DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7786: Assignee: Keisuke Ogiwara (was: Li Bo) Handle slow writers for DFSStripedOutputStream -- Key: HDFS-7786 URL: https://issues.apache.org/jira/browse/HDFS-7786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Keisuke Ogiwara Fix For: HDFS-7285 The streamers in DFSStripedOutputStream may have different write speed. We need to consider and handle the situation when one or more streamers begin to write slowly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8273) logSync() is called inside of write lock for delete op
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8273: - Status: Patch Available (was: Open) logSync() is called inside of write lock for delete op -- Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-7859: -- Status: Patch Available (was: In Progress) Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8248) Store INodeId instead of the INodeFile object in BlockInfoContiguous
[ https://issues.apache.org/jira/browse/HDFS-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516164#comment-14516164 ] Hadoop QA commented on HDFS-8248: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 36s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 3m 58s | The applied patch generated 4 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 185m 0s | Tests failed in hadoop-hdfs. 
| | | | 229m 25s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.hdfs.server.datanode.TestIncrementalBrVariations | | | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728070/HDFS-8248.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9fc32c5 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10419/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10419/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10419/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10419/console | This message was automatically generated. Store INodeId instead of the INodeFile object in BlockInfoContiguous Key: HDFS-8248 URL: https://issues.apache.org/jira/browse/HDFS-8248 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8248.000.patch, HDFS-8248.001.patch Currently the namespace and the block manager are tightly coupled together. There are two couplings in terms of implementation: 1. The {{BlockInfoContiguous}} stores a reference of the {{INodeFile}} that owns the block, so that the block manager can look up the corresponding file when replicating blocks, recovering from pipeline failures, etc. 1. The {{INodeFile}} stores {{BlockInfoContiguous}} objects that the file owns. 
Decoupling the namespace and the block manager allows the BM to be separated out from the Java heap or even as a standalone process. This jira proposes to remove the first coupling by storing the id of the inode instead of the object reference of {{INodeFile}} in the {{BlockInfoContiguous}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
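The decoupling proposed here amounts to replacing an object reference with a numeric id plus an on-demand lookup through the namespace. A minimal sketch of that indirection, with all class and method names hypothetical (the real change touches BlockInfoContiguous and the BlockManager):

```java
import java.util.HashMap;
import java.util.Map;

class BlockInfoSketch {
    final long blockId;
    // Instead of a direct INodeFile reference, store only the owner's inode id.
    final long ownerInodeId;
    BlockInfoSketch(long blockId, long ownerInodeId) {
        this.blockId = blockId;
        this.ownerInodeId = ownerInodeId;
    }
}

class NamespaceSketch {
    private final Map<Long, String> inodeIdToPath = new HashMap<>();

    void addFile(long inodeId, String path) { inodeIdToPath.put(inodeId, path); }

    // The block manager resolves the owning file on demand via the id, so it
    // no longer holds object references into the namespace heap — which is what
    // allows the BM to live outside that heap, or in a separate process.
    String resolveOwner(BlockInfoSketch block) {
        return inodeIdToPath.get(block.ownerInodeId);
    }
}
```

The cost of the indirection is one id-to-inode lookup per resolution (replication, pipeline recovery, etc.), traded for the looser coupling described above.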
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516215#comment-14516215 ] J.Andreina commented on HDFS-8270: -- I would like to take up this issue. Please let me know if you have already started on this. create() always retried with hardcoded timeout when file already exists --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed a wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without overwrite=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like hbase fsck did, and we got it broken: HBASE-13574). We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516274#comment-14516274 ] Zhe Zhang commented on HDFS-8220: - [~libo-intel] is probably most familiar with this part of code. Bo do you mind reviewing the patch? Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) 
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516282#comment-14516282 ] Hadoop QA commented on HDFS-8078: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 32s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 7m 38s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 41s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 43s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 189m 16s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 16s | Tests passed in hadoop-hdfs-client. 
| | | | 238m 36s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | Timed out tests | org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728606/HDFS-8078.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 9fc32c5 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10420/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10420/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10420/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10420/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10420/console | This message was automatically generated. 
HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.4.patch, HDFS-8078.5.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr() assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. (From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input
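The underlying formatting problem is that `ipaddr + ":" + port` is ambiguous for IPv6, where the address itself contains colons; URI syntax requires IPv6 literals in brackets. A hedged sketch of bracket-aware authority building (illustrative helper, not the actual DatanodeID code):

```java
class HostPortSketch {
    // Wrap IPv6 literals in brackets so "host:port" parses unambiguously:
    // [2401:db00::8:0]:50010 instead of 2401:db00::8:0:50010, where the
    // last colon-separated field can no longer be told apart from the port.
    static String toAuthority(String ipAddr, int port) {
        boolean ipv6Literal = ipAddr.indexOf(':') >= 0 && !ipAddr.startsWith("[");
        return (ipv6Literal ? "[" + ipAddr + "]" : ipAddr) + ":" + port;
    }
}
```

With the bracketed form, `NetUtils.createSocketAddr()`-style parsing can split host from port on the last colon outside the brackets, which is exactly what the Java URI grammar expects.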
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-7980: Attachment: HDFS-7980.003.patch Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method (an incremental block report) before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it will slow down (by more than one hour) the startup of the namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; ... // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-7980: Status: Patch Available (was: Open) Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method (an incremental block report) before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it will slow down (by more than one hour) the startup of the namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; ... // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8213) DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace
[ https://issues.apache.org/jira/browse/HDFS-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516311#comment-14516311 ] Masatake Iwasaki commented on HDFS-8213: I agree to use independent config keys for DFSClient in order to make end-to-end tracing from HBase/Accumulo to HDFS work. A few comments on 001. --- In {{SpanReceiverHost#getInstance}}, {{loadSpanReceivers}} is called even if there is an already initialized SRH instance. Is it intentional? {code} synchronized (SingletonHolder.INSTANCE.lock) { if (SingletonHolder.INSTANCE.host == null) { SingletonHolder.INSTANCE.host = new SpanReceiverHost(); } SingletonHolder.INSTANCE.host.loadSpanReceivers(conf, configPrefix); ShutdownHookManager.get().addShutdownHook(new Runnable() { public void run() { SingletonHolder.INSTANCE.host.closeReceivers(); } }, 0); return SingletonHolder.INSTANCE.host; {code} --- We need to fix {{TraceUtils#wrapHadoopConf}}, which always assumes that the prefix is "hadoop.htrace.". {code} public class TraceUtils { public static final String HTRACE_CONF_PREFIX = "hadoop.htrace."; {code} --- Should we add an entry for {{hdfs.client.htrace.spanreceiver.classes}} to hdfs-default.xml? DFSClient should use hdfs.client.htrace HTrace configuration prefix rather than hadoop.htrace - Key: HDFS-8213 URL: https://issues.apache.org/jira/browse/HDFS-8213 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Billie Rinaldi Assignee: Colin Patrick McCabe Priority: Critical Attachments: HDFS-8213.001.patch DFSClient initializing SpanReceivers is a problem for Accumulo, which manages SpanReceivers through its own configuration. This results in the same receivers being registered multiple times and spans being delivered more than once. The documentation says SpanReceiverHost.getInstance should be issued once per process, so there is no expectation that DFSClient should do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
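The first review point — loadSpanReceivers running again on an already-initialized singleton — could be avoided by moving the load inside the null check, so repeated getInstance() calls do not re-register the same receivers. An illustrative, simplified sketch of that guard (not the actual SpanReceiverHost code; the loadCount field exists only to make the behavior observable):

```java
class SingletonInitSketch {
    private static SingletonInitSketch instance;
    static int loadCount = 0;  // for illustration only: counts receiver loads

    private SingletonInitSketch() {}

    private void loadReceivers() { loadCount++; }

    // Initialize receivers only on first construction; later callers
    // get the existing instance with no side effects.
    static synchronized SingletonInitSketch getInstance() {
        if (instance == null) {
            instance = new SingletonInitSketch();
            instance.loadReceivers();
        }
        return instance;
    }
}
```

This is the same once-per-process contract the HDFS-8213 description quotes from the documentation: initialization work belongs inside the "instance == null" branch, not after it.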
[jira] [Commented] (HDFS-7060) Avoid taking locks when sending heartbeats from the DataNode
[ https://issues.apache.org/jira/browse/HDFS-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516265#comment-14516265 ] Xinwei Qin commented on HDFS-7060: --- [~jnp] Thanks for your comment. This test was done before applying HDFS-7999. Avoid taking locks when sending heartbeats from the DataNode Key: HDFS-7060 URL: https://issues.apache.org/jira/browse/HDFS-7060 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Xinwei Qin Attachments: HDFS-7060-002.patch, HDFS-7060.000.patch, HDFS-7060.001.patch We're seeing the heartbeat is blocked by the monitor of {{FsDatasetImpl}} when the DN is under heavy load of writes: {noformat} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:115) - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:91) - locked 0x000780612fd8 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:563) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:668) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:827) at java.lang.Thread.run(Thread.java:744) java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:743) - waiting to lock 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621) at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) at java.lang.Thread.run(Thread.java:744) java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:1006) at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:59) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:244) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:195) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:753) - locked 0x000780304fb8 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:60) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:169) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:621) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232) at java.lang.Thread.run(Thread.java:744) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
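One common way to avoid the contention shown in these stack traces is to publish usage counters through atomics, so the heartbeat thread can read them without taking the dataset monitor that the write path holds. A hedged sketch of that pattern (illustrative class, not the actual HDFS-7060 patch):

```java
import java.util.concurrent.atomic.AtomicLong;

class VolumeUsageSketch {
    // Writers update the counter from their own code paths; readers
    // (e.g. the heartbeat thread building storage reports) never block
    // on the coarse dataset lock.
    private final AtomicLong dfsUsed = new AtomicLong();

    void addBytes(long delta) { dfsUsed.addAndGet(delta); }

    long getDfsUsed() { return dfsUsed.get(); }  // lock-free read
}
```

The trade-off is that the heartbeat may observe a value that is a few updates stale, which is acceptable for usage reporting but would not be for correctness-critical state.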
[jira] [Commented] (HDFS-8183) Erasure Coding: Improve DFSStripedOutputStream closing of datastreamer threads
[ https://issues.apache.org/jira/browse/HDFS-8183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516268#comment-14516268 ] Zhe Zhang commented on HDFS-8183: - Thanks for working on this Rakesh. The patch LGTM. My only suggestion is to update the error message _Failed to shutdown streamer_ to indicate which streamers failed to close. Please see if you want to add a test here. Otherwise we can take the next chance to test this change. Erasure Coding: Improve DFSStripedOutputStream closing of datastreamer threads -- Key: HDFS-8183 URL: https://issues.apache.org/jira/browse/HDFS-8183 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8183-001.patch The idea of this task is to improve the closing of all the streamers. Presently, if any of the streamers throws an exception, it returns immediately. This leaves all the other streamer threads running. Instead, it's better to handle the exceptions of each streamer independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
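Closing each streamer independently — rather than returning on the first failure — can be sketched by attempting every close and collecting which ones failed, which also gives the more informative error message suggested in the review. Illustrative names, not the HDFS-8183 patch:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CloseAllSketch {
    interface Streamer { void close() throws IOException; }

    // Attempt to close every streamer; record the indices of failures
    // instead of aborting on the first exception, so no streamer thread
    // is left running and the error can name exactly which ones failed.
    static List<Integer> closeAll(List<Streamer> streamers) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 0; i < streamers.size(); i++) {
            try {
                streamers.get(i).close();
            } catch (IOException e) {
                failed.add(i);
            }
        }
        return failed;
    }
}
```

The caller can then throw a single summarizing exception listing the failed indices once the loop completes.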
[jira] [Commented] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516309#comment-14516309 ] Allen Wittenauer commented on HDFS-7859: FYI, there are now two of these running: https://builds.apache.org/job/PreCommit-HDFS-Build/10424/console https://builds.apache.org/job/PreCommit-HDFS-Build/10425/console It's still churning through hadoop-hdfs unit tests on the one that [~xinwei] submitted earlier. hadoop-hdfs is one of the slowest set of unit tests we have. I have a hunch that you folks have added code in this branch which has made it even slower ... to the point that Jenkins will likely kill the test patch job before it finishes. Erasure Coding: Persist EC schemas in NameNode -- Key: HDFS-7859 URL: https://issues.apache.org/jira/browse/HDFS-7859 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, HDFS-7859.001.patch, HDFS-7859.002.patch In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we persist EC schemas in NameNode centrally and reliably, so that EC zones can reference them by name efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8201) Add an end to end test for stripping file writing and reading
[ https://issues.apache.org/jira/browse/HDFS-8201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516333#comment-14516333 ] Zhe Zhang commented on HDFS-8201: - Thanks for working on this Xinwei! Not sure if you have noticed, {{TestDFSStripedInputStream}} was added as part of HDFS-8136. It looks quite similar to this patch. Maybe we should clean up and optimize that test under this JIRA? Add an end to end test for stripping file writing and reading - Key: HDFS-8201 URL: https://issues.apache.org/jira/browse/HDFS-8201 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8201.001.patch According to off-line discussion with [~zhz] and [~xinwei], we need to implement an end-to-end test for striped file support: * Create an EC zone; * Create a file in the zone; * Write various typical sizes of content to the file, each size maybe a test method; * Read the written content back; * Compare the written content and read content to ensure it's good; The test facility is subject to adding more steps for erasure encoding and recovering. Will open a separate issue for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Attachment: (was: HDFS-8078.4.patch) HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6
1st exception, on put:
15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)
  at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588)
This appears to stem from code in DatanodeID which assumes it's safe to concatenate (ipaddr + ":" + port) -- which is OK for IPv4 but not for IPv6. NetUtils.createSocketAddr() assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010. Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky), but we could also use our own parsing.
(From logging this, it seems like a low-enough-frequency call that the extra object creation shouldn't be problematic, and for me the slight risk that bad input which is not actually an IPv4 or IPv6 address triggers an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting the parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() could be used.
---
2nd exception (on datanode):
15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010
java.io.EOFException
  at java.io.DataInputStream.readShort(DataInputStream.java:315)
  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
  at java.lang.Thread.run(Thread.java:745)
This also surfaces as a client error: -get: 2401 is not an IP string literal. The existing parsing logic here needs to split on the last colon rather than the first; it should also be a tiny bit faster using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
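The bracket-for-IPv6 rule and the last-colon split described above can be illustrated with a small stand-alone helper. {{HostPortUtil}} and its method names are hypothetical, for illustration only; this is not the actual DatanodeID/NetUtils fix, just a sketch of the two rules the report calls out:

```java
// Illustrative sketch: joining and splitting host:port strings in a way
// that is safe for both IPv4 and IPv6 literals. An IPv6 literal must be
// bracketed before appending ":port" (the URI format the report mentions),
// and the port must be recovered by splitting on the LAST colon, because
// the address itself contains colons.
public class HostPortUtil {
    // Wrap IPv6 literals in brackets so "addr:port" stays unambiguous.
    static String toHostPort(String ipAddr, int port) {
        if (ipAddr.contains(":")) {   // crude IPv6-literal check for the sketch
            return "[" + ipAddr + "]:" + port;
        }
        return ipAddr + ":" + port;
    }

    // Recover the port with lastIndexOf, never the first colon.
    static int portOf(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        return Integer.parseInt(hostPort.substring(idx + 1));
    }
}
```

Splitting on the first colon of {{2401:db00:...:50010}} is exactly what produces the "2401 is not an IP string literal" client error quoted above.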
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Attachment: (was: HDFS-8078.5.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8269) getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime
[ https://issues.apache.org/jira/browse/HDFS-8269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8269: - Attachment: HDFS-8269.001.patch getBlockLocations() does not resolve the .reserved path and generates incorrect edit logs when updating the atime - Key: HDFS-8269 URL: https://issues.apache.org/jira/browse/HDFS-8269 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8269.000.patch, HDFS-8269.001.patch When {{FSNamesystem#getBlockLocations}} updates the access time of the INode, it uses the path passed from the client, which generates incorrect edit log entries:
{noformat}
<RECORD>
  <OPCODE>OP_TIMES</OPCODE>
  <DATA>
    <TXID>5085</TXID>
    <LENGTH>0</LENGTH>
    <PATH>/.reserved/.inodes/18230</PATH>
    <MTIME>-1</MTIME>
    <ATIME>1429908236392</ATIME>
  </DATA>
</RECORD>
{noformat}
Note that the NN does not resolve the {{/.reserved}} path when processing the edit log, so it eventually leads to an NPE when loading the edit logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7859) Erasure Coding: Persist EC schemas in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7859: Attachment: HDFS-7859-HDFS-7285.002.patch Submitting a duplicate patch to trigger Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516284#comment-14516284 ] Walter Su commented on HDFS-7980: - 003 patch is ready for review. Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch In the current implementation the datanode calls the reportReceivedDeletedBlocks() method (an incremental block report) before calling the bpNamenode.blockReport() method. So in a large (several thousand datanodes) and busy cluster it can slow down the startup of the namenode by more than an hour.
{code}
List<DatanodeCommand> blockReport() throws IOException {
  // send block report if timer has expired.
  final long startTime = now();
  if (startTime - lastBlockReport <= dnConf.blockReportInterval) {
    return null;
  }
  final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();

  // Flush any block information that precedes the block report. Otherwise
  // we have a chance that we will miss the delHint information
  // or we will report an RBW replica after the BlockReport already reports
  // a FINALIZED one.
  reportReceivedDeletedBlocks();
  lastDeletedReport = startTime;
  ...
  // Send the reports to the NN.
  int numReportsSent = 0;
  int numRPCs = 0;
  boolean success = false;
  long brSendStartTime = now();
  try {
    if (totalBlockCount < dnConf.blockReportSplitThreshold) {
      // Below split threshold, send all reports in a single message.
      DatanodeCommand cmd = bpNamenode.blockReport(
          bpRegistration, bpos.getBlockPoolId(), reports);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516142#comment-14516142 ] Brahma Reddy Battula commented on HDFS-7397: [~cmccabe] Thanks for the review. Updated the patch based on your comment. Kindly review. The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397-002.patch, HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
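The naming pitfall can be reproduced with nothing more than a plain properties map: the value behind this key is an entry count, not a byte size. A minimal sketch (the key name is from the issue; the default of 256 is an assumption based on the 2.x hdfs-default.xml, not something stated in this thread):

```java
import java.util.Properties;

// Demonstrates the pitfall from HDFS-7397: the key ends in ".size",
// which suggests bytes (MB/KB), but the value is the maximum NUMBER of
// cached short-circuit streams. The default of 256 is an assumption.
public class ShortCircuitCacheSize {
    static final String KEY = "dfs.client.read.shortcircuit.streams.cache.size";

    // Interpreted as a count of streams, not a size in MB or KB.
    static int maxCachedStreams(Properties conf) {
        return Integer.parseInt(conf.getProperty(KEY, "256"));
    }
}
```

A clearer key would carry the unit in its name (e.g. something ending in {{.count}}), which is essentially the complaint being addressed by the patch.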
[jira] [Commented] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516205#comment-14516205 ] Kai Zheng commented on HDFS-7678: - The refined decoding API from HADOOP-11847:
{noformat}
/**
 * Decode with inputs and erasedIndexes, generates outputs.
 * How to prepare for inputs:
 * 1. Create an array containing parity units + data units;
 * 2. Set null in the array locations specified via erasedIndexes to indicate
 *    they're erased and no data are to be read from them;
 * 3. Set null in the array locations for extra redundant items, as they're not
 *    necessary to read when decoding. For example in RS-6-3, if only 1 unit
 *    is really erased, then we have 2 extra items as redundant. They can be
 *    set as null to indicate no data will be used from them.
 *
 * For an example using RS (6, 3), assuming sources (d0, d1, d2, d3, d4, d5)
 * and parities (p0, p1, p2), d2 being erased. We can and may want to use only
 * 6 units like (d1, d3, d4, d5, p0, p2) to recover d2. We will have:
 *   inputs = [p0, null(p1), p2, null(d0), d1, null(d2), d3, d4, d5]
 *   erasedIndexes = [5]  // index of d2 into inputs array
 *   outputs = [a-writable-buffer]
 *
 * @param inputs inputs to read data from
 * @param erasedIndexes indexes of erased units into inputs array
 * @param outputs outputs to write into for data generated according to
 *                erasedIndexes
 */
public void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs);
{noformat}
The impact from the caller's point of view: * It prepares the input buffers differently, using NULL to indicate units that are erased or not to be read; * It prepares the {{erasedIndexes}} and output buffers differently: only the really erased units are to be taken care of. {{NativeRSRawDecoder}} will be coming out soon according to the refined APIs, and it will only compute/recover the really erased units. Its usage is the same as {{RSRawDecoder}}'s.
Discussed off-line with [~zhz]; it would be good to use the refined API here if appropriate. We can also follow up separately later if necessary. Thanks. Erasure coding: DFSInputStream with decode functionality Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678.000.patch A block group reader will read data from a BlockGroup whether it is in striped or contiguous layout. The corrupt blocks can be known before reading (told by the namenode), or found during reading. The block group reader needs to do decoding work when some blocks are found corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
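The input-preparation contract quoted above (parity units first, then data units, with null marking erased or deliberately skipped entries) can be sketched without any Hadoop dependency. {{DecodeInputPrep}} below is a hypothetical illustration of that contract only, not the HADOOP-11847 decoder itself:

```java
import java.nio.ByteBuffer;

// Illustrative helper for the decode() input contract quoted above:
// inputs = parity units followed by data units, with null standing in
// for units that are erased or that we choose not to read.
public class DecodeInputPrep {
    // parity and data hold the unit buffers; useParity[i]/useData[i] are
    // false for units that are erased or deliberately skipped, which the
    // contract represents as null entries in the inputs array.
    static ByteBuffer[] buildInputs(ByteBuffer[] parity, ByteBuffer[] data,
                                    boolean[] useParity, boolean[] useData) {
        ByteBuffer[] inputs = new ByteBuffer[parity.length + data.length];
        for (int i = 0; i < parity.length; i++) {
            inputs[i] = useParity[i] ? parity[i] : null;   // p0..p(m-1)
        }
        for (int i = 0; i < data.length; i++) {
            inputs[parity.length + i] = useData[i] ? data[i] : null; // d0..d(k-1)
        }
        return inputs;
    }
}
```

For the RS(6,3) example in the javadoc (d2 erased, p1 and d0 skipped as redundant), the resulting array matches [p0, null, p2, null, d1, null, d2-slot..., d3, d4, d5] with erasedIndexes = [5].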
[jira] [Updated] (HDFS-8273) logSync() is called inside of write lock for delete op
[ https://issues.apache.org/jira/browse/HDFS-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8273: - Attachment: HDFS-8273.000.patch logSync() is called inside of write lock for delete op -- Key: HDFS-8273 URL: https://issues.apache.org/jira/browse/HDFS-8273 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Jing Zhao Assignee: Haohui Mai Priority: Blocker Attachments: HDFS-8273.000.patch HDFS-7573 moves the logSync call inside of the write lock by accident. We should move it out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8258) namenode shutdown strangely
[ https://issues.apache.org/jira/browse/HDFS-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516168#comment-14516168 ] Masatake Iwasaki commented on HDFS-8258: Thanks for reporting this, [~hwbj]. I think HDFS-7225 fixes this. namenode shutdown strangely Key: HDFS-8258 URL: https://issues.apache.org/jira/browse/HDFS-8258 Project: Hadoop HDFS Issue Type: Bug Reporter: wei.he Hi, I use Hadoop 2.5. The standby namenode shut down after restarting one of the DNs. I got the following error in the last logs:
...
2015-04-23 15:37:39,133 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.223:50010 is added to blk_1277475690_203782236{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5dc915b5-a44d-441c-b960-87e875edb5a8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-50f65630-347e-4e32-b3b9-9eb904db9577:NORMAL|RBW]]} size 0
2015-04-23 15:37:39,137 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(873)) - BLOCK* registerDatanode: from DatanodeRegistration(192.168.146.130, datanodeUuid=ee834f1d-ba78-48de-bd7a-c364b67b535f, infoPort=50075, ipcPort=8010, storageInfo=lv=-55;cid=CID-64dc0d1c-f525-432b-9b28-2b92262d6111;nsid=740344496;c=0) storage ee834f1d-ba78-48de-bd7a-c364b67b535f
2015-04-23 15:37:39,138 INFO namenode.NameNode (DatanodeManager.java:registerDatanode(881)) - BLOCK* registerDatanode: 192.168.146.130:50010
2015-04-23 15:37:39,261 INFO net.NetworkTopology (NetworkTopology.java:remove(482)) - Removing a node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
2015-04-23 15:37:39,262 INFO net.NetworkTopology (NetworkTopology.java:add(413)) - Adding a new node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
2015-04-23 15:37:39,264 WARN namenode.FSNamesystem (FSNamesystem.java:getCorruptFiles(6775)) - Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2015-04-23 15:37:39,264 FATAL blockmanagement.BlockManager (BlockManager.java:run(3390)) - ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
  at java.util.TreeMap.getEntry(TreeMap.java:342)
  at java.util.TreeMap.get(TreeMap.java:273)
  at org.apache.hadoop.hdfs.server.blockmanagement.InvalidateBlocks.invalidateWork(InvalidateBlocks.java:137)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateWorkForOneNode(BlockManager.java:3231)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeInvalidateWork(BlockManager.java:1191)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3431)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3375)
  at java.lang.Thread.run(Thread.java:744)
2015-04-23 15:37:39,267 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-f0b04209-2f6a-491b-9f28-173c4c53d364 for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-d255d5a4-4543-4621-b258-4c575843f29c for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-4a88b36b-7ae6-4f30-b95c-0c4e47d70878 for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-d4166bd7-a8c0-4067-8c68-78c6c31dcd9e for DN 192.168.146.130:50010
2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-468eeca6-e45c-428f-811a-71c5d1f04a9f for DN 192.168.146.130:50010
2015-04-23 15:37:39,269 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.34:50010 is added to blk_1253664895_179969194 size 285087186
2015-04-23 15:37:39,271 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.210:50010 is added to blk_1277475689_203782235{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2feb6f2f-9d30-4edd-b3c3-101194c6bde8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ca6ef4dc-be77-4a27-95e1-39b4fb53933a:NORMAL|RBW]]} size 0
2015-04-23 15:37:39,274 INFO BlockStateChange
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516183#comment-14516183 ] Hadoop QA commented on HDFS-7397: - (!) The patch artifact directory on has been removed! This is a fatal error for test-patch.sh. Aborting. Jenkins (node H4) information at https://builds.apache.org/job/PreCommit-HDFS-Build/10423/ may provide some hints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516288#comment-14516288 ] Walter Su commented on HDFS-8037: - update patch against trunk. WebHDFS: CheckAccess silently accepts certain malformed FsActions - Key: HDFS-8037 URL: https://issues.apache.org/jira/browse/HDFS-8037 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.6.0 Reporter: Jake Low Assignee: Walter Su Priority: Minor Labels: easyfix, newbie Attachments: HDFS-8037.001.patch, HDFS-8037.002.patch WebHDFS's {{CHECKACCESS}} operation accepts a parameter called {{fsaction}}, which represents the type(s) of access to check for. According to the documentation, and also the source code, the domain of {{fsaction}} is the set of strings matched by the regex {{[rwx-]{3}}}. This domain is wider than the set of valid {{FsAction}} objects, because it doesn't guarantee sensible ordering of access types. For example, the strings {{rxw}} and {{--r}} are valid {{fsaction}} parameter values, but don't correspond to valid {{FsAction}} instances. The result is that WebHDFS silently accepts {{fsaction}} parameter values which don't match any valid {{FsAction}} instance, but doesn't actually perform any permissions checking in this case. For example, here's a {{CHECKACCESS}} call where we request {{rw-}} access on a file which we only have permission to read and execute. It raises an exception, as it should.
{code:none}
curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=rw-"

HTTP/1.1 403 Forbidden
Content-Type: application/json

{
  "RemoteException": {
    "exception": "AccessControlException",
    "javaClassName": "org.apache.hadoop.security.AccessControlException",
    "message": "Permission denied: user=nobody, access=READ_WRITE, inode=\"/myfile\":root:supergroup:drwxr-xr-x"
  }
}
{code}
But if we instead request {{r-w}} access, the call appears to succeed:
{code:none}
curl -i -X GET "http://localhost:50070/webhdfs/v1/myfile?op=CHECKACCESS&user.name=nobody&fsaction=r-w"

HTTP/1.1 200 OK
Content-Length: 0
{code}
As I see it, the fix would be to change the regex pattern in {{FsActionParam}} to something like {{[r-][w-][x-]}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
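The two regexes discussed in the description can be compared directly with java.util.regex; this small sketch (class and method names are illustrative, not the FsActionParam code) shows that the permissive pattern accepts malformed orderings like {{r-w}} and {{rxw}} while the proposed pattern rejects them:

```java
import java.util.regex.Pattern;

// Compares the permissive WebHDFS fsaction pattern with the stricter one
// proposed in HDFS-8037. The old pattern accepts any 3 characters drawn
// from {r,w,x,-}; the proposed one forces r, w, x into their
// conventional positions.
public class FsActionRegex {
    static final Pattern OLD = Pattern.compile("[rwx-]{3}");
    static final Pattern PROPOSED = Pattern.compile("[r-][w-][x-]");

    static boolean oldAccepts(String s) { return OLD.matcher(s).matches(); }
    static boolean proposedAccepts(String s) { return PROPOSED.matcher(s).matches(); }
}
```

Every string the proposed pattern accepts is also accepted by the old one, so tightening the parameter regex cannot break any well-formed client.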
[jira] [Updated] (HDFS-8037) WebHDFS: CheckAccess silently accepts certain malformed FsActions
[ https://issues.apache.org/jira/browse/HDFS-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8037: Attachment: HDFS-8037.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8180) AbstractFileSystem Implementation for WebHdfs
[ https://issues.apache.org/jira/browse/HDFS-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santhosh G Nayak updated HDFS-8180: --- Attachment: HDFS-8180-2.patch 1. swebhdfs is an existing scheme; it will be more intuitive if we name the AFS implementation of swebhdfs {{SWebHdfs}}. Any thoughts? 2. Added the following entries in core-default.xml, where AFS implementations are configured for most of the schemes.
{code:xml}
<property>
  <name>fs.AbstractFileSystem.webhdfs.impl</name>
  <value>org.apache.hadoop.fs.WebHdfs</value>
  <description>The FileSystem for webhdfs: uris.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.swebhdfs.impl</name>
  <value>org.apache.hadoop.fs.SWebHdfs</value>
  <description>The FileSystem for swebhdfs: uris.</description>
</property>
{code}
3. Removed the {{FileSystem}} contract tests from the patch and added {{FileContext}} tests, as the AFS implementation is required mainly to support {{FileContext}} APIs. It creates a {{MiniDFSCluster}} and runs the {{FileContextMainOperationsBaseTest}} tests. AbstractFileSystem Implementation for WebHdfs - Key: HDFS-8180 URL: https://issues.apache.org/jira/browse/HDFS-8180 Project: Hadoop HDFS Issue Type: New Feature Components: webhdfs Affects Versions: 2.6.0 Reporter: Santhosh G Nayak Assignee: Santhosh G Nayak Labels: hadoop Attachments: HDFS-8180-1.patch, HDFS-8180-2.patch Add AbstractFileSystem implementation for WebHdfs to support FileContext APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Status: Patch Available (was: Open) HDFS client gets errors trying to to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.6.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + : + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr( ) assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing. 
(From logging, this appears to be a low-enough-frequency call that the extra object creation shouldn't be problematic; for me, the slight risk that bad input which is not actually an IPv4 or IPv6 address triggers an external DNS lookup is outweighed by getting the address normalized and avoiding a rewrite of the parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() could be used. --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) This also surfaces as a client error: -get: 2401 is not an IP string literal. The existing parsing logic here needs to split on the last colon rather than the first; it should also be a tiny bit faster using lastIndexOf rather than split. The techniques above could be used as an alternative. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
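The two parsing fixes described above can be sketched outside Hadoop. This is an illustrative stand-in, not the actual DataNodeID/NetUtils code: the method names (toHostPort, hostOf, portOf) are hypothetical, but the technique matches the issue's proposal -- bracket IPv6 literals when building a host:port string, and split on the last colon (via lastIndexOf) when parsing one.

```java
// Hypothetical sketch of IPv6-safe host:port handling; not the Hadoop code.
public class Ipv6AddrDemo {
    // Bracket IPv6 literals so the host:port string stays unambiguous,
    // matching the URI format proto://[addr]:port that java.net.URI requires.
    static String toHostPort(String ip, int port) {
        if (ip.indexOf(':') >= 0) {   // an IPv6 literal contains colons
            return "[" + ip + "]:" + port;
        }
        return ip + ":" + port;        // IPv4 address or hostname
    }

    // Split on the LAST colon, so the colons inside an IPv6 literal
    // are never mistaken for the host/port separator.
    static String hostOf(String hostPort) {
        String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1);  // strip brackets
        }
        return host;
    }

    static int portOf(String hostPort) {
        return Integer.parseInt(hostPort.substring(hostPort.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        // Prints [2401:db00:1010:70ba:face:0:8:0]:50010
        System.out.println(toHostPort("2401:db00:1010:70ba:face:0:8:0", 50010));
        // Prints 2401:db00:1010:70ba:face:0:8:0
        System.out.println(hostOf("[2401:db00:1010:70ba:face:0:8:0]:50010"));
    }
}
```

With naive concatenation, the first exception's bad authority 2401:db00:1010:70ba:face:0:8:0:50010 is exactly what toHostPort avoids producing.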
[jira] [Updated] (HDFS-8078) HDFS client gets errors trying to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nate Edel updated HDFS-8078: Attachment: HDFS-8078.6.patch Fix likely cause of checkstyle error, requeue for ND tests. HDFS client gets errors trying to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: ipv6 Attachments: HDFS-8078.6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514264#comment-14514264 ] Hadoop QA commented on HDFS-5574: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 5m 26s | The applied patch generated 1 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 43s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | common tests | 24m 21s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 184m 14s | Tests failed in hadoop-hdfs. 
| | | | 252m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestDatanodeManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728364/HDFS-5574.007.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 5e67c4d | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10409/artifact/patchprocess/checkstyle-result-diff.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10409/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10409/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10409/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10409/console | This message was automatically generated. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.007.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer to read skipped data into, which is not necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
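The buffer-copy issue HDFS-5574 targets can be sketched in isolation. This is an illustrative stand-in, not the actual BlockReaderLocal/RemoteBlockReader code: it contrasts the wasteful pattern (reading skipped bytes into a scratch buffer just to discard them) with a copy-free skip that only advances a position counter, which is possible whenever the reader tracks its own offset.

```java
// Hypothetical sketch of skip-with-copy vs. copy-free skip; not the Hadoop code.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipDemo {

    // Wasteful skip: allocates a temp buffer and copies bytes only to throw them away.
    static long skipByCopy(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[4096];
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(scratch, 0, (int) Math.min(scratch.length, remaining));
            if (read < 0) {
                break;  // hit end of stream before skipping n bytes
            }
            remaining -= read;
        }
        return n - remaining;  // bytes actually skipped
    }

    // Copy-free skip for a reader that tracks its position: just move the cursor.
    static class PositionalReader {
        private final byte[] data;
        private int pos;

        PositionalReader(byte[] data) {
            this.data = data;
        }

        long skip(long n) {
            long k = Math.min(n, (long) (data.length - pos));
            pos += k;  // no temp buffer, no copy
            return k;
        }

        int read() {
            return pos < data.length ? data[pos++] & 0xff : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] block = new byte[100];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) i;
        }
        System.out.println(skipByCopy(new ByteArrayInputStream(block), 10));  // 10
        PositionalReader r = new PositionalReader(block);
        r.skip(10);
        System.out.println(r.read());  // 10, i.e. the 11th byte
    }
}
```

Both variants skip the same bytes; the positional version simply does no allocation and no memory traffic per skip.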
[jira] [Commented] (HDFS-8214) Secondary NN Web UI shows wrong date for Last Checkpoint
[ https://issues.apache.org/jira/browse/HDFS-8214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514367#comment-14514367 ] Charles Lamb commented on HDFS-8214: The test failure is spurious. I ran the failed test (TestDiskspaceQuotaUpdate) and it passed on my machine. The checkstyle warning is {quote} <error line="56" column="3" severity="error" message="Redundant 'public' modifier." source="com.puppycrawl.tools.checkstyle.checks.modifier.RedundantModifierCheck"/> {quote} This is because I added the new getLastCheckpointDeltaMs() method. It is complaining about public being redundant. I could remove it, but keeping it there maintains the existing style of the other getters. Secondary NN Web UI shows wrong date for Last Checkpoint Key: HDFS-8214 URL: https://issues.apache.org/jira/browse/HDFS-8214 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, namenode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-8214.001.patch, HDFS-8214.002.patch SecondaryNamenode is using Time.monotonicNow() to display Last Checkpoint in the web UI. This causes weird times, generally just after the epoch, to be displayed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
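The mechanism behind the "just after the epoch" dates in HDFS-8214 can be shown with a small stand-alone sketch (not the SecondaryNamenode code): a monotonic clock such as Time.monotonicNow() measures elapsed time from an arbitrary origin, often OS or JVM start, so interpreting such a reading as milliseconds since 1970 renders a date a few days after the epoch. The 72-hour uptime value below is an invented example.

```java
// Sketch: why rendering a monotonic reading as an epoch timestamp goes wrong.
import java.time.Instant;

public class MonotonicVsWallClock {

    // Render a millisecond value as if it were an epoch timestamp.
    static String asEpochDate(long millis) {
        return Instant.ofEpochMilli(millis).toString();
    }

    public static void main(String[] args) {
        // A monotonic reading is elapsed time from an arbitrary origin,
        // e.g. ~72 hours of uptime -- NOT a wall-clock time.
        long monotonicMs = 259_200_000L;
        long wallMs = System.currentTimeMillis();  // genuine millis since 1970-01-01

        System.out.println(asEpochDate(monotonicMs));  // 1970-01-04T00:00:00Z
        System.out.println(asEpochDate(wallMs));       // the actual current date
    }
}
```

The fix direction is simply to feed the UI a wall-clock value (millis since the epoch) instead of a monotonic one.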
[jira] [Commented] (HDFS-7397) The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading
[ https://issues.apache.org/jira/browse/HDFS-7397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514375#comment-14514375 ] Hadoop QA commented on HDFS-7397: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 194m 27s | Tests failed in hadoop-hdfs. 
| | | | 231m 59s | | \\ \\ || Reason || Tests || | Timed out tests | org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728384/HDFS-7397.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / 5e67c4d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10410/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10410/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10410/console | This message was automatically generated. The conf key dfs.client.read.shortcircuit.streams.cache.size is misleading Key: HDFS-7397 URL: https://issues.apache.org/jira/browse/HDFS-7397 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Brahma Reddy Battula Priority: Minor Attachments: HDFS-7397.patch For dfs.client.read.shortcircuit.streams.cache.size, is it in MB or KB? Interestingly, it is neither in MB nor KB. It is the number of shortcircuit streams. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
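To make the naming pitfall concrete, here is a hypothetical hdfs-site.xml fragment (the value 256 is an invented example) showing how the key must be read: despite the "size" suffix, the value is a count of cached short-circuit streams, not a size in KB or MB.

{code:xml}
<property>
  <!-- Despite the name, this is a COUNT of cached short-circuit
       streams, not a buffer size in bytes/KB/MB. -->
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>256</value>
</property>
{code}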