[jira] [Commented] (HDFS-9149) Consider multi datacenter when sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941100#comment-14941100 ] He Tianyi commented on HDFS-9149: - I think that's a good point [~hexiaoqiao]. One simple idea is to generalize {{getWeight}} into a function that calculates the distance between two locations (more like {{getDistance}}), regardless of the meaning of each level of the hierarchy. The only thing is, I'm not aware why {{getWeight}} was designed this way in the first place, i.e. whether there was some particular concern. Does anyone know the idea behind this design choice? > Consider multi datacenter when sortByDistance > - > > Key: HDFS-9149 > URL: https://issues.apache.org/jira/browse/HDFS-9149 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Tianyi > > {{sortByDistance}} doesn't consider multi-datacenter when reading data, so data > may be read via another datacenter when Hadoop is deployed across multiple IDCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9149) Consider multi datacenter when sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Tianyi reassigned HDFS-9149: --- Assignee: He Tianyi > Consider multi datacenter when sortByDistance > - > > Key: HDFS-9149 > URL: https://issues.apache.org/jira/browse/HDFS-9149 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Tianyi > > {{sortByDistance}} doesn't consider multi-datacenter when reading data, so data > may be read via another datacenter when Hadoop is deployed across multiple IDCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940929#comment-14940929 ] Rakesh R commented on HDFS-9185: Note: It looks like test case failures are not related to the patch. [TestRecoverStripedFile|https://builds.apache.org/job/PreCommit-HDFS-Build/12769/testReport/org.apache.hadoop.hdfs/TestRecoverStripedFile/] case is consistently passing now. > TestRecoverStripedFile is failing > - > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9180) Update excluded DataNodes in DFSStripedOutputStream based on failures in data streamers
[ https://issues.apache.org/jira/browse/HDFS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941035#comment-14941035 ] Hadoop QA commented on HDFS-9180: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 21m 0s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 9m 1s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 13s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 11s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 5m 5s | The patch appears to introduce 7 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 40s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 239m 52s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 33s | Tests passed in hadoop-hdfs-client. 
| | | | 295m 12s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs-client | | Failed unit tests | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.TestDFSStripedOutputStream | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764700/HDFS-9180.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / fd026f5 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-client.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12771/console | This message was automatically generated. > Update excluded DataNodes in DFSStripedOutputStream based on failures in data > streamers > --- > > Key: HDFS-9180 > URL: https://issues.apache.org/jira/browse/HDFS-9180 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9180.000.patch, HDFS-9180.001.patch > > > This is a TODO in HDFS-9040: based on the failures all the striped data > streamers hit, the DFSStripedOutputStream should keep a record of all the > DataNodes that should be excluded. 
> This jira will also fix several bugs in the DFSStripedOutputStream. Will > provide more details in the comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9189) javadoc jar contains full build, not just javadoc, making it really big
André Kelpe created HDFS-9189: - Summary: javadoc jar contains full build, not just javadoc, making it really big Key: HDFS-9189 URL: https://issues.apache.org/jira/browse/HDFS-9189 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.1, 2.6.1 Environment: For some reason the build of the javadoc jars includes the entire build output, including third-party jars, class files, and all sorts of other stuff, making the jars really big (128MB). Reporter: André Kelpe -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9149) Consider multi datacenter when sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941243#comment-14941243 ] He Xiaoqiao commented on HDFS-9149: --- hi [~He Tianyi], thank you for your comments. {quote} The only thing is, I'm not aware why getWeight was designed this way in the first place, i.e. whether there was some particular concern. {quote} Maybe there are no particular concerns. From the original implementation of {{pseudoSortByDistance}} to [HDFS-6268|https://issues.apache.org/jira/browse/HDFS-6268], which was the first restructuring into {{sortByDistance}}, there is no indication that the multi-IDC scenario was considered. {quote} One simple idea is to generalize getWeight into a function that calculates the distance between two locations (more like getDistance), regardless of the meaning of each level of the hierarchy. {quote} I think it could be simple and reasonable to add an if statement to {{getWeight}}:
{code:java}
 protected int getWeight(Node reader, Node node) {
-  // 0 is local, 1 is same rack, 2 is off rack
+  // 0 is local, 1 is same rack, 2 is same IDC, 3 is off IDC
   // Start off by initializing to off rack
-  int weight = 2;
+  int weight = 3;
   if (reader != null) {
     if (reader.equals(node)) {
       weight = 0;
     } else if (isOnSameRack(reader, node)) {
       weight = 1;
+    } else {
+      Node rParent = reader.getParent();
+      Node nParent = node.getParent();
+      if (null != rParent && null != nParent && isSameParent(rParent, nParent)) {
+        weight = 2;
+      }
     }
   }
   return weight;
 }
{code}
> Consider multi datacenter when sortByDistance > - > > Key: HDFS-9149 > URL: https://issues.apache.org/jira/browse/HDFS-9149 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Tianyi > > {{sortByDistance}} doesn't consider multi-datacenter when reading data, so data > may be read via another datacenter when Hadoop is deployed across multiple IDCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9149) Consider multi datacenter when sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941264#comment-14941264 ] He Tianyi commented on HDFS-9149: - Thanks, [~hexiaoqiao]. The simpler idea sounds good! But I'm not quite sure that adding one if statement could cover all cases. We'd need to assume that the grandparent represents an IDC node if we go with it, which does not always hold (since {{NetworkTopology}} does not imply that). E.g. I have a real scenario where locations are configured like {{/DC/BUILDING/RACK/NODE}}. In this case, it is true that locality would happen to be better, but perhaps not by enough. > Consider multi datacenter when sortByDistance > - > > Key: HDFS-9149 > URL: https://issues.apache.org/jira/browse/HDFS-9149 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Tianyi > > {{sortByDistance}} doesn't consider multi-datacenter when reading data, so data > may be read via another datacenter when Hadoop is deployed across multiple IDCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9190) VolumeScanner throwing NPE while scanning suspect block.
[ https://issues.apache.org/jira/browse/HDFS-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-9190: - Summary: VolumeScanner throwing NPE while scanning suspect block. (was: VolumeScanner throwing NPE while scanning suspect blocks.) > VolumeScanner throwing NPE while scanning suspect block. > > > Key: HDFS-9190 > URL: https://issues.apache.org/jira/browse/HDFS-9190 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Priority: Critical > > Volume scanner NPEs while scanning suspect Block. > Following is the stack trace: > {noformat} > 2015-10-02 06:45:30,333 [VolumeScannerThread(dataDir)] ERROR > datanode.VolumeScanner: VolumeScanner(dataDir, > DS-5fc4263e-7a5c-4463-9f82-842108c0ab3b) exiting because of exception > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:539) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:619) > 2015-10-02 06:45:30,333 > [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@7768ca5] WARN > datanode.DataNode: DatanodeRegistration(sourceDN:1004, > datanodeUuid=f554982f-7c45-4fd4-ad57-9d472a39729e, infoPort=1006, > infoSecurePort=0, ipcPort=8020, > storageInfo=lv=-56;cid=CID-ddc217ab-5203-48ef-9695-a348feb4dac2;nsid=1872110141;c=1443758672580):Failed > to transfer BP-1749317823--1443758669533:blk_1073742231_1407 to > destDN:1004 got > java.net.SocketException: Original Exception : java.io.IOException: > Connection reset by peer > at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) > at > sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443) > at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:579) > at > 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:759) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:706) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2124) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Connection reset by peer > ... 9 more > {noformat} > It is NPEing at the code in file: VolumeScanner#runLoop > {noformat} > long saveDelta = monotonicMs - curBlockIter.getLastSavedMs(); > {noformat} > curBlockIter is not initialized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
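The fix for the NPE above amounts to guarding the save-delta bookkeeping when no block iterator is active. The following is a minimal hypothetical sketch, assuming {{curBlockIter}} can legitimately be null when the scanner wakes up only to handle a suspect block; the class and helper names other than {{curBlockIter}} and {{getLastSavedMs}} are illustrative, not the actual VolumeScanner code:

```java
// Hypothetical sketch of the missing null guard in VolumeScanner#runLoop.
public class RunLoopGuard {
    // Stand-in for the real block iterator, which tracks when its
    // position was last persisted.
    static class BlockIterator {
        private final long lastSavedMs;
        BlockIterator(long lastSavedMs) { this.lastSavedMs = lastSavedMs; }
        long getLastSavedMs() { return lastSavedMs; }
    }

    // Returns the elapsed time since the last save, or -1 when no
    // iterator is active yet (avoiding the NPE from the report).
    static long saveDelta(BlockIterator curBlockIter, long monotonicMs) {
        if (curBlockIter == null) {
            return -1; // nothing to save; skip the bookkeeping
        }
        return monotonicMs - curBlockIter.getLastSavedMs();
    }

    public static void main(String[] args) {
        System.out.println(saveDelta(null, 1000L));                    // -1
        System.out.println(saveDelta(new BlockIterator(400L), 1000L)); // 600
    }
}
```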
[jira] [Created] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
Catherine Palmer created HDFS-9191: -- Summary: Typo in Hdfs.java. NoSuchElementException is misspelled Key: HDFS-9191 URL: https://issues.apache.org/jira/browse/HDFS-9191 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Catherine Palmer Assignee: Catherine Palmer Priority: Trivial Fix For: 3.0.0 Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) TestRecoverStripedFile is failing
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941492#comment-14941492 ] Jing Zhao commented on HDFS-9185: - The new patch looks good to me. All the failed tests passed in my local machine. +1. I will commit it shortly. > TestRecoverStripedFile is failing > - > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9185: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) I've committed this to trunk. Thanks for the contribution [~rakeshr]! Thanks for the review [~umamaheswararao]! > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941524#comment-14941524 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-trunk-Commit #8556 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8556/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9149) Consider multi datacenter when sortByDistance
[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941360#comment-14941360 ] He Xiaoqiao commented on HDFS-9149: --- Thanks, [~He Tianyi]. It's not a perfect solution, exactly. Maybe we could calculate the weight recursively? Or is there any better suggestion? There is another kind of situation: it is hard to calculate the weight between a reader and a DN when the reader is not a node of the Hadoop cluster but is in the same IDC as part of the cluster. > Consider multi datacenter when sortByDistance > - > > Key: HDFS-9149 > URL: https://issues.apache.org/jira/browse/HDFS-9149 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: He Xiaoqiao >Assignee: He Tianyi > > {{sortByDistance}} doesn't consider multi-datacenter when reading data, so data > may be read via another datacenter when Hadoop is deployed across multiple IDCs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
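The generalized idea discussed in the comments above, a weight that works for any hierarchy depth rather than assuming the grandparent is an IDC, can be sketched by walking both network locations up to their lowest common ancestor. This is a hypothetical illustration, not the actual {{NetworkTopology}} API; the class name and the list-of-path-components representation are made up for the sketch:

```java
// Sketch: compute a reader-to-node weight that handles arbitrary
// hierarchy depth (e.g. /DC/BUILDING/RACK/NODE) uniformly.
import java.util.Arrays;
import java.util.List;

public class WeightSketch {
    // A network location is modeled as its path components,
    // e.g. ["dc1", "rack1", "nodeA"].
    static int getWeight(List<String> reader, List<String> node) {
        if (reader.equals(node)) {
            return 0; // local
        }
        // Count matching leading components (shared ancestors).
        int common = 0;
        int min = Math.min(reader.size(), node.size());
        while (common < min && reader.get(common).equals(node.get(common))) {
            common++;
        }
        // Weight grows with the number of levels that differ: 2 means
        // same rack, larger values mean farther away in the hierarchy.
        return (reader.size() - common) + (node.size() - common);
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("dc1", "rack1", "nodeA");
        List<String> b = Arrays.asList("dc1", "rack1", "nodeB"); // same rack
        List<String> c = Arrays.asList("dc1", "rack2", "nodeC"); // same DC
        List<String> d = Arrays.asList("dc2", "rack9", "nodeD"); // off DC
        System.out.println(getWeight(a, a)); // 0
        System.out.println(getWeight(a, b)); // 2
        System.out.println(getWeight(a, c)); // 4
        System.out.println(getWeight(a, d)); // 6
    }
}
```

With this shape, a deeper hierarchy simply yields larger weights for more distant nodes, without hard-coding what any particular level of the topology means.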
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941391#comment-14941391 ] Allen Wittenauer commented on HDFS-9184: Let me clarify a bit: The HDFS audit log is probably the single most widely machine-parsed log in the entirety of Hadoop. It was specifically made a fixed-field log to make it easy even for beginner admins to use, in a format that doesn't require a lot of heavy machinery to actually make useful. As a result, changing the format of this file has an extreme impact on pretty much every Hadoop operations team in existence. So while the functionality may be useful, there is no way, in good conscience, that we should modify the current layout in branch-2. So I still stand at: -1 for branch-2, 0 for trunk > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and Hive queries. One scenario is that the namenode > (NN) is abused/spammed; the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the user of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. 
HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places, interconnected > like a tree structure, which relies on offline analysis across the RPC boundary. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than the span id from the root of the tree to a leaf is > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > is some related discussion on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos-authenticated > connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
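The threadlocal mechanism proposed in the issue description above can be sketched as follows. This is an illustrative sketch only: the class name, method names, and the audit-line format are assumptions, not the actual HDFS implementation (which would carry the context across the RPC boundary rather than within one process):

```java
// Sketch of a per-thread caller context (e.g. a Hive query id) that an
// audit logger appends to each log line without changing existing code paths.
public class CallerContextSketch {
    static final ThreadLocal<String> CALLER = new ThreadLocal<>();

    // Set by the client before issuing operations; in the real proposal
    // this value would travel to the NN in an optional RPC header field.
    static void setCallerContext(String id) { CALLER.set(id); }

    // What an audit-log line might look like with the context appended.
    static String auditLine(String user, String cmd, String src) {
        String ctx = CALLER.get();
        return "ugi=" + user + " cmd=" + cmd + " src=" + src
            + (ctx == null ? "" : " callerContext=" + ctx);
    }

    public static void main(String[] args) {
        setCallerContext("hive_query_12345"); // hypothetical tracking id
        System.out.println(auditLine("alice", "delete", "/tmp/t1"));
        // prints: ugi=alice cmd=delete src=/tmp/t1 callerContext=hive_query_12345
    }
}
```

Appending the context as an optional trailing field, rather than reordering existing fields, is one way to limit the impact on existing audit-log parsers that the follow-up comments are concerned about.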
[jira] [Commented] (HDFS-9190) VolumeScanner throwing NPE while scanning suspect block.
[ https://issues.apache.org/jira/browse/HDFS-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941462#comment-14941462 ] Xiaoyu Yao commented on HDFS-9190: -- This should be fixed by HDFS-8850. [~shahrs87], can you try the build latest trunk or patch from HDFS-8850 and confirm? > VolumeScanner throwing NPE while scanning suspect block. > > > Key: HDFS-9190 > URL: https://issues.apache.org/jira/browse/HDFS-9190 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Priority: Critical > > Volume scanner NPEs while scanning suspect Block. > Following is the stack trace: > {noformat} > 2015-10-02 06:45:30,333 [VolumeScannerThread(dataDir)] ERROR > datanode.VolumeScanner: VolumeScanner(dataDir, > DS-5fc4263e-7a5c-4463-9f82-842108c0ab3b) exiting because of exception > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:539) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:619) > 2015-10-02 06:45:30,333 > [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@7768ca5] WARN > datanode.DataNode: DatanodeRegistration(sourceDN:1004, > datanodeUuid=f554982f-7c45-4fd4-ad57-9d472a39729e, infoPort=1006, > infoSecurePort=0, ipcPort=8020, > storageInfo=lv=-56;cid=CID-ddc217ab-5203-48ef-9695-a348feb4dac2;nsid=1872110141;c=1443758672580):Failed > to transfer BP-1749317823--1443758669533:blk_1073742231_1407 to > destDN:1004 got > java.net.SocketException: Original Exception : java.io.IOException: > Connection reset by peer > at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) > at > sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443) > at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) > at > 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:579) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:759) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:706) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2124) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Connection reset by peer > ... 9 more > {noformat} > It is NPEing at the code in file: VolumeScanner#runLoop > {noformat} > long saveDelta = monotonicMs - curBlockIter.getLastSavedMs(); > {noformat} > curBlockIter is not initialized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Catherine Palmer updated HDFS-9191: --- Status: Patch Available (was: Open) > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941516#comment-14941516 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-trunk-Commit #8555 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8555/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-9191: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Since it's a comment change, not waiting for Jenkins. Thanks for the contribution, Catherine! > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9186: -- Status: Patch Available (was: Open) > Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941542#comment-14941542 ] Colin Patrick McCabe commented on HDFS-9184: Is it documented anywhere that the audit log is key/value? I didn't see any specification for the format... did I miss some docs somewhere? I don't think this is similar to protobuf because there is a clearly defined and documented way to extend PB. Many modern Hadoop systems access HDFS through a proxy. For example, some people use Tachyon to get read and write caching. RecordService provides row-level security and deserialization services. Hive itself usually does its work on behalf of some other process like Tableau, or Spark. How will this solution work in those cases? For me, a lot of this discussion gets back to the reasons why htrace is a separate system rather than just part of HDFS or HBase. You need something that can span multiple projects and create a coherent narrative about what's going on. I agree that HTrace should not be run at 100% sampling, but I am not convinced by the arguments that we need 100% sampling. If this is to diagnose performance issues, then 1% or so sampling should be fine. If this is about security issues, then it seems flawed, since it doesn't actually stop anyone from accessing anything. Can you be a little clearer about the specific use-cases for this? > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. 
One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the user of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places, interconnected > like a tree structure, which relies on offline analysis across RPC boundaries. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than the span id from the root of the tree to a leaf is > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussions on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, Kerberos > authenticated connections and insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all scenarios, but that might > mean changes to several upstream projects and is a major change to their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals.
Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into the {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
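The thread-local flow proposed in the description can be sketched roughly as follows. This is only an illustration of the mechanism, assuming hypothetical class and method names, not the actual HDFS-9184 patch:

```java
import java.util.Optional;

// Illustrative sketch of a thread-local caller context; class and method
// names are assumptions, not the actual HDFS-9184 patch.
class CallerContext {
    // Each client thread sets its own context; the RPC layer would copy it
    // into the (optional) RPC header, and a NN Handler thread would set it
    // again before the audit logger runs.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    static void set(String callerId) { CONTEXT.set(callerId); }
    static Optional<String> get() { return Optional.ofNullable(CONTEXT.get()); }
    static void clear() { CONTEXT.remove(); }

    // Append the context to an audit log entry as one extra key/value pair,
    // so existing parsers of the key/value format keep working.
    static String auditLine(String base) {
        return get().map(c -> base + "\tcallerContext=" + c).orElse(base);
    }
}
```

With this shape, threads that never set a context produce unchanged audit lines, which matches the "existing code is not affected" goal above.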
[jira] [Updated] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9186: -- Attachment: HDFS-9186.HDFS-8707.000.patch Simple script that copies only the things needed to compile/test and sticks them into the `pwd`/minimized > Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9190) VolumeScanner throwing NPE while scanning suspect blocks.
Rushabh S Shah created HDFS-9190: Summary: VolumeScanner throwing NPE while scanning suspect blocks. Key: HDFS-9190 URL: https://issues.apache.org/jira/browse/HDFS-9190 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.7.0 Reporter: Rushabh S Shah Priority: Critical Volume scanner NPEs while scanning suspect Block. Following is the stack trace: {noformat} 2015-10-02 06:45:30,333 [VolumeScannerThread(dataDir)] ERROR datanode.VolumeScanner: VolumeScanner(dataDir, DS-5fc4263e-7a5c-4463-9f82-842108c0ab3b) exiting because of exception java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:539) at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:619) 2015-10-02 06:45:30,333 [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@7768ca5] WARN datanode.DataNode: DatanodeRegistration(sourceDN:1004, datanodeUuid=f554982f-7c45-4fd4-ad57-9d472a39729e, infoPort=1006, infoSecurePort=0, ipcPort=8020, storageInfo=lv=-56;cid=CID-ddc217ab-5203-48ef-9695-a348feb4dac2;nsid=1872110141;c=1443758672580):Failed to transfer BP-1749317823--1443758669533:blk_1073742231_1407 to destDN:1004 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443) at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575) at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:579) at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:759) at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:706) at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2124) at java.lang.Thread.run(Thread.java:745) Caused by: 
java.io.IOException: Connection reset by peer ... 9 more {noformat} The NPE occurs at this line in VolumeScanner#runLoop: {noformat} long saveDelta = monotonicMs - curBlockIter.getLastSavedMs(); {noformat} {{curBlockIter}} is not initialized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
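A defensive guard of the kind implied by the report can be sketched as below. The interface and names only mirror the stack trace for illustration, not the real VolumeScanner internals:

```java
// Sketch of a null guard around the failing statement; the interface and
// names mirror the stack trace above, not the real VolumeScanner internals.
class ScannerSketch {
    interface BlockIterator { long getLastSavedMs(); }

    // Returns the elapsed time since the iterator last saved its position,
    // or 0 when no block iterator has been initialized yet (the NPE case).
    static long saveDelta(long monotonicMs, BlockIterator curBlockIter) {
        if (curBlockIter == null) {
            return 0L; // nothing scanned yet; skip the save bookkeeping
        }
        return monotonicMs - curBlockIter.getLastSavedMs();
    }
}
```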
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941442#comment-14941442 ] Daryn Sharp commented on HDFS-9184: --- Adding another kvp to the audit log is not an incompatible change, and isn't IMHO grounds for a -1. I'm pretty sure the previous proto=(rpc|webhdfs) key was added mid-2.x with no fanfare. The goal of this jira is sorely needed. The crux is how can we do it with minimal performance impact and no incompatibility. My concern is the overhead with a per-call context. I'd rather see it in the connection context. I thought we could leverage the dfsclient id, but alas it's not part of the connection context like I thought. But, adding an optional & arbitrary string to the connection context might work. I can envision a conceptually simple api to append a delimited value. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. 
Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. 
The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9185: Summary: Fix null tracer in ErasureCodingWorker (was: TestRecoverStripedFile is failing) > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9180) Update excluded DataNodes in DFSStripedOutputStream based on failures in data streamers
[ https://issues.apache.org/jira/browse/HDFS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9180: Attachment: HDFS-9180.002.patch Thanks for the review, Yi! The failed EC related tests are mainly caused by some bugs in the testing code. Update the patch to fix. > Update excluded DataNodes in DFSStripedOutputStream based on failures in data > streamers > --- > > Key: HDFS-9180 > URL: https://issues.apache.org/jira/browse/HDFS-9180 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9180.000.patch, HDFS-9180.001.patch, > HDFS-9180.002.patch > > > This is a TODO in HDFS-9040: based on the failures all the striped data > streamers hit, the DFSStripedOutputStream should keep a record of all the > DataNodes that should be excluded. > This jira will also fix several bugs in the DFSStripedOutputStream. Will > provide more details in the comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941454#comment-14941454 ] Allen Wittenauer commented on HDFS-9184: bq. I'm pretty sure the previous proto=(rpc|webhdfs) key was added mid-2.x with no fanfare. Believe me, it broke stuff. I would have -1'd that one too. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. 
In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941473#comment-14941473 ] Jitendra Nath Pandey commented on HDFS-9184: Audit log format is designed to be a key value format so that it can be extensible. Addition of a new key optional value pair is not an incompatible change. However, we can also consider making this feature configurable which is off by default, so that there is no change at all. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. 
Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9190) VolumeScanner throwing NPE while scanning suspect block.
[ https://issues.apache.org/jira/browse/HDFS-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah resolved HDFS-9190. -- Resolution: Duplicate [~xyao]: thanks for pointing out to hdfs-8850 Closing this ticket as duplicate > VolumeScanner throwing NPE while scanning suspect block. > > > Key: HDFS-9190 > URL: https://issues.apache.org/jira/browse/HDFS-9190 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.0 >Reporter: Rushabh S Shah >Priority: Critical > > Volume scanner NPEs while scanning suspect Block. > Following is the stack trace: > {noformat} > 2015-10-02 06:45:30,333 [VolumeScannerThread(dataDir)] ERROR > datanode.VolumeScanner: VolumeScanner(dataDir, > DS-5fc4263e-7a5c-4463-9f82-842108c0ab3b) exiting because of exception > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:539) > at > org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:619) > 2015-10-02 06:45:30,333 > [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@7768ca5] WARN > datanode.DataNode: DatanodeRegistration(sourceDN:1004, > datanodeUuid=f554982f-7c45-4fd4-ad57-9d472a39729e, infoPort=1006, > infoSecurePort=0, ipcPort=8020, > storageInfo=lv=-56;cid=CID-ddc217ab-5203-48ef-9695-a348feb4dac2;nsid=1872110141;c=1443758672580):Failed > to transfer BP-1749317823--1443758669533:blk_1073742231_1407 to > destDN:1004 got > java.net.SocketException: Original Exception : java.io.IOException: > Connection reset by peer > at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) > at > sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:443) > at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:575) > at > org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:579) > at > 
org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:759) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:706) > at > org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2124) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Connection reset by peer > ... 9 more > {noformat} > It is NPEing at the code in file: VolumeScanner#runLoop > {noformat} > long saveDelta = monotonicMs - curBlockIter.getLastSavedMs(); > {noformat} > curBlockIter is not initialized. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941525#comment-14941525 ] Lei (Eddy) Xu commented on HDFS-9015: - +1 LGTM. > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941535#comment-14941535 ] Jitendra Nath Pandey commented on HDFS-9184: bq ...connection context Many applications heavily rely on filesystem cache and connection cache for performance. A string in connection context would need to be updated for different calls. It may not work in multi-threaded applications. I think if we restrict the length of this additional string these costs can be kept to minimal. For example, a default length of 128 bytes will be a small increment to current audit log record sizes. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. 
The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
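The length cap suggested in the comment above can be sketched like this. The 128-byte default is the figure from the comment; the class name and truncation policy are assumptions:

```java
import java.nio.charset.StandardCharsets;

// Sketch of capping the caller-context string; the 128-byte default is the
// figure from the comment above, and the class name is illustrative.
class ContextLimiter {
    static final int MAX_BYTES = 128;

    static String truncate(String context) {
        byte[] bytes = context.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= MAX_BYTES) {
            return context;
        }
        // Cut on a byte boundary; fine for ASCII ids, though a production
        // implementation would avoid splitting a multi-byte UTF-8 sequence.
        return new String(bytes, 0, MAX_BYTES, StandardCharsets.UTF_8);
    }
}
```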
[jira] [Commented] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941544#comment-14941544 ] Hadoop QA commented on HDFS-9186: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch HDFS-8707 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 421 release audit warnings. | | {color:red}-1{color} | shellcheck | 0m 6s | The applied patch generated 11 new shellcheck (v0.3.3) issues (total was 25, now 36). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 27s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764814/HDFS-9186.HDFS-8707.000.patch | | Optional Tests | shellcheck | | git revision | HDFS-8707 / 3668778 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12773/artifact/patchprocess/patchReleaseAuditProblems.txt | | shellcheck | https://builds.apache.org/job/PreCommit-HDFS-Build/12773/artifact/patchprocess/diffpatchshellcheck.txt | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12773/console | This message was automatically generated. 
> Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941460#comment-14941460 ] Daryn Sharp commented on HDFS-9184: --- It's a simple and _extensible_ kvp file. If something doesn't parse it as such, it's the parser's fault, not an incompatibility that should hinder progress. Food for thought: by this incompatibility logic, we can't add any new fields to protobufs > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. 
Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
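The extensibility argument above, that a new key/value pair is the parser's problem rather than an incompatibility, can be illustrated with a small tolerant reader. The tab delimiter and key names are assumptions based on the common HDFS audit line layout:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a forward-compatible audit-log reader: unknown keys are simply
// carried along rather than treated as a parse error. The tab delimiter and
// key names are assumptions about the audit line layout.
class AuditLineParser {
    static Map<String, String> parse(String line) {
        Map<String, String> kv = new LinkedHashMap<>();
        for (String field : line.split("\t")) {
            int eq = field.indexOf('=');
            if (eq > 0) {
                kv.put(field.substring(0, eq), field.substring(eq + 1));
            }
        }
        return kv;
    }
}
```

A parser written this way keeps working when a key like {{proto}} or {{callerContext}} is added later.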
[jira] [Updated] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Catherine Palmer updated HDFS-9191: --- Attachment: hdfs-9191.patch quick patch to fix a typo; no tests since the typo is in the comment > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9142) Namenode Http address is not configured correctly for federated cluster in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-9142: -- Attachment: HDFS-9142.v4.patch > Namenode Http address is not configured correctly for federated cluster in > MiniDFSCluster > - > > Key: HDFS-9142 > URL: https://issues.apache.org/jira/browse/HDFS-9142 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: HDFS-9142.v1.patch, HDFS-9142.v2.patch, > HDFS-9142.v3.patch, HDFS-9142.v4.patch > > > When setting up simpleHAFederatedTopology in MiniDFSCluster, each Namenode > should have its own configuration object, and the configuration should have > "dfs.namenode.http-address--" set up correctly for > allpair -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-9015: Resolution: Fixed Fix Version/s: 2.8.0 3.0.0 Target Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the work, [~mingma]. Committed to trunk and branch-2. > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer updated HDFS-9186: -- Status: Open (was: Patch Available) > Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941613#comment-14941613 ] Allen Wittenauer commented on HDFS-9184: bq. Is it documented anywhere that the audit log is key/value? I didn't see any specification for the format... It's a) not documented and b) not a kvp. Story time. This is going to be the shorter version. I have few regrets about things I helped design in Hadoop, but this does happen to be one of them, especially due to all of the misunderstanding around what its purpose in life is and how people actually use it. When [~chris.douglas] and I did the design work on the audit log back in 2008 (IIRC), I specifically wanted a fixed-field log file format. We were going to be writing ops tools to answer questions that we, the ops team, simply could not. It was important that the format stay fixed for a variety of reasons: * The ops team at Y! was tiny, with a mix of junior and senior folks. The junior folks were likely going to be the ones writing the code, since the senior folks were busy dealing with the continual fallout from the weekly Hadoop upgrades and just getting a working infrastructure in place while we moved away from YST. (... and getting ops-specific tooling out of dev was regularly blocked by management ...) * We needed to make sure that no matter what the devs added to Hadoop, the log file wouldn't change. At that point in time, the logs for things like the NN were wildly fluctuating and were pretty much impossible to use for any sort of metrics or monitoring. We needed a safe space that was away from the turmoil happening in the rest of the system. If the system had been open-ended, it would have been absolute hell to work with. Forcing a format that at that point covered 100% of the foreseeable use cases solved that problem. * The content was modeled after Solaris BSM with a few key differences. 
BSM wrote in binary which just wasn't a real option without us pulling out more advanced techniques. It would fail the 'quick and dirty' tests that the ops team had to have in order to fulfill user needs. BSM also supported a heck of a lot more than Hadoop did. So a straight logfile it was. Now one of the things I wanted to avoid was the "tab problem". e.g., fields that are empty end up looking like fieldfield. So we settled on a = format where every label would always be present so that we could then use spaces to break up the columns. [Thus why I say it is *not* kvp. In most key-value stores that I've worked with, it's rare to see key=(null)]. I've also heard that the file is a "weird form of JSON". No, it's not. In fact, I vetoed JSON because of the extra parsing overhead with very little gain to be seen by doing that vs. just fixing all the fields. Now, what would I do differently? #1 would be documentation with a clear explanation of this history, covering the whys and the hows. #2 would probably be to make it officially key value with some fields being required. But that's a different problem altogether > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. 
Currently the HDFS audit log tracks the users of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure, which relies on offline analysis across RPC boundaries. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is > significant additional work. > 3. In [HDFS-4680 |
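For readers following the format debate above: the audit log layout Allen describes is "label=value label=value ...", with every label always emitted (absent values written as "null") so that whitespace delimits the columns. A minimal parsing sketch under that assumption is below. The sample line and its handling are illustrative only: this sketch assumes values contain no embedded spaces, which the fixed-field design mostly guarantees but real logs can violate (e.g. `ugi` entries carrying an auth annotation such as `(auth:SIMPLE)`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AuditLogLine {
    // Parse one "key=value key=value ..." audit entry into a map.
    // Assumes no value contains whitespace, so a simple split suffices;
    // absent fields still appear, with the literal value "null".
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical entry mirroring the fixed-field layout described above.
        String line = "allowed=true ugi=alice ip=/10.0.0.1 cmd=open "
                + "src=/data/part-0000 dst=null perm=null";
        Map<String, String> f = parse(line);
        System.out.println(f.get("cmd") + " " + f.get("src"));
        // prints: open /data/part-0000
    }
}
```

Because every label is always present, downstream tools can rely on `fields.containsKey(...)` returning true for the core columns and only need to test values against the sentinel string "null".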
[jira] [Updated] (HDFS-9188) Make block corruption related tests FsDataset-agnostic.
[ https://issues.apache.org/jira/browse/HDFS-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-9188: Attachment: HDFS-9188.001.patch Address release audit and whitespace warnings. The test failures are not relevant. > Make block corruption related tests FsDataset-agnostic. > > > Key: HDFS-9188 > URL: https://issues.apache.org/jira/browse/HDFS-9188 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-9188.000.patch, HDFS-9188.001.patch > > > Currently, HDFS does block corruption tests by directly accessing the files > stored on the storage directories, which assumes {{FsDatasetImpl}} is the > dataset implementation. However, with works like OZone (HDFS-7240) and > HDFS-8679, there will be different FsDataset implementations. > So we need a general way to run whitebox tests like corrupting blocks and crc > files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941639#comment-14941639 ] Daryn Sharp commented on HDFS-9184: --- (I'll rest my case, sans history, with the format is "label=val label=val ...". A rather self-documenting format. If a parser can't handle another label, esp. one tacked on to the end, that's just bad programming) Anyway, the most basic use-case is: Production user X is pounding the NN. I wonder what job it is? Let me look at oozie, arg, 20 jobs. Hey, user X, stop abusing the NN, kill your bad job. You don't know which job? Can you tell from these paths? You can't? Fine, I'll login to one of the hosts in the audit log and look for the tasks. Arg, 5 different jobs running tasks as user X on this node. I guess I'll try to intersect the jobs across multiple nodes... Boy, I wish the audit log could tell me which job it is... I'd love to see a keep-it-simple approach for this most basic issue we've all faced. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. 
Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. 
The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941654#comment-14941654 ] Cathy Palmer commented on HDFS-9191: Thanks for the tutorial! I'm going to write up my notes and share. Good luck to you at your next gig! You know that if you take your bike to work, you can ride to Seattle and back over the bridge at lunch. Or there's a restaurant at the top of Mercer Island, Roanoke Inn that's a great lunch stop. You cannot kayak there though. :) You can also ride to Factoria mall area via bike trail for lunch. It will get you out of the office. Cathy > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941574#comment-14941574 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2415 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2415/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941576#comment-14941576 ] Ming Ma commented on HDFS-9015: --- Thanks [~eddyxu]! > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941592#comment-14941592 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1210 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1210/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941594#comment-14941594 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-trunk-Commit #8558 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8558/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatch.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking into consideration of > dfs.client.use.datanode.hostname setting, this would possibly fail balancer > run issued from outside a cluster. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941597#comment-14941597 ] James Clampffer commented on HDFS-9186: --- Decided to close this. While it does what I need, I think most projects that incorporate libhdfs++ are going to include it in application/project-specific ways. > Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-9100: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I committed to trunk and branch-2. Thanks Casey for the contribution! > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatch.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking into consideration of > dfs.client.use.datanode.hostname setting, this would possibly fail balancer > run issued from outside a cluster. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
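The direction of the HDFS-9100 fix follows from the snippet quoted in the description: before connecting, the dispatcher should choose between the datanode's IP and its hostname according to `dfs.client.use.datanode.hostname`, since from outside the cluster only the hostname may be routable. A minimal sketch of that selection logic; the parameter names below are hypothetical stand-ins for `DatanodeInfo` accessors, not the actual patch:

```java
public class DatanodeAddrChooser {
    // Sketch of the intended behavior: when dfs.client.use.datanode.hostname
    // is true, dial the datanode by hostname rather than its registered IP.
    // (In Hadoop the equivalent choice is threaded through getXferAddr();
    // this free-standing method only illustrates the selection.)
    static String xferAddr(String ipAddr, String hostName, int xferPort,
                           boolean useDatanodeHostname) {
        String host = useDatanodeHostname ? hostName : ipAddr;
        return host + ":" + xferPort;
    }

    public static void main(String[] args) {
        // A balancer run issued from outside the cluster would want this:
        System.out.println(xferAddr("10.0.0.7", "dn1.example.com", 50010, true));
        // prints: dn1.example.com:50010
    }
}
```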
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941687#comment-14941687 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #472 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/472/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941688#comment-14941688 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #472 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/472/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941696#comment-14941696 ] Colin Patrick McCabe commented on HDFS-9184: [~aw]: I feel like this is a good example of why the audit log format should have been JSON. We wouldn't be having this discussion if the format had been one JSON record per line, since it would be obvious how to parse it. It's also relatively easy to find libraries for JSON in every language you might want to use (although perhaps it wasn't so easy back when the audit log was first added to HDFS?) I'm not sure I understand the desire for COBOL-style fixed fields (party like it's 1975?). But I do agree that compatibility is a concern here since there is basically no spec that we can point to when people are writing their parsers. They could easily just be doing {{scanf("%s %s %s", foo, bar, baz)}} and then we would break them. [~daryn]: thanks for giving an example of how this would be used. I agree this has been a pain point for a while. This is possibly a dumb question, but couldn't clientId be used for this purpose? This solution also presupposes some kind of daemon or service to gather context IDs in Hive. This service hasn't been written yet, but if it were, it seems like it might start looking a lot like HTrace. Like I said earlier, I also feel like this solution wouldn't work in the case where HBase was in use, or RecordService, or Tachyon. We are definitely planning some YARN and MR integration for HTrace. I would really like to get more people excited about this project and work out what we'd need to do to get it to cover all these use-cases. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. 
delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. 
We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online.
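The thread-local mechanics described in the HDFS-9184 proposal can be sketched independently of the RPC layer: the client thread sets the context before issuing an RPC, the header builder reads it, and on the namenode the Handler thread re-installs the decoded value so the audit logger can pick it up. The class below is only an illustration of that pattern; the API the eventual patch defines may differ.

```java
public final class CallerContextSketch {
    // Hypothetical stand-in for the proposed caller context: an
    // application-supplied tracking id (e.g. a Hive query id) plus an
    // optional signature that operators verify offline. Held in a
    // thread-local so existing method signatures stay untouched.
    private static final ThreadLocal<CallerContextSketch> CURRENT =
            new ThreadLocal<>();

    private final String context;
    private final byte[] signature;

    public CallerContextSketch(String context, byte[] signature) {
        this.context = context;
        this.signature = signature;
    }

    public String getContext() { return context; }
    public byte[] getSignature() { return signature; }

    // Client side: set before an RPC so the header builder can attach it.
    public static void setCurrent(CallerContextSketch ctx) { CURRENT.set(ctx); }

    // Server side: the Handler thread stores what it decoded from the RPC
    // header; the audit logger reads it back when formatting the entry.
    public static CallerContextSketch getCurrent() { return CURRENT.get(); }
}
```

Usage on the client side would be a single `CallerContextSketch.setCurrent(new CallerContextSketch("hive_query_42", null))` before the filesystem call; nothing else in the call chain changes, which is the point of the design.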
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941575#comment-14941575 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2415 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2415/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-9186) Simplify embedding libhdfspp into other projects
[ https://issues.apache.org/jira/browse/HDFS-9186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Clampffer resolved HDFS-9186. --- Resolution: Not A Problem > Simplify embedding libhdfspp into other projects > > > Key: HDFS-9186 > URL: https://issues.apache.org/jira/browse/HDFS-9186 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-9186.HDFS-8707.000.patch > > > I'd like to add a script to the root libhdfspp directory that can prune > anything that libhdfspp doesn't need to compile out of the hadoop source > tree. > This way the project is a lot smaller if it's going to be included in a > third-party directory of another project. The directory structure, aside > from missing directories, is preserved so modifications can be diffed against > a fresh checkout of the source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941652#comment-14941652 ] Allen Wittenauer commented on HDFS-9184: bq. If a parser can't handle another label, esp. one tacked on to the end, that's just bad programming You've missed several key points in that story. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and Hive queries. One scenario is that when the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the user of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places, interconnected > like a tree structure, which relies on offline analysis across the RPC boundary. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than the span id from the root of the tree to a leaf is > significant additional work. > 3. 
In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussions on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, Kerberos-authenticated > connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into the {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
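The proposed flow (client-side thread-local, carried in an optional RPC header field, restored into the NN handler's thread-local, appended by the audit logger) can be sketched roughly as below. This is an illustrative mock only; the class and method names are invented and are not the actual HDFS API.

```java
// Illustrative sketch of the proposed caller-context flow; all names here
// are invented stand-ins, not real HDFS classes.
public class CallerContextSketch {

    // Client side: the caller id (e.g. a Hive query id) lives in a
    // thread-local, so existing call sites need no signature changes.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void set(String callerId) { CONTEXT.set(callerId); }

    public static String get() { return CONTEXT.get(); }

    // The RPC layer would copy the thread-local into an optional header
    // field on the way out; the NN would copy it back into its Handler's
    // thread-local before dispatching to FSNamesystem. The audit logger
    // then simply appends whatever context it finds, if any.
    public static String auditEntry(String user, String cmd) {
        String ctx = get();
        return "ugi=" + user + "\tcmd=" + cmd
                + (ctx == null ? "" : "\tcallerContext=" + ctx);
    }

    public static void main(String[] args) {
        set("hive_query_id=20151002_0042");   // hypothetical caller id
        System.out.println(auditEntry("alice", "delete"));
    }
}
```

Because the context is optional in the header and the audit logger only appends it when present, clients that never set it see no behavior change.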
[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-8850: - Labels: (was: 2.7.2-candidate) > VolumeScanner thread exits with exception if there is no block pool to be > scanned but there are suspicious blocks > - > > Key: HDFS-8850 > URL: https://issues.apache.org/jira/browse/HDFS-8850 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.7.2 > > Attachments: HDFS-8850.001.patch > > > The VolumeScanner threads inside the BlockScanner exit with an exception if > there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-8850: - Fix Version/s: (was: 2.8.0) 2.7.2 3.0.0 > VolumeScanner thread exits with exception if there is no block pool to be > scanned but there are suspicious blocks > - > > Key: HDFS-8850 > URL: https://issues.apache.org/jira/browse/HDFS-8850 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.7.2 > > Attachments: HDFS-8850.001.patch > > > The VolumeScanner threads inside the BlockScanner exit with an exception if > there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9137) DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded
[ https://issues.apache.org/jira/browse/HDFS-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-9137: -- Attachment: HDFS-9137.00.patch Attached a patch for fixing this issue. > DeadLock between DataNode#refreshVolumes and > BPOfferService#registrationSucceeded > -- > > Key: HDFS-9137 > URL: https://issues.apache.org/jira/browse/HDFS-9137 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0, 2.7.1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9137.00.patch > > > I can see that these code flows between DataNode#refreshVolumes and > BPOfferService#registrationSucceeded could cause a deadlock. > In practice the situation may be rare, as it requires a user calling refreshVolumes at the > time of DN registration with the NN. But it seems the issue can happen. > Reason for the deadlock: > 1) refreshVolumes will be called with the DN lock held, and at the end it will > also trigger a block report. In the block report call, > BPServiceActor#triggerBlockReport calls toString on bpos. Here it takes the > readLock on bpos: > DN lock, then bpos lock. > 2) The BPOfferService#registrationSucceeded call takes the writeLock on bpos and > calls dn.bpRegistrationSucceeded, which is again a synchronized call on the DN: > bpos lock, then DN lock. > So, this can clearly create a deadlock. > I think a simple fix could be to move the triggerBlockReport call outside of the DN > lock, and I feel that call may not really be needed inside the DN lock. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9137) DeadLock between DataNode#refreshVolumes and BPOfferService#registrationSucceeded
[ https://issues.apache.org/jira/browse/HDFS-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-9137: -- Status: Patch Available (was: Open) > DeadLock between DataNode#refreshVolumes and > BPOfferService#registrationSucceeded > -- > > Key: HDFS-9137 > URL: https://issues.apache.org/jira/browse/HDFS-9137 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.1, 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-9137.00.patch > > > I can see that these code flows between DataNode#refreshVolumes and > BPOfferService#registrationSucceeded could cause a deadlock. > In practice the situation may be rare, as it requires a user calling refreshVolumes at the > time of DN registration with the NN. But it seems the issue can happen. > Reason for the deadlock: > 1) refreshVolumes will be called with the DN lock held, and at the end it will > also trigger a block report. In the block report call, > BPServiceActor#triggerBlockReport calls toString on bpos. Here it takes the > readLock on bpos: > DN lock, then bpos lock. > 2) The BPOfferService#registrationSucceeded call takes the writeLock on bpos and > calls dn.bpRegistrationSucceeded, which is again a synchronized call on the DN: > bpos lock, then DN lock. > So, this can clearly create a deadlock. > I think a simple fix could be to move the triggerBlockReport call outside of the DN > lock, and I feel that call may not really be needed inside the DN lock. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
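The inverted lock order described in the issue is the classic two-lock deadlock. The sketch below models it with a plain monitor standing in for the DataNode lock and a ReentrantReadWriteLock standing in for the bpos lock, and shows the intent of the suggested fix: trigger the block report only after the DN lock is released, so one path never holds both locks at once. All names are illustrative stand-ins, not the real HDFS code.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Models the HDFS-9137 lock inversion with stand-in locks; not real HDFS code.
public class LockOrderSketch {
    private final Object dnLock = new Object();                              // stands in for the DataNode monitor
    private final ReentrantReadWriteLock bpos = new ReentrantReadWriteLock(); // stands in for the bpos lock

    // Buggy order 1: DN lock -> bpos read lock
    //   (refreshVolumes holds DN lock, then triggerBlockReport reads bpos).
    // Buggy order 2: bpos write lock -> DN lock
    //   (registrationSucceeded holds bpos, then calls a synchronized DN method).
    // Run concurrently, the two orders can each hold one lock and wait on the other.

    // Fixed path 1: do volume work under the DN lock, but take the bpos
    // lock for the block report only AFTER the DN lock is released.
    public boolean refreshVolumesFixed() {
        synchronized (dnLock) {
            // ... update the volume list under the DN lock ...
        }
        bpos.readLock().lock();          // block report, now outside the DN lock
        try {
            return true;
        } finally {
            bpos.readLock().unlock();
        }
    }

    // Path 2 keeps its bpos -> DN order; with path 1 fixed there is only
    // one nesting order left, so the cycle is broken.
    public boolean registrationSucceeded() {
        bpos.writeLock().lock();
        try {
            synchronized (dnLock) {
                return true;
            }
        } finally {
            bpos.writeLock().unlock();
        }
    }
}
```

With only one nesting order (bpos before DN) remaining, a wait-for cycle between the two locks is impossible.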
[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941660#comment-14941660 ] Kihwal Lee commented on HDFS-8850: -- Cherry-picked to branch-2.7. > VolumeScanner thread exits with exception if there is no block pool to be > scanned but there are suspicious blocks > - > > Key: HDFS-8850 > URL: https://issues.apache.org/jira/browse/HDFS-8850 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0, 2.7.2 > > Attachments: HDFS-8850.001.patch > > > The VolumeScanner threads inside the BlockScanner exit with an exception if > there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakob Homan updated HDFS-9191: -- Comment: was deleted (was: Thanks for the tutorial! I'm going to write up my notes and share. Good luck to you at your next gig! You know that if you take your bike to work, you can ride to Seattle and back over the bridge at lunch. Or there's a restaurant at the top of Mercer Island, Roanoke Inn that's a great lunch stop. You cannot kayak there though. :) You can also ride to Factoria mall area via bike trail for lunch. It will get you out of the office. Cathy ) > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941693#comment-14941693 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2416 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2416/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In the Balancer's Dispatcher.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking the > dfs.client.use.datanode.hostname setting into consideration; this could cause a balancer > run issued from outside a cluster to fail. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
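The intent of the fix is simply to dial the datanode's hostname rather than its IP when dfs.client.use.datanode.hostname is set. The helper below is a hypothetical stand-in; the actual patch works through DatanodeInfo and the client configuration.

```java
// Hypothetical helper illustrating the intent of the HDFS-9100 fix;
// the real patch resolves this via DatanodeInfo and the HDFS client conf.
public class XferAddrSketch {

    // When dfs.client.use.datanode.hostname=true, clients (including a
    // Balancer running outside the cluster, where datanode IPs may not be
    // routable) should connect using the hostname instead of the IP.
    static String xferAddr(String ip, String hostname, int port,
                           boolean useDatanodeHostname) {
        String host = useDatanodeHostname ? hostname : ip;
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // Hypothetical datanode address values.
        System.out.println(xferAddr("10.0.0.7", "dn1.example.com", 50010, true));
        System.out.println(xferAddr("10.0.0.7", "dn1.example.com", 50010, false));
    }
}
```

The socket connect in the quoted dispatch() would then be built from this conditional address instead of the unconditional getXferAddr().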
[jira] [Updated] (HDFS-8873) Allow the directoryScanner to be rate-limited
[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-8873: - Labels: 2.7.2-candidate (was: ) > Allow the directoryScanner to be rate-limited > - > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.1 >Reporter: Nathan Roberts >Assignee: Daniel Templeton > Labels: 2.7.2-candidate > Fix For: 2.8.0 > > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, > HDFS-8873.006.patch, HDFS-8873.007.patch, HDFS-8873.008.patch, > HDFS-8873.009.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791 for details). > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for a full directory listing, which at roughly > 100 seeks per second translates to ~655 seconds). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
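A duty cycle for a scan loop reduces to a work/sleep ratio: after each burst of disk-busy work, sleep long enough that busy time stays at the configured fraction. A minimal sketch with invented names (the committed patch implements its throttle inside the DirectoryScanner itself):

```java
// Sketch of a duty-cycle throttle for a scan loop; names are invented.
public class ScanThrottleSketch {

    // After workMillis of disk-busy work, sleep so that
    //   work / (work + sleep) == dutyCycle.
    // Solving for sleep: sleep = work * (1 - dutyCycle) / dutyCycle.
    static long sleepMillisFor(long workMillis, double dutyCycle) {
        if (dutyCycle >= 1.0) {
            return 0;                    // full duty cycle: no throttling
        }
        return (long) (workMillis * (1.0 - dutyCycle) / dutyCycle);
    }

    public static void main(String[] args) {
        // e.g. scanning in 100 ms bursts at a 25% duty cycle means sleeping
        // 300 ms between bursts, leaving the disk ~75% free for regular I/O.
        System.out.println(sleepMillisFor(100, 0.25));
    }
}
```

At a 25% duty cycle the 655-second worst case above would stretch to roughly four times as long, but the disk stays mostly available to clients throughout.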
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941560#comment-14941560 ] Yongjun Zhang commented on HDFS-9100: - Sorry for the delay, will commit momentarily. > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In the Balancer's Dispatcher.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking the > dfs.client.use.datanode.hostname setting into consideration; this could cause a balancer > run issued from outside a cluster to fail. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941670#comment-14941670 ] Hudson commented on HDFS-9185: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/480/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8850) VolumeScanner thread exits with exception if there is no block pool to be scanned but there are suspicious blocks
[ https://issues.apache.org/jira/browse/HDFS-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941558#comment-14941558 ] Rushabh S Shah commented on HDFS-8850: -- [~hitliuyi], [~cmccabe]: Does it make sense to commit to 2.7.2 ? > VolumeScanner thread exits with exception if there is no block pool to be > scanned but there are suspicious blocks > - > > Key: HDFS-8850 > URL: https://issues.apache.org/jira/browse/HDFS-8850 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.7.0 >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Labels: 2.7.2-candidate > Fix For: 2.8.0 > > Attachments: HDFS-8850.001.patch > > > The VolumeScanner threads inside the BlockScanner exit with an exception if > there is no block pool to be scanned but there are suspicious blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941568#comment-14941568 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-trunk-Commit #8557 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8557/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941591#comment-14941591 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1210 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1210/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941593#comment-14941593 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1210 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1210/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941669#comment-14941669 ] Hudson commented on HDFS-9191: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #480 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/480/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941677#comment-14941677 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #446 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/446/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In the Balancer's Dispatcher.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking the > dfs.client.use.datanode.hostname setting into consideration; this could cause a balancer > run issued from outside a cluster to fail. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941679#comment-14941679 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #446 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/446/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941678#comment-14941678 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #446 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/446/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941680#comment-14941680 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #446 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/446/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941694#comment-14941694 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2416 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2416/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9193) when a datanode's usage go above 70 percent, we can't open datanodes tab in NN UI
Chang Li created HDFS-9193: -- Summary: when a datanode's usage go above 70 percent, we can't open datanodes tab in NN UI Key: HDFS-9193 URL: https://issues.apache.org/jira/browse/HDFS-9193 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Priority: Blocker -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9193) when a datanode's usage go above 70 percent, we can't open datanodes tab in NN UI
[ https://issues.apache.org/jira/browse/HDFS-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941830#comment-14941830 ] Hadoop QA commented on HDFS-9193: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 0m 0s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | release audit | 0m 12s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | | | 0m 16s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764849/HDFS-9193.patch | | Optional Tests | | | git revision | trunk / fdf02d1 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12778/artifact/patchprocess/patchReleaseAuditProblems.txt | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12778/console | This message was automatically generated. > when a datanode's usage go above 70 percent, we can't open datanodes tab in > NN UI > - > > Key: HDFS-9193 > URL: https://issues.apache.org/jira/browse/HDFS-9193 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Blocker > Attachments: HDFS-9193.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941846#comment-14941846 ] Konstantin Shvachko commented on HDFS-3107: --- Sorry, got distracted. It would be good to create a new jira for truncate support in nfs. > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Status: Open (was: Patch Available) > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Attachment: HDFS-8164.003.patch > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch, > HDFS-8164.003.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9192) Add valgrind suppression for statically initialized library objects
James Clampffer created HDFS-9192: - Summary: Add valgrind suppression for statically initialized library objects Key: HDFS-9192 URL: https://issues.apache.org/jira/browse/HDFS-9192 Project: Hadoop HDFS Issue Type: Sub-task Reporter: James Clampffer Assignee: James Clampffer When using --leak-check=full there's a lot of noise due to static initialization of constants and memory pools, most of them from protobuf. Add a suppression file that helps cut down on this noise but is selective enough that real issues aren't going to be masked as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941808#comment-14941808 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #481 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/481/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatch.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking into consideration of > dfs.client.use.datanode.hostname setting, this would possibly fail balancer > run issued from outside a cluster. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
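The issue description above boils down to which address the Balancer dials: {{getXferAddr()}} always returns the IP-based transfer address, while a client outside the cluster may only be able to reach DataNodes by hostname. The following is a minimal, self-contained sketch of that address-selection logic; the nested {{DatanodeID}} class here is a simplified stand-in for Hadoop's real class (which, to my understanding, exposes a similar hostname-aware accessor), and it is not the actual HDFS-9100 patch.

```java
// Sketch only: models how a dispatcher could honor
// dfs.client.use.datanode.hostname when building the transfer address.
// The DatanodeID below is a simplified stand-in, not Hadoop's class.
public class XferAddrExample {
    static class DatanodeID {
        final String ipAddr;
        final String hostName;
        final int xferPort;

        DatanodeID(String ipAddr, String hostName, int xferPort) {
            this.ipAddr = ipAddr;
            this.hostName = hostName;
            this.xferPort = xferPort;
        }

        // Hostname when the client configuration asks for it, raw IP otherwise.
        String getXferAddr(boolean useHostname) {
            return (useHostname ? hostName : ipAddr) + ":" + xferPort;
        }
    }

    public static void main(String[] args) {
        DatanodeID dn = new DatanodeID("10.0.0.5", "dn1.example.com", 50010);
        // A balancer run from outside the cluster typically needs the hostname form.
        System.out.println(dn.getXferAddr(true));   // dn1.example.com:50010
        System.out.println(dn.getXferAddr(false));  // 10.0.0.5:50010
    }
}
```

The point of the fix is simply that the boolean comes from the client configuration rather than being hard-coded to the IP path.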
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941809#comment-14941809 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #481 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/481/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941851#comment-14941851 ] Constantine Peresypkin commented on HDFS-3107: -- Already done: https://issues.apache.org/jira/browse/HDFS-9164 > HDFS truncate > - > > Key: HDFS-3107 > URL: https://issues.apache.org/jira/browse/HDFS-3107 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Lei Chang >Assignee: Plamen Jeliazkov > Fix For: 2.7.0 > > Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, > HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, > HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, > HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, > HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, > HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, > HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml > > Original Estimate: 1,344h > Remaining Estimate: 1,344h > > Systems with transaction support often need to undo changes made to the > underlying storage when a transaction is aborted. Currently HDFS does not > support truncate (a standard Posix operation) which is a reverse operation of > append, which makes upper layer applications use ugly workarounds (such as > keeping track of the discarded byte range per file in a separate metadata > store, and periodically running a vacuum process to rewrite compacted files) > to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Status: Patch Available (was: Open) > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch, > HDFS-8164.003.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Attachment: HDFS-8164.003.patch Thanks [~yzhangal] for the review! Your comments make sense to me. I have uploaded a new patch encapsulating {{FSNameSystem#getCTime}}. I am leaving {{FSImage}} untouched for now; if {{getCTime}} is needed from there in the future, it can easily be added. > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch, > HDFS-8164.003.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
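For readers following the review comment above, a tiny hedged sketch of the encapsulation being described: callers ask the namesystem for the cluster creation time instead of reaching through the image/storage layers directly. The class and method names below mirror HDFS but are simplified stand-ins, not the actual HDFS-8164 patch.

```java
// Hypothetical sketch of encapsulating the cTime accessor behind the
// namesystem. NNStorage/FSNamesystem here are simplified stand-ins.
public class CTimeExample {
    static class NNStorage {
        private long cTime;
        long getCTime() { return cTime; }
        void setCTime(long t) { cTime = t; }
    }

    static class FSNamesystem {
        private final NNStorage storage = new NNStorage();

        // Encapsulated accessor: hides the storage layer from callers.
        long getCTime() { return storage.getCTime(); }

        // Recording the creation time at format avoids the confusing
        // initial value of 0 described in the issue.
        void format(long now) { storage.setCTime(now); }
    }

    public static void main(String[] args) {
        FSNamesystem fsn = new FSNamesystem();
        fsn.format(1700000000000L);
        System.out.println(fsn.getCTime()); // non-zero after format
    }
}
```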
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Attachment: (was: HDFS-8164.003.patch) > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Status: Open (was: Patch Available) > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9193) when a datanode's usage go above 70 percent, we can't open datanodes tab in NN UI
[ https://issues.apache.org/jira/browse/HDFS-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-9193: --- Attachment: HDFS-9193.patch > when a datanode's usage go above 70 percent, we can't open datanodes tab in > NN UI > - > > Key: HDFS-9193 > URL: https://issues.apache.org/jira/browse/HDFS-9193 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Blocker > Attachments: HDFS-9193.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9142) Namenode Http address is not configured correctly for federated cluster in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-9142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941875#comment-14941875 ] Hadoop QA commented on HDFS-9142: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 7m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 48s | There were no new javac warning messages. | | {color:red}-1{color} | release audit | 0m 13s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 23s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 25s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 186m 29s | Tests failed in hadoop-hdfs. 
| | | | 209m 16s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.tools.TestGetGroups | | | hadoop.hdfs.TestHdfsAdmin | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.TestClientReportBadBlock | | | hadoop.hdfs.TestSafeMode | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.tools.TestDebugAdmin | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.cli.TestErasureCodingCLI | | | hadoop.hdfs.TestMultiThreadedHflush | | | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.tools.TestStoragePolicyCommands | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.security.TestPermissionSymlinks | | | hadoop.hdfs.TestDFSRollback | | | hadoop.fs.TestUnbuffer | | | hadoop.hdfs.TestQuota | | | hadoop.hdfs.TestFileAppend2 | | | hadoop.hdfs.TestDFSClientRetries | | | hadoop.security.TestRefreshUserMappings | | | hadoop.hdfs.server.namenode.TestCheckpoint | | | hadoop.hdfs.TestReadWhileWriting | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.snapshot.TestDisallowModifyROSnapshot | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.TestDFSUpgrade | | | hadoop.hdfs.TestGetBlocks | | | hadoop.fs.permission.TestStickyBit | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.cli.TestXAttrCLI | | | hadoop.hdfs.server.blockmanagement.TestBlockManager | | | hadoop.fs.TestGlobPaths | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.server.namenode.TestCacheDirectives | | | hadoop.security.TestPermission | | | hadoop.hdfs.server.namenode.snapshot.TestXAttrWithSnapshot | | | hadoop.hdfs.server.namenode.TestINodeFile | | | hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot | | | hadoop.hdfs.TestSetrepDecreasing | | | hadoop.hdfs.TestDFSFinalize | | | hadoop.hdfs.server.namenode.snapshot.TestAclWithSnapshot | | | hadoop.hdfs.TestDFSStorageStateRecovery | | | hadoop.hdfs.TestDisableConnCache | | | hadoop.hdfs.server.namenode.TestCheckPointForSecurityTokens | | | 
hadoop.hdfs.TestRestartDFS | | | hadoop.cli.TestHDFSCLI | | | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.TestFileCreation | | | hadoop.cli.TestAclCLI | | | hadoop.hdfs.TestDFSPermission | | | hadoop.cli.TestDeleteCLI | | | hadoop.cli.TestCryptoAdminCLI | | | hadoop.hdfs.TestRollingUpgradeRollback | | | hadoop.hdfs.server.namenode.TestFileContextAcl | | | hadoop.hdfs.server.namenode.TestNameNodeXAttr | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.TestDFSStartupVersions | | | hadoop.hdfs.TestFetchImage | | | hadoop.cli.TestCacheAdminCLI | | | hadoop.hdfs.web.TestWebHDFSXAttr | | | hadoop.hdfs.server.namenode.TestFsck | | | hadoop.hdfs.TestSnapshotCommands | | | hadoop.hdfs.TestClose | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshottableDirListing | | | hadoop.hdfs.TestFileStatus | | | hadoop.hdfs.TestFsShellPermission | | | hadoop.fs.loadGenerator.TestLoadGenerator | | | hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade | | | hadoop.hdfs.server.namenode.TestFileContextXAttr | | | hadoop.hdfs.server.namenode.TestStorageRestore | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764818/HDFS-9142.v4.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / a68b6eb | | Release Audit |
[jira] [Updated] (HDFS-9193) when a datanode's usage go above 70 percent, we can't open datanodes tab in NN UI
[ https://issues.apache.org/jira/browse/HDFS-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-9193: --- Status: Patch Available (was: Open) The error is caused by a ReferenceError in dfshealth.js when trying to load the datanodes tab page: {code} } else if (u.usedPercentage < 85) { {code} The variable {{u}} is not defined anywhere. The uploaded patch fixes this problem. > when a datanode's usage go above 70 percent, we can't open datanodes tab in > NN UI > - > > Key: HDFS-9193 > URL: https://issues.apache.org/jira/browse/HDFS-9193 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chang Li >Assignee: Chang Li >Priority: Blocker > Attachments: HDFS-9193.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941827#comment-14941827 ] Hudson commented on HDFS-9100: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #1211 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1211/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatch.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking into consideration of > dfs.client.use.datanode.hostname setting, this would possibly fail balancer > run issued from outside a cluster. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8164) cTime is 0 in VERSION file for newly formatted NameNode.
[ https://issues.apache.org/jira/browse/HDFS-8164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-8164: Status: Patch Available (was: Open) > cTime is 0 in VERSION file for newly formatted NameNode. > > > Key: HDFS-8164 > URL: https://issues.apache.org/jira/browse/HDFS-8164 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.0.3-alpha >Reporter: Chris Nauroth >Assignee: Xiao Chen >Priority: Minor > Attachments: HDFS-8164.001.patch, HDFS-8164.002.patch, > HDFS-8164.003.patch > > > After formatting a NameNode and inspecting its VERSION file, the cTime > property shows 0. The value does get updated to current time during an > upgrade, but I believe this is intended to be the creation time of the > cluster, and therefore the initial value of 0 before an upgrade can cause > confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941898#comment-14941898 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #473 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/473/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941897#comment-14941897 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #473 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/473/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatch.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking into consideration of > dfs.client.use.datanode.hostname setting, this would possibly fail balancer > run issued from outside a cluster. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9180) Update excluded DataNodes in DFSStripedOutputStream based on failures in data streamers
[ https://issues.apache.org/jira/browse/HDFS-9180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941913#comment-14941913 ] Hadoop QA commented on HDFS-9180: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 39s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 52s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 188m 46s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 32s | Tests passed in hadoop-hdfs-client. 
| | | | 239m 43s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.web.TestWebHDFSOAuth2 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestParallelShortCircuitRead | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764816/HDFS-9180.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / a68b6eb | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12775/console | This message was automatically generated. > Update excluded DataNodes in DFSStripedOutputStream based on failures in data > streamers > --- > > Key: HDFS-9180 > URL: https://issues.apache.org/jira/browse/HDFS-9180 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > Attachments: HDFS-9180.000.patch, HDFS-9180.001.patch, > HDFS-9180.002.patch > > > This is a TODO in HDFS-9040: based on the failures all the striped data > streamers hit, the DFSStripedOutputStream should keep a record of all the > DataNodes that should be excluded. 
> This jira will also fix several bugs in the DFSStripedOutputStream. Will > provide more details in the comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9100) HDFS Balancer does not respect dfs.client.use.datanode.hostname
[ https://issues.apache.org/jira/browse/HDFS-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941946#comment-14941946 ] Hudson commented on HDFS-9100: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2386 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/]) HDFS-9100. HDFS Balancer does not respect (yzhang: rev 1037ee580f87e6bf13155834c36f26794381678b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java > HDFS Balancer does not respect dfs.client.use.datanode.hostname > --- > > Key: HDFS-9100 > URL: https://issues.apache.org/jira/browse/HDFS-9100 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, HDFS >Reporter: Yongjun Zhang >Assignee: Casey Brotherton > Fix For: 2.8.0 > > Attachments: HDFS-9100.000.patch, HDFS-9100.001.patch, > HDFS-9100.002.patch, HDFS-9100.003.patch > > > In Balancer Dispatcher.java: > {code} >private void dispatch() { > LOG.info("Start moving " + this); > Socket sock = new Socket(); > DataOutputStream out = null; > DataInputStream in = null; > try { > sock.connect( > NetUtils.createSocketAddr(target.getDatanodeInfo().getXferAddr()), > HdfsConstants.READ_TIMEOUT); > {code} > getXferAddr() is called without taking the > dfs.client.use.datanode.hostname setting into consideration; this could > cause a balancer run issued from outside the cluster to fail. > Thanks [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
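The quoted `dispatch()` snippet always resolves the IP-based transfer address. A minimal, self-contained sketch of the address-selection logic the fix needs is below; `DatanodeInfoStub` is an illustrative stand-in, not Hadoop's `DatanodeInfo`, though the idea mirrors the real `getXferAddr(boolean)` overload.

```java
// Hedged sketch: pick the hostname-based or IP-based transfer address
// depending on the dfs.client.use.datanode.hostname-style setting.
public class XferAddrDemo {
    // Simplified stand-in for DatanodeInfo: holds both addresses.
    static class DatanodeInfoStub {
        final String ipAddr;
        final String hostName;
        final int xferPort;
        DatanodeInfoStub(String ipAddr, String hostName, int xferPort) {
            this.ipAddr = ipAddr;
            this.hostName = hostName;
            this.xferPort = xferPort;
        }
        // Mirrors the boolean overload: hostname when the client asks
        // for it (e.g. balancer running outside the cluster), IP otherwise.
        String getXferAddr(boolean useHostname) {
            return (useHostname ? hostName : ipAddr) + ":" + xferPort;
        }
    }

    public static void main(String[] args) {
        DatanodeInfoStub dn =
            new DatanodeInfoStub("10.0.0.7", "dn7.example.com", 50010);
        boolean useDatanodeHostname = true; // the client-side setting
        System.out.println(dn.getXferAddr(useDatanodeHostname));
        // prints dn7.example.com:50010
    }
}
```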
[jira] [Commented] (HDFS-9015) Refactor TestReplicationPolicy to test different block placement policies
[ https://issues.apache.org/jira/browse/HDFS-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941949#comment-14941949 ] Hudson commented on HDFS-9015: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2386 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/]) HDFS-9015. Refactor TestReplicationPolicy to test different block (lei: rev a68b6eb0f4110ba626a44fad6b9eb5d8c5a4901f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BaseReplicationPolicyTest.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Refactor TestReplicationPolicy to test different block placement policies > - > > Key: HDFS-9015 > URL: https://issues.apache.org/jira/browse/HDFS-9015 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ming Ma >Assignee: Ming Ma > Fix For: 3.0.0, 2.8.0 > > Attachments: HDFS-9015.patch > > > TestReplicationPolicy can be parameterized so that default policy, upgrade > domain policy and other policies can share some common test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9191) Typo in Hdfs.java. NoSuchElementException is misspelled
[ https://issues.apache.org/jira/browse/HDFS-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941947#comment-14941947 ] Hudson commented on HDFS-9191: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2386 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/]) HDFS-9191. Typo in Hdfs.java. NoSuchElementException is misspelled. (jghoman: rev 3929ac9340a5c9f26574dc076a449f7e11931527) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Typo in Hdfs.java. NoSuchElementException is misspelled > - > > Key: HDFS-9191 > URL: https://issues.apache.org/jira/browse/HDFS-9191 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Catherine Palmer >Assignee: Catherine Palmer >Priority: Trivial > Labels: newbie > Fix For: 3.0.0 > > Attachments: hdfs-9191.patch > > > Line 241 NoSuchElementException has a typo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9185) Fix null tracer in ErasureCodingWorker
[ https://issues.apache.org/jira/browse/HDFS-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941948#comment-14941948 ] Hudson commented on HDFS-9185: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2386 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2386/]) HDFS-9185. Fix null tracer in ErasureCodingWorker. Contributed by Rakesh (jing9: rev c6cafc77e697317dad0708309b67b900a2e3a413) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES-HDFS-EC-7285.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRecoverStripedFile.java > Fix null tracer in ErasureCodingWorker > -- > > Key: HDFS-9185 > URL: https://issues.apache.org/jira/browse/HDFS-9185 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Rakesh R >Assignee: Rakesh R >Priority: Critical > Fix For: 3.0.0 > > Attachments: HDFS-9185-00.patch, HDFS-9185-01.patch > > > Below is the message taken from build: > {code} > Error Message > Time out waiting for EC block recovery. > Stacktrace > java.io.IOException: Time out waiting for EC block recovery. 
> at > org.apache.hadoop.hdfs.TestRecoverStripedFile.waitForRecoveryFinished(TestRecoverStripedFile.java:383) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.assertFileBlocksRecovery(TestRecoverStripedFile.java:283) > at > org.apache.hadoop.hdfs.TestRecoverStripedFile.testRecoverAnyBlocks1(TestRecoverStripedFile.java:168) > {code} > Reference : https://builds.apache.org/job/PreCommit-HDFS-Build/12758 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9188) Make block corruption related tests FsDataset-agnostic.
[ https://issues.apache.org/jira/browse/HDFS-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941955#comment-14941955 ] Hadoop QA commented on HDFS-9188: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 4s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 11 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:red}-1{color} | release audit | 0m 13s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 17s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 191m 35s | Tests failed in hadoop-hdfs. 
| | | | 215m 11s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles | | | hadoop.hdfs.TestFileCreation | | | hadoop.hdfs.web.TestWebHDFSOAuth2 | | | hadoop.hdfs.server.namenode.TestProcessCorruptBlocks | | | hadoop.hdfs.util.TestByteArrayManager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12764834/HDFS-9188.001.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 1037ee5 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12776/artifact/patchprocess/patchReleaseAuditProblems.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12776/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12776/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12776/console | This message was automatically generated. > Make block corruption related tests FsDataset-agnostic. > > > Key: HDFS-9188 > URL: https://issues.apache.org/jira/browse/HDFS-9188 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-9188.000.patch, HDFS-9188.001.patch > > > Currently, HDFS does block corruption tests by directly accessing the files > stored on the storage directories, which assumes {{FsDatasetImpl}} is the > dataset implementation. However, with works like OZone (HDFS-7240) and > HDFS-8679, there will be different FsDataset implementations. > So we need a general way to run whitebox tests like corrupting blocks and crc > files. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8766) Implement a libhdfs(3) compatible API
[ https://issues.apache.org/jira/browse/HDFS-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941988#comment-14941988 ] Haohui Mai commented on HDFS-8766: -- bq. I added a simple timeout to clear out the bad datanodes after a specified period of time (default to 2 minutes). I think that should be sufficient for the initial API The default timeout is 10 minutes and the timeout should be bounded to every single data node. This functionality requires a specific gmock test. I think it makes sense to separate the integration test into another jira. > Implement a libhdfs(3) compatible API > - > > Key: HDFS-8766 > URL: https://issues.apache.org/jira/browse/HDFS-8766 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-8766.HDFS-8707.000.patch, > HDFS-8766.HDFS-8707.001.patch, HDFS-8766.HDFS-8707.002.patch, > HDFS-8766.HDFS-8707.003.patch, HDFS-8766.HDFS-8707.004.patch > > > Add a synchronous API that is compatible with the hdfs.h header used in > libhdfs and libhdfs3. This will make it possible for projects using > libhdfs/libhdfs3 to relink against libhdfspp with minimal changes. > This also provides a pure C interface that can be linked against projects > that aren't built in C++11 mode for various reasons but use the same > compiler. It also allows many other programming languages to access > libhdfspp through builtin FFI interfaces. > The libhdfs API is very similar to the posix file API which makes it easier > for programs built using posix filesystem calls to be modified to access HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
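The exclusion scheme discussed above (a bad datanode is skipped for a bounded period, with the bound applying to each datanode independently) can be modeled with a small sketch. `BadNodeTracker` and its method names are hypothetical; the 10-minute value only echoes the default mentioned in the comment, not a committed libhdfspp API.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: exclude a "bad" datanode for a bounded period, tracked
// per node; once its own timeout elapses it becomes eligible again.
public class BadNodeTracker {
    private final long expiryMillis;
    private final Map<String, Long> badSince = new HashMap<>();

    BadNodeTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Record the moment a node was observed to be bad.
    void markBad(String datanodeId, long nowMillis) {
        badSince.put(datanodeId, nowMillis);
    }

    // A node is excluded only while its own timeout has not elapsed:
    // the bound applies to every single datanode independently.
    boolean isExcluded(String datanodeId, long nowMillis) {
        Long since = badSince.get(datanodeId);
        if (since == null) {
            return false;
        }
        if (nowMillis - since >= expiryMillis) {
            badSince.remove(datanodeId); // expired: eligible again
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BadNodeTracker t = new BadNodeTracker(10 * 60 * 1000L); // 10 minutes
        t.markBad("dn1", 0L);
        System.out.println(t.isExcluded("dn1", 5 * 60 * 1000L));  // prints true
        System.out.println(t.isExcluded("dn1", 11 * 60 * 1000L)); // prints false
    }
}
```

A real implementation would also need thread safety, since streamers and allocation paths touch the tracker concurrently.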
[jira] [Commented] (HDFS-9188) Make block corruption related tests FsDataset-agnostic.
[ https://issues.apache.org/jira/browse/HDFS-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942001#comment-14942001 ] Colin Patrick McCabe commented on HDFS-9188: Thanks, [~eddyxu]. {{ReplicaToCorrupt}}: it seems like this should be named something like {{MaterializedReplica}}. Its distinguishing factor is that it is the concrete representation of some replica in the {{FSDataset}}. {code}
/**
 * Corrupt the block file by deleting it.
 * @return true if the deletion is completed.
 */
boolean deleteData();
{code} This should be able to throw an IOE. Same with {{deleteMeta}}. > Make block corruption related tests FsDataset-agnostic. > > > Key: HDFS-9188 > URL: https://issues.apache.org/jira/browse/HDFS-9188 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS, test >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-9188.000.patch, HDFS-9188.001.patch > > > Currently, HDFS does block corruption tests by directly accessing the files > stored on the storage directories, which assumes {{FsDatasetImpl}} is the > dataset implementation. However, with works like OZone (HDFS-7240) and > HDFS-8679, there will be different FsDataset implementations. > So we need a general way to run whitebox tests like corrupting blocks and crc > files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
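The review asks for two things: the {{MaterializedReplica}} name and delete methods that throw {{IOException}} instead of returning a boolean. A minimal sketch under those suggestions follows; `FileMaterializedReplica` and its field names are hypothetical, not the committed Hadoop API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hedged sketch following the review comments above: a MaterializedReplica
// abstraction whose delete operations throw IOException, so the caller sees
// the failure cause rather than a bare 'false'.
public class MaterializedReplicaDemo {
    interface MaterializedReplica {
        /** Corrupt the replica by deleting its block file. */
        void deleteData() throws IOException;
        /** Corrupt the replica by deleting its meta (crc) file. */
        void deleteMeta() throws IOException;
    }

    // Hypothetical file-backed implementation, as FsDatasetImpl might
    // provide it; other FsDataset implementations would supply their own.
    static class FileMaterializedReplica implements MaterializedReplica {
        private final Path blockFile;
        private final Path metaFile;
        FileMaterializedReplica(Path blockFile, Path metaFile) {
            this.blockFile = blockFile;
            this.metaFile = metaFile;
        }
        @Override public void deleteData() throws IOException {
            // Files.delete throws (e.g. NoSuchFileException) on failure,
            // which propagates as the IOException the review asks for.
            Files.delete(blockFile);
        }
        @Override public void deleteMeta() throws IOException {
            Files.delete(metaFile);
        }
    }

    public static void main(String[] args) throws IOException {
        Path block = Files.createTempFile("blk_1001", ".data");
        Path meta = Files.createTempFile("blk_1001", ".meta");
        MaterializedReplica replica = new FileMaterializedReplica(block, meta);
        replica.deleteData();
        replica.deleteMeta();
        System.out.println(Files.exists(block) + " " + Files.exists(meta));
        // prints false false
    }
}
```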