[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885092#comment-13885092 ]

stack commented on HDFS-5776:
-

bq. what do you think ?

That looks good to me [~xieliang007]

bq. ...making the pool size readonly, i can reupload a new patch.

We can add back the flexibility in a later issue -- i.e. being able to adjust the pool size on the fly. I suggest posting a patch where the pool size is read from the configuration and is read-only post construction. It would address an above reviewer's concern and, I believe, all outstanding concerns. Base your revision on v10 if you don't mind.

> Support 'hedged' reads in DFSClient
> ---
>
> Key: HDFS-5776
> URL: https://issues.apache.org/jira/browse/HDFS-5776
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.0.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt
>
> This is a placeholder for the HDFS-related backport from https://issues.apache.org/jira/browse/HBASE-7509
> The quorum read ability should be especially helpful for optimizing read outliers. We can use "dfs.dfsclient.quorum.read.threshold.millis" & "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we can export the metric values of interest into the client system (e.g. HBase's regionserver metrics).
> The core logic is in the pread code path: we decide whether to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
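The dispatch described in the issue summary (the original fetchBlockByteRange versus the speculative variant, gated on the thread-pool config) can be sketched roughly as follows. This is an illustrative sketch only; the class, field, and method names here are stand-ins, not the actual DFSClient code from the patch.

```java
// Illustrative sketch of the hedged-read dispatch; real DFSClient wiring differs.
public class HedgedPreadSketch {
    // Stand-ins for dfs.dfsclient.quorum.read.threadpool.size and
    // dfs.dfsclient.quorum.read.threshold.millis. The threshold is how long
    // the first read may run before a hedged request would be launched
    // (not exercised in this sketch).
    private final int threadPoolSize;
    private final long thresholdMillis;

    HedgedPreadSketch(int threadPoolSize, long thresholdMillis) {
        this.threadPoolSize = threadPoolSize;
        this.thresholdMillis = thresholdMillis;
    }

    boolean isHedgedReadsEnabled() {
        // Hedging is on only when a positive pool size was configured.
        return threadPoolSize > 0;
    }

    String chooseFetchPath() {
        // Mirrors the description: take the hedged (speculative) path when
        // enabled, otherwise the original pread path.
        return isHedgedReadsEnabled()
            ? "fetchBlockByteRangeSpeculative"
            : "fetchBlockByteRange";
    }

    public static void main(String[] args) {
        System.out.println(new HedgedPreadSketch(4, 50).chooseFetchPath());
        System.out.println(new HedgedPreadSketch(0, 50).chooseFetchPath());
    }
}
```

Setting the pool size to zero disables hedging entirely, which matches the proposal above to gate the feature on a positive configured pool size.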
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885091#comment-13885091 ]

Liang Xie commented on HDFS-5776:
-

bq. I think a better way is to add this check in chooseDataNode: if chooseDataNode finds that this is for seeking the second DN (if ignored is not null), and it could not immediately/easily find a DN, the chooseDataNode should skip retrying and we may want to fall back to the normal read.

Yeah, that sounds reasonable; I will look into it later once I get a chance.

P.S. I am taking an 8+ day holiday (China Spring Festival) and probably cannot reply or post patches promptly, sorry. Happy holidays to everyone, and thanks for looking at this JIRA!

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885083#comment-13885083 ]

Liang Xie commented on HDFS-5776:
-

bq. we do not check threadpool in enableHedgedReads. This makes it possible that isHedgedReadsEnabled() returns true while hedged read is actually not enabled.

I can change it to something like the following if you guys want:
{code}
return allowHedgedReads && (HEDGED_READ_THREAD_POOL != null)
    && HEDGED_READ_THREAD_POOL.getMaximumPoolSize() > 0;
{code}
What do you think?

bq. DFSClient#setThreadsNumForHedgedReads allows users to keep changing the size of the thread pool.

We definitely need the ability to modify the pool size on the fly, especially for HBase ops.

bq. Read the thread pool size configuration only when initializing the thread pool, and the size should be >0 and cannot be changed

Here is the same disagreement: if you all still insist on making the pool size read-only, I can upload a new patch. From my previous operational experience, though, a read-only pool size is quite inconvenient for a system ops/admin.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885045#comment-13885045 ] Hadoop QA commented on HDFS-5845: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625772/hdfs-5845-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5975//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5975//console This message is automatically generated. > SecondaryNameNode dies when checkpointing with cache pools > -- > > Key: HDFS-5845 > URL: https://issues.apache.org/jira/browse/HDFS-5845 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Blocker > Labels: caching > Attachments: hdfs-5845-1.patch > > > The SecondaryNameNode clears and reloads its FSNamesystem when doing > checkpointing. However, FSNamesystem#clear does not clear CacheManager state > during this reload. 
This leads to an error like the following: > {noformat} > org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885036#comment-13885036 ]

Suresh Srinivas commented on HDFS-5709:
---

[~andrew.wang], that looks good. The likelihood of collision in the case of .snapshot.LV.UPGRADE_RENAMED is probably very low.

When the namenode fails to upgrade due to a reserved-name collision, it should print out the full list of reserved names in the file system, along with an error telling the user to run -upgrade with the -renameReserved flag. That way users know to pass all the reserved names and their corresponding preferred names, if they choose to use key/value pairs.

> Improve upgrade with existing files and directories named ".snapshot"
> ---
>
> Key: HDFS-5709
> URL: https://issues.apache.org/jira/browse/HDFS-5709
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Labels: snapshots, upgrade
> Attachments: hdfs-5709-1.patch, hdfs-5709-2.patch, hdfs-5709-3.patch, hdfs-5709-4.patch, hdfs-5709-5.patch
>
> Right now in trunk, upgrade fails messily if the old fsimage or edits refer to a directory named ".snapshot". We should at least print a better error message (which I believe was the original intention in HDFS-4666), and [~atm] proposed automatically renaming these files and directories.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885032#comment-13885032 ] Akira AJISAKA commented on HDFS-5844: - Thank you for committing, [~arpitagarwal]! > Fix broken link in WebHDFS.apt.vm > - > > Key: HDFS-5844 > URL: https://issues.apache.org/jira/browse/HDFS-5844 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5844.patch > > > There is one broken link in WebHDFS.apt.vm. > {code} > {{{RemoteException JSON Schema}}} > {code} > should be > {code} > {{RemoteException JSON Schema}} > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885027#comment-13885027 ] Hadoop QA commented on HDFS-5746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625767/HDFS-5746.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1546 javac compiler warnings (more than the trunk's current 1541 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated -14 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5972//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5972//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5972//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5972//console This message is automatically generated. 
> add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885013#comment-13885013 ]

Hudson commented on HDFS-5844:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5057/])
HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-5844:
-

Resolution: Fixed
Fix Version/s: 2.3.0
               3.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

+1 for the patch. Generated site and verified it fixes the link. I committed this to trunk, branch-2 and branch-2.3. Thanks for the contribution [~ajisakaa].

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885006#comment-13885006 ] Hadoop QA commented on HDFS-5153: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625769/HDFS-5153.05.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5971//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5971//console This message is automatically generated. 
> Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch, HDFS-5153.05.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
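The split proposed above (one block report message per storage directory rather than a single report for the whole DataNode) amounts to grouping the DataNode's blocks by storage before serializing. A minimal sketch under assumed, illustrative names (these are not the actual DataNode/protobuf APIs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: group blocks by the storage they live on, so each
// storage's block list can be sent as its own (smaller) report message.
public class PerStorageReports {
    // Input: blockId -> storageId. Output: storageId -> blocks, one entry
    // per storage, i.e. one report message per storage directory.
    static Map<String, List<Long>> splitByStorage(Map<Long, String> blockToStorage) {
        Map<String, List<Long>> reports = new TreeMap<>();
        for (Map.Entry<Long, String> e : blockToStorage.entrySet()) {
            reports.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                   .add(e.getKey());
        }
        return reports;
    }

    public static void main(String[] args) {
        Map<Long, String> blocks = new HashMap<>();
        blocks.put(101L, "DS-1");
        blocks.put(102L, "DS-2");
        blocks.put(103L, "DS-1");
        // Two storages -> two separate report messages.
        System.out.println(splitByStorage(blocks).size());
    }
}
```

Keeping each message bounded by a single storage's block count is what mitigates both the long NameNode lock hold and the 64MB protobuf message limit described above.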
[jira] [Commented] (HDFS-5709) Improve upgrade with existing files and directories named ".snapshot"
[ https://issues.apache.org/jira/browse/HDFS-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884987#comment-13884987 ]

Andrew Wang commented on HDFS-5709:
---

I had a quick call with Suresh to hash this out, and we arrived at the following which should be suitable for everyone:
* Rather than a configuration option which can stick around forever, an additional command line flag (e.g. "-upgrade -renameReserved") is better. This way we worry about it once, and there are no lingering effects.
* We default to renaming reserved paths to a convention like {{.snapshot.LV.UPGRADE_RENAMED}}, but also allow users to pass key/value pairs on the command line, e.g. "-upgrade -renameReserved .snapshot=.user-snapshot". In either case, we should do our best to detect collisions, but it's hard with the edit log.
* It'd be good to do this for "/.reserved" too, which will help demonstrate that this is a generic solution.

I think this is an accurate summary, so I'll start revving the patch as per above. Please comment if something is still off.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
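The key/value form discussed above ("-renameReserved .snapshot=.user-snapshot") could be parsed roughly as below. This is a hypothetical sketch of the flag's argument handling; the class and method names are illustrative and the actual patch may parse the option differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: parse "-renameReserved" key/value pairs of the form
// "oldName=newName[,oldName2=newName2,...]" into a rename map.
public class RenameReservedSketch {
    static Map<String, String> parseRenames(String arg) {
        Map<String, String> renames = new HashMap<>();
        for (String pair : arg.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2 || kv[0].isEmpty() || kv[1].isEmpty()) {
                throw new IllegalArgumentException("Bad rename pair: " + pair);
            }
            renames.put(kv[0], kv[1]);
        }
        return renames;
    }

    public static void main(String[] args) {
        Map<String, String> r = parseRenames(".snapshot=.user-snapshot");
        System.out.println(r.get(".snapshot"));
    }
}
```

A default mapping such as ".snapshot" -> ".snapshot.LV.UPGRADE_RENAMED" would apply when the user passes no pairs, per the convention agreed above.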
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884983#comment-13884983 ] Arpit Agarwal commented on HDFS-5318: - I am +1 on this approach. I think it's fine to document the requirement around reporting non-finalized replicas. Unless anyone else has objections I'll review the latest patch this week. > Support read-only and read-write paths to shared replicas > - > > Key: HDFS-5318 > URL: https://issues.apache.org/jira/browse/HDFS-5318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.3.0 >Reporter: Eric Sirianni > Attachments: HDFS-5318.patch, HDFS-5318a-branch-2.patch, > HDFS-5318b-branch-2.patch, HDFS-5318c-branch-2.patch, hdfs-5318.pdf > > > There are several use cases for using shared-storage for datanode block > storage in an HDFS environment (storing cold blocks on a NAS device, Amazon > S3, etc.). > With shared-storage, there is a distinction between: > # a distinct physical copy of a block > # an access-path to that block via a datanode. > A single 'replication count' metric cannot accurately capture both aspects. > However, for most of the current uses of 'replication count' in the Namenode, > the "number of physical copies" aspect seems to be the appropriate semantic. > I propose altering the replication counting algorithm in the Namenode to > accurately infer distinct physical copies in a shared storage environment. > With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor > additional semantics to the {{StorageID}} - namely that multiple datanodes > attaching to the same physical shared storage pool should report the same > {{StorageID}} for that pool. A minor modification would be required in the > DataNode to enable the generation of {{StorageID}} s to be pluggable behind > the {{FsDatasetSpi}} interface. 
> With those semantics in place, the number of physical copies of a block in a > shared storage environment can be calculated as the number of _distinct_ > {{StorageID}} s associated with that block. > Consider the following combinations for two {{(DataNode ID, Storage ID)}} > pairs {{(DN_A, S_A) (DN_B, S_B)}} for a given block B: > * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* > physical replicas (i.e. the traditional HDFS case with local disks) > ** → Block B has {{ReplicationCount == 2}} > * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* > physical replica (e.g. HDFS datanodes mounting the same NAS share) > ** → Block B has {{ReplicationCount == 1}} > For example, if block B has the following location tuples: > * {{DN_1, STORAGE_A}} > * {{DN_2, STORAGE_A}} > * {{DN_3, STORAGE_B}} > * {{DN_4, STORAGE_B}}, > the effect of this proposed change would be to calculate the replication > factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
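The counting rule proposed in the description (physical replica count = number of distinct StorageIDs among a block's (DataNode ID, Storage ID) pairs) can be sketched directly; the tuples below mirror the example in the description, and the class name is illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed rule: count distinct StorageIDs, not locations.
public class DistinctStorageCount {
    // Each location is a {datanodeId, storageId} pair.
    static int physicalReplicas(String[][] locations) {
        Set<String> storageIds = new HashSet<>();
        for (String[] loc : locations) {
            storageIds.add(loc[1]);  // only the StorageID matters
        }
        return storageIds.size();
    }

    public static void main(String[] args) {
        // The four location tuples from the example in the description.
        String[][] blockB = {
            {"DN_1", "STORAGE_A"}, {"DN_2", "STORAGE_A"},
            {"DN_3", "STORAGE_B"}, {"DN_4", "STORAGE_B"},
        };
        // Two distinct StorageIDs -> replication factor 2, not 4.
        System.out.println(physicalReplicas(blockB));
    }
}
```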
[jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently
[ https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884967#comment-13884967 ] Hadoop QA commented on HDFS-3828: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12543965/hdfs-3828-3.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5974//console This message is automatically generated. > Block Scanner rescans blocks too frequently > --- > > Key: HDFS-3828 > URL: https://issues.apache.org/jira/browse/HDFS-3828 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 0.23.0 >Reporter: Andy Isaacson >Assignee: Andy Isaacson > Fix For: 2.3.0 > > Attachments: hdfs-3828-1.txt, hdfs-3828-2.txt, hdfs-3828-3.txt, > hdfs3828.txt > > > {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from > {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}. But cleanUp > unconditionally roll()s the verificationLogs, so after two iterations we have > lost the first iteration of block verification times. 
> As a result a cluster with just one block repeatedly rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> To fix this, we need to avoid roll()ing the logs multiple times per period.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
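The fix described in the issue (avoid roll()ing the verification logs more than once per scan period) boils down to guarding the roll with the time of the last one. A minimal sketch with hypothetical names; the real BlockPoolSliceScanner change is more involved:

```java
// Sketch of the stated fix: roll the verification log at most once per scan
// period by remembering when it was last rolled, instead of rolling
// unconditionally from cleanUp() on every scan invocation.
public class RollGuard {
    private final long periodMillis;
    private long lastRollMillis;
    private boolean rolledOnce = false;
    int rollCount = 0;  // stand-in for verificationLog.roll() side effects

    RollGuard(long periodMillis) {
        this.periodMillis = periodMillis;
    }

    void maybeRoll(long nowMillis) {
        // Skip the roll if a full period has not elapsed since the last one,
        // so earlier verification times are not rotated away prematurely.
        if (!rolledOnce || nowMillis - lastRollMillis >= periodMillis) {
            rollCount++;
            lastRollMillis = nowMillis;
            rolledOnce = true;
        }
    }

    public static void main(String[] args) {
        RollGuard g = new RollGuard(600_000);  // e.g. a 10-minute period
        g.maybeRoll(0);        // first scan: rolls
        g.maybeRoll(10_000);   // 10s later: skipped, still within the period
        g.maybeRoll(600_000);  // full period elapsed: rolls again
        System.out.println(g.rollCount);
    }
}
```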
[jira] [Commented] (HDFS-3907) Allow multiple users for local block readers
[ https://issues.apache.org/jira/browse/HDFS-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884962#comment-13884962 ] Hadoop QA commented on HDFS-3907: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12544410/hdfs-3907.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5973//console This message is automatically generated. > Allow multiple users for local block readers > > > Key: HDFS-3907 > URL: https://issues.apache.org/jira/browse/HDFS-3907 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 1.0.0, 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Fix For: 2.3.0 > > Attachments: hdfs-3907.txt > > > The {{dfs.block.local-path-access.user}} config added in HDFS-2246 only > supports a single user, however as long as blocks are group readable by more > than one user the feature could be used by multiple users, to support this we > just need to allow both to be configured. In practice this allows us to also > support HBase where the client (RS) runs as the hbase system user and the DN > runs as hdfs system user. I think this should work secure as well since we're > not using impersonation in the HBase case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-5845:
--

Labels: caching (was: )
Status: Patch Available (was: Open)

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-5845:
--

Attachment: hdfs-5845-1.patch

Patch attached. This was pretty simple, but requires taking the FSN writelock on the 2NN since we have a bunch of write lock asserts in CacheManager. I think this is okay since we already do this in the SbNN, but someone should weigh in if this isn't okay. {{diff -w}} helps with reviewing the test change, since I needed to indent a test by one.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Support read-only and read-write paths to shared replicas
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884935#comment-13884935 ]

Hadoop QA commented on HDFS-5318:
-

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12624805/HDFS-5318c-branch-2.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5970//console

This message is automatically generated.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5845: -- Priority: Blocker (was: Major) > SecondaryNameNode dies when checkpointing with cache pools > -- > > Key: HDFS-5845 > URL: https://issues.apache.org/jira/browse/HDFS-5845 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Blocker > > The SecondaryNameNode clears and reloads its FSNamesystem when doing > checkpointing. However, FSNamesystem#clear does not clear CacheManager state > during this reload. This leads to an error like the following: > {noformat} > org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
[ https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884927#comment-13884927 ] Suresh Srinivas commented on HDFS-5845: --- [~andrew.wang], I am marking this as blocker for 2.3.0. > SecondaryNameNode dies when checkpointing with cache pools > -- > > Key: HDFS-5845 > URL: https://issues.apache.org/jira/browse/HDFS-5845 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.3.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > > The SecondaryNameNode clears and reloads its FSNamesystem when doing > checkpointing. However, FSNamesystem#clear does not clear CacheManager state > during this reload. This leads to an error like the following: > {noformat} > org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5153: Attachment: HDFS-5153.05.patch Rebase patch. > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch, HDFS-5153.05.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
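The split described in this issue can be sketched as follows. All names here are hypothetical stand-ins, not the actual DataNode/NameNode RPC types: the idea is simply one report message per storage directory instead of one report covering every volume.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of per-storage block reports: each storage directory's
// blocks go into their own message, so each RPC stays well under the protobuf
// size limit and the NameNode lock is held for a shorter time per message.
public class PerStorageBlockReports {

  // blocksByStorage: storageId -> IDs of blocks stored on that volume.
  static List<List<Long>> splitReports(Map<String, List<Long>> blocksByStorage) {
    List<List<Long>> reports = new ArrayList<>();
    for (Map.Entry<String, List<Long>> e : blocksByStorage.entrySet()) {
      reports.add(new ArrayList<>(e.getValue())); // one message per storage
    }
    return reports;
  }

  public static void main(String[] args) {
    Map<String, List<Long>> byStorage = new HashMap<>();
    byStorage.put("DS-1", List.of(1L, 2L, 3L));
    byStorage.put("DS-2", List.of(4L, 5L));
    // Two storages -> two separate report messages instead of one of size 5.
    System.out.println(splitReports(byStorage).size()); // 2
  }
}
```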
[jira] [Created] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools
Andrew Wang created HDFS-5845: - Summary: SecondaryNameNode dies when checkpointing with cache pools Key: HDFS-5845 URL: https://issues.apache.org/jira/browse/HDFS-5845 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Andrew Wang The SecondaryNameNode clears and reloads its FSNamesystem when doing checkpointing. However, FSNamesystem#clear does not clear CacheManager state during this reload. This leads to an error like the following: {noformat} org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists. {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
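A minimal model of the failure mode, using stand-in classes rather than the real {{FSNamesystem}}/{{CacheManager}}: if {{clear}} skips the cache-pool map, the next checkpoint reload trips the "already exists" check.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the reported bug (not the actual Hadoop classes):
// clearing the namesystem must also reset the cache manager's pool map,
// otherwise reloading the image re-adds pools that are still present.
public class CheckpointReloadSketch {

  static final class CacheManager {
    private final Map<String, String> cachePools = new HashMap<>();

    void addCachePool(String name) {
      if (cachePools.containsKey(name)) {
        throw new IllegalStateException("Cache pool " + name + " already exists.");
      }
      cachePools.put(name, "info");
    }

    void clear() {
      cachePools.clear(); // the state the reported FSNamesystem#clear omitted
    }
  }

  static final class Namesystem {
    final CacheManager cacheManager = new CacheManager();

    void clear() {
      cacheManager.clear(); // without this call, the second loadImage() throws
    }

    void loadImage() {
      cacheManager.addCachePool("pool1"); // pool definitions come from the image
    }
  }

  public static void main(String[] args) {
    Namesystem fsn = new Namesystem();
    fsn.loadImage(); // first checkpoint load
    fsn.clear();     // SecondaryNameNode clears before re-loading
    fsn.loadImage(); // second checkpoint load now succeeds
    System.out.println("reload ok");
  }
}
```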
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884919#comment-13884919 ] Andrew Wang commented on HDFS-5746: --- bq. It doesn't infinitely loop, because sendCallback always removes the fd from toRemove. I missed this, good point. The verification I wanted was a second look at the code, no need for a test. bq. I like the current terminology. "lockable" just sounds vague. Okay, I'm alright with "anchorable" for the flag. Can we change the name of the refcount field and methods though? "anchor" and "unanchor" do not sound like incremental operations to me, and the field being named "anchor" does not evoke a count. bq. Yeah, I was wondering since I didn't see a field and accessor for the slot index. I assume that'll be added at some point though. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5153) Datanode should send block reports for each storage in a separate message
[ https://issues.apache.org/jira/browse/HDFS-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-5153: --- Assignee: Arpit Agarwal > Datanode should send block reports for each storage in a separate message > - > > Key: HDFS-5153 > URL: https://issues.apache.org/jira/browse/HDFS-5153 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.0.0 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-5153.01.patch, HDFS-5153.03.patch, > HDFS-5153.03b.patch, HDFS-5153.04.patch > > > When the number of blocks on the DataNode grows large we start running into a > few issues: > # Block reports take a long time to process on the NameNode. In testing we > have seen that a block report with 6 Million blocks takes close to one second > to process on the NameNode. The NameSystem write lock is held during this > time. > # We start hitting the default protobuf message limit of 64MB somewhere > around 10 Million blocks. While we can increase the message size limit it > already takes over 7 seconds to serialize/unserialize a block report of this > size. > HDFS-2832 has introduced the concept of a DataNode as a collection of > storages i.e. the NameNode is aware of all the volumes (storage directories) > attached to a given DataNode. This makes it easy to split block reports from > the DN by sending one report per storage directory to mitigate the above > problems. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884914#comment-13884914 ] Andrew Wang commented on HDFS-5746: --- Few more test-related comments: * Tests for {{DSW#remove}} would be good, even when the race is fixed properly. If I'm right about the inf loop, a test would have caught it. * TestSCSMS, testStartupShutdown seems like a strict subset of testAllocateSlots functionality. * How about some prodding of a closed SCSMS too? Would also be good to test a couple of the other {{free()}} paths of SCSMS, since it can happen at close of the last slot, the SCSMS, and in allocateNextSlot too. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.004.patch > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch, HDFS-5746.004.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884911#comment-13884911 ] Colin Patrick McCabe commented on HDFS-5746: bq. Can we verify the fix for racing sendCallback and toRemove? I think we need to check that the fd being removed is in entries before doing sendCallback. firstEntry also doesn't remove the entry from toRemove, so it looks like this inf loops. pollFirstEntry instead? It doesn't infinitely loop, because sendCallback always removes the fd from toRemove. I can't think of any practical way to test the scenario you outlined, with an event happening on {{sendCallback}} racing with the same fd added to {{toRemove}} . Maybe a stress test would hit it. bq. Maybe remove() should also return a boolean "success" value too, rather than just swallowing an unknown socket. It's not needed because if we try to remove something that doesn't exist, we hit a precondition check. bq. Should doc that we only support one Handler per fd, it overwrites on add. added this comment bq. Can add a Precondition check to make sure the lock is held in checkNotClosed added bq. Flag constants would be more readable as "1<<63" and "1<<62" rather than 15 zeroes (I did verify though ) ok bq. Comment in Slot constructor talks about incrementing a refcount, but that's no longer happening there. No need to throw IOException in Slot constructor fixed bq. Terminology: it seems like the "anchorable" flag means "is mlocked by DN and can increment the refcount" and "anchor" is a refcount for "using mlocked data" I like the current terminology. "lockable" just sounds vague-- especially because we already have an operation which is (m)locking the block on the datanode, so it gets confusing to use the same term for what the client is doing. bq. How do we communicate the slot index between the DN and client? I see we keep the slot address, but what we need to pass to the client is an index. 
Maybe this is coming. The DN will have to pass the slot index as part of the response to the REQUEST_SHORT_CIRCUIT_FDS operation. It will also pass the shared memory segment itself as part of that operation :) Actually, it's a bit more complex than that... if there is an outstanding shm segment, the DN will try to reuse it-- otherwise it will create a new one. But since all the slots are the same size and interchangeable, the allocation is not that complex. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-4284) BlockReaderLocal not notified of failed disks
[ https://issues.apache.org/jira/browse/HDFS-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HDFS-4284: - Assignee: Jimmy Xiang > BlockReaderLocal not notified of failed disks > - > > Key: HDFS-4284 > URL: https://issues.apache.org/jira/browse/HDFS-4284 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Jimmy Xiang > > When a DN marks a disk as bad, it stops using replicas on that disk. > However a long-running {{BlockReaderLocal}} instance will continue to access > replicas on the failing disk. > Somehow we should let the in-client BlockReaderLocal know that a disk has > been marked as bad so that it can stop reading from the bad disk. > From HDFS-4239: > bq. To rephrase that, a long running BlockReaderLocal will ride over local DN > restarts and disk "ejections". We had to drain the RS of all its regions in > order to stop it from using the bad disk. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884897#comment-13884897 ] Jimmy Xiang commented on HDFS-4239: --- Cool. I agree. Attached v2 that released all references to the volume marked down. In my test, I don't see any open file descriptor pointing to the volume marked down. > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Attachment: hdfs-4239_v2.patch > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk
[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-4239: -- Status: Patch Available (was: Open) > Means of telling the datanode to stop using a sick disk > --- > > Key: HDFS-4239 > URL: https://issues.apache.org/jira/browse/HDFS-4239 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: stack >Assignee: Jimmy Xiang > Attachments: hdfs-4239.patch, hdfs-4239_v2.patch > > > If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing > occasionally, or just exhibiting high latency -- your choices are: > 1. Decommission the total datanode. If the datanode is carrying 6 or 12 > disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- > the rereplication of the downed datanode's data can be pretty disruptive, > especially if the cluster is doing low latency serving: e.g. hosting an hbase > cluster. > 2. Stop the datanode, unmount the bad disk, and restart the datanode (You > can't unmount the disk while it is in use). This latter is better in that > only the bad disk's data is rereplicated, not all datanode data. > Is it possible to do better, say, send the datanode a signal to tell it stop > using a disk an operator has designated 'bad'. This would be like option #2 > above minus the need to stop and restart the datanode. Ideally the disk > would become unmountable after a while. > Nice to have would be being able to tell the datanode to restart using a disk > after its been replaced. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884892#comment-13884892 ] Hadoop QA commented on HDFS-5804: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625685/HDFS-5804.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5966//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5966//console This message is automatically generated. 
> HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > exception-as-root.log, javadoc-after-patch.log, javadoc-before-patch.log > > > When using HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support proxy user(nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) start a hadoop cluster with kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > a an account in kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.ne
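For reference, Hadoop's standard proxy-user settings are the usual mechanism for letting a gateway principal impersonate the end users invoking commands on the mount. The snippet below is a sketch only: "nfsserver" matches the account in the reproduction steps above, and the wildcard values are for illustration and should be restricted in real deployments.

```xml
<!-- core-site.xml: allow the nfsserver principal to proxy other users. -->
<property>
  <name>hadoop.proxyuser.nfsserver.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.nfsserver.hosts</name>
  <value>*</value>
</property>
```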
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884886#comment-13884886 ] Andrew Wang commented on HDFS-5746: --- Thanks Colin, more comments: * For the {{notificationSockets}} javadoc, I basically just wanted the explanation you gave: it's a socketpair, where the loop listens on 1, clients kick the loop by writing on 0. * Can we verify the fix for racing {{sendCallback}} and {{toRemove}}? I think we need to check that the fd being removed is in {{entries}} before doing {{sendCallback}}. {{firstEntry}} also doesn't remove the entry from {{toRemove}}, so it looks like this inf loops. {{pollFirstEntry}} instead? * Maybe {{remove()}} should also return a boolean "success" value too, rather than just swallowing an unknown socket. Were these comments addressed? {quote} * Should doc that we only support one Handler per fd, it overwrites on add. * Can add a Precondition check to make sure the lock is held in checkNotClosed {quote} ShortCircuitSharedMemorySegment: * Flag constants would be more readable as "1<<63" and "1<<62" rather than 15 zeroes (I did verify though :)) * Comment in Slot constructor talks about incrementing a refcount, but that's no longer happening there. * No need to throw IOException in Slot constructor. * Terminology: it seems like the "anchorable" flag means "is mlocked by DN and can increment the refcount" and "anchor" is a refcount for "using mlocked data". Renaming things would make this clearer, e.g. "lockable" for the flag, and then "lockcount" for the count. IMO, incrementing an anchor is not a great physical analogy :) * How do we communicate the slot index between the DN and client? I see we keep the slot address, but what we need to pass to the client is an index. Maybe this is coming. 
> add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
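The readability point about the flag constants can be illustrated with a small sketch. The names and bit layout below are hypothetical, not the committed patch; note that in Java the literal needs an {{L}} suffix so the shift happens at 64 bits (a plain int {{1 << 63}} wraps before widening).

```java
// Sketch of a slot's 64-bit header word: two flag bits at the top, with the
// low bits carrying the anchor (ref)count. Layout is illustrative only.
public class SlotFlagsSketch {
  // Shifted forms are easier to audit than 16-hex-digit literals.
  static final long VALID_FLAG = 1L << 63;
  static final long ANCHORABLE_FLAG = 1L << 62;
  static final long ANCHOR_COUNT_MASK = ANCHORABLE_FLAG - 1; // low 62 bits

  static long anchorCount(long headerWord) {
    return headerWord & ANCHOR_COUNT_MASK;
  }

  static boolean isAnchorable(long headerWord) {
    return (headerWord & ANCHORABLE_FLAG) != 0;
  }

  public static void main(String[] args) {
    long word = VALID_FLAG | ANCHORABLE_FLAG | 5; // valid, anchorable, 5 anchors
    System.out.println(isAnchorable(word) + " " + anchorCount(word)); // true 5
  }
}
```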
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.003.patch Fix javadoc warnings. javac warnings are about the use of {{sun.misc.Unsafe}}, and are unavoidable. Findbugs warning should be fixed (hopefully) by making {{DomainSocketWatcher}} a final class. > add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, > HDFS-5746.003.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884827#comment-13884827 ] Daryn Sharp commented on HDFS-5804: --- Looks good! Just fix the javadoc and audit warnings. > HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > exception-as-root.log, javadoc-after-patch.log, javadoc-before-patch.log > > > When using HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support proxy user(nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) start a hadoop cluster with kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > a an account in kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) > at > org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(
[jira] [Commented] (HDFS-5771) Track progress when loading fsimage
[ https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884810#comment-13884810 ] Chris Nauroth commented on HDFS-5771: - Hi, Haohui. A couple of notes: # I see there are multiple sections that do {{beginStep}}/{{endStep}} for {{StepType#INODES}}. Considering the way the {{StartupProgress}} class works, the effect of this will be that progress jumps to 100% complete the first time {{endStep}} gets called. After that, the subsequent calls to {{beginStep}}/{{endStep}} are no-ops. Are all of the various inode sections serialized sequentially in the new format? If so, then would it be possible to do the {{beginStep}} call for {{StepType#INODES}} before the first inode section, and then do the {{endStep}} after the last inode section? # There is a similar situation with {{saveInodes}} and {{saveSnapshots}} trying to begin/end the same step. > Track progress when loading fsimage > --- > > Key: HDFS-5771 > URL: https://issues.apache.org/jira/browse/HDFS-5771 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-5698 (FSImage in protobuf) >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch > > > The old code that loads the fsimage tracks the progress during loading. This > jira proposes to implement the same functionality in the new code which > serializes the fsimage using protobuf. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
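The begin-once/end-once ordering Chris suggests can be sketched with a toy model (hypothetical code that only loosely mirrors {{StartupProgress}} semantics, not the real API): the first {{endStep}} marks the step complete, so later begin/end pairs for the same step are no-ops, and begin should therefore run once before the first inode section and end once after the last.

```java
// Toy model of the no-op behavior described above. ToyStartupProgress is
// illustrative only and is not the real org.apache.hadoop StartupProgress.
import java.util.HashSet;
import java.util.Set;

class ToyStartupProgress {
  private final Set<String> active = new HashSet<>();
  private final Set<String> ended = new HashSet<>();

  /** Returns false (no-op) if the step has already ended. */
  boolean beginStep(String step) {
    if (ended.contains(step)) {
      return false;
    }
    return active.add(step);
  }

  /** Returns false (no-op) unless the step is currently active. */
  boolean endStep(String step) {
    if (!active.remove(step)) {
      return false;
    }
    return ended.add(step);
  }
}
```

With per-section begin/end pairs, only the first pair has any effect, which is exactly why the suggested fix hoists the pair around all inode sections.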
[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884795#comment-13884795 ] Hadoop QA commented on HDFS-5746: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625650/HDFS-5746.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1550 javac compiler warnings (more than the trunk's current 1545 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5962//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5962//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5962//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5962//console This message is automatically generated. 
> add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlocked. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5804: -- Attachment: HDFS-5804.patch > HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > exception-as-root.log, javadoc-after-patch.log, javadoc-before-patch.log > > > When using HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support proxy user (nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) start a hadoop cluster with kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > an account in Kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) > at > org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultC
[jira] [Assigned] (HDFS-5780) TestRBWBlockInvalidation times out intermittently on branch-2
[ https://issues.apache.org/jira/browse/HDFS-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai reassigned HDFS-5780: --- Assignee: Mit Desai > TestRBWBlockInvalidation times out intermittently on branch-2 > > > Key: HDFS-5780 > URL: https://issues.apache.org/jira/browse/HDFS-5780 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Mit Desai >Assignee: Mit Desai > > I recently found out that the test > TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN times > out intermittently. > I am using Fedora, JDK7 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4564) Webhdfs returns incorrect http response codes for denied operations
[ https://issues.apache.org/jira/browse/HDFS-4564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-4564: -- Attachment: HDFS-4564.patch HDFS-4564.branch-23.patch > Webhdfs returns incorrect http response codes for denied operations > --- > > Key: HDFS-4564 > URL: https://issues.apache.org/jira/browse/HDFS-4564 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: webhdfs >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Attachments: HDFS-4564.branch-23.patch, HDFS-4564.branch-23.patch, > HDFS-4564.patch > > > Webhdfs is returning 401 (Unauthorized) instead of 403 (Forbidden) when it's > denying operations. Examples including rejecting invalid proxy user attempts > and renew/cancel with an invalid user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5841: -- Attachment: hdfs-5841-3.patch Rebase, surprised this has gone stale already. > Update HDFS caching documentation with new changes > -- > > Key: HDFS-5841 > URL: https://issues.apache.org/jira/browse/HDFS-5841 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Andrew Wang >Assignee: Andrew Wang > Labels: caching > Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch > > > The caching documentation is a little out of date, since it's missing > description of features like TTL and expiration. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884715#comment-13884715 ] Hadoop QA commented on HDFS-5804: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625663/HDFS-5804.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated -14 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5963//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/5963//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5963//console This message is automatically generated. 
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884710#comment-13884710 ] Arpit Agarwal commented on HDFS-5776: - I've stated my concerns, but if there is broad consensus that we don't need caps, I won't hold up the check-in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884705#comment-13884705 ] Hadoop QA commented on HDFS-5841: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625660/hdfs-5841-2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5964//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884694#comment-13884694 ] stack commented on HDFS-5776: - Thanks lads. We are almost there. [~xieliang007] It is better if we work through the issues here before the patch goes in, especially while you have the attention of quality reviewers. From your POV, I'm sure it is a little frustrating trying to drive the patch home between differing opinions (the time difference doesn't help either -- smile). Try to salve any annoyance with the thought that, though it may appear otherwise, folks here are trying to work together to help get the best patch in. Good on you Liang. [~xieliang007] I'd agree with the last few [~jingzhao] review comments. What do you think? [~arpitagarwal] Do you buy [~cmccabe]'s argument? It is good by me. If you agree, let's shift the focus to v10 and leave the v9 style behind. Good stuff -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-5804: -- Attachment: HDFS-5804.patch Removed all the security checks.
[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5841: -- Attachment: hdfs-5841-2.patch Thanks for the review Colin, patch attached. I also updated the help text in CacheAdmin to match your recommendation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884573#comment-13884573 ] Suresh Srinivas commented on HDFS-5776: --- bq. We do not check other configuration settings to see if they are "reasonable." [~cmccabe], I agree with the points you have made. Checking for a reasonable value for the new config does not seem necessary. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes
[ https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884561#comment-13884561 ] Colin Patrick McCabe commented on HDFS-5841: {code} This can also be manually specified by "never". {code} This seems awkward. How about "'never' specifies that there is no limit." +1 once that's addressed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884554#comment-13884554 ] Colin Patrick McCabe commented on HDFS-5776: [~arpitagarwal] : if I understand your comments correctly, you are concerned that hedged reads may spawn too many threads. But that's why {{dfs.client.hedged.read.threadpool.size}} exists. The {{DFSClient}} will not create more threads than this. We do not check other configuration settings to see if they are "reasonable." For example, if someone wants to set {{dfs.balancer.dispatcherThreads}}, {{dfs.balancer.moverThreads}}, or {{dfs.datanode.max.transfer.threads}} to a zillion, we don't complain. If we tried to set hard limits everywhere, people with different needs would have to recompile hadoop to meet those needs. Please remember that, if the client wants to, he/she can sit in a loop and call {{new Thread(...)}}. It's not like by giving users the ability to control the number of threads they use, we are opening up some new world of security vulnerabilities. The ability for the client to create any number of threads already exists. And it only inconveniences one person: the client themselves. [~sureshms]: I agree that we should figure out the configuration issues here rather than changing the configuration in an incompatible way later. Jing suggested adding "an Allow-Hedged-Reads configuration" boolean. That certainly seems to solve the problem of having different threads use different settings. Is there any objection, besides the inelegance of having two configs rather than one? 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
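For readers following the thread, the hedged-read idea under discussion can be sketched outside of DFSClient with a plain bounded thread pool (the {{HedgedRead}} class, method names, and timings below are illustrative, not taken from the patch): the primary read gets a head start, and only if it misses the threshold does a second read race it, with the pool size capping the extra threads exactly as Colin describes.

```java
// Minimal hedged-read sketch: first result wins; the fixed-size pool plays
// the role of dfs.client.hedged.read.threadpool.size.
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class HedgedRead {
  private final ExecutorService pool;     // bounded, like the hedged-read pool
  private final long thresholdMillis;     // like the hedged-read threshold

  HedgedRead(int poolSize, long thresholdMillis) {
    this.pool = Executors.newFixedThreadPool(poolSize);
    this.thresholdMillis = thresholdMillis;
  }

  /** Runs primary; if it misses the threshold, races a backup read. */
  <T> T read(Supplier<T> primary, Supplier<T> backup) throws Exception {
    CompletionService<T> cs = new ExecutorCompletionService<>(pool);
    cs.submit(primary::get);
    Future<T> done = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
    if (done != null) {
      return done.get();                  // primary finished in time
    }
    cs.submit(backup::get);               // hedge: race another replica
    return cs.take().get();               // whichever read finishes first wins
  }

  void shutdown() {
    pool.shutdownNow();
  }
}
```

In the common case only one read runs, so the extra cost appears only for outlier reads that exceed the threshold.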
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884536#comment-13884536 ] Suresh Srinivas commented on HDFS-5698: --- bq. It'll be important to know if NN's with huge images will be unable to load their images w/o more heap allocation. All the objects created are short lived. Hence this should not affect NN heap allocation. However, it would be interesting to see the time spent in GC and the rate of garbage creation. > Use protobuf to serialize / deserialize FSImage > --- > > Key: HDFS-5698 > URL: https://issues.apache.org/jira/browse/HDFS-5698 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5698.000.patch, HDFS-5698.001.patch > > > Currently, the code serializes the FSImage using in-house serialization > mechanisms. There are a couple of disadvantages to the current approach: > # Mixing the responsibility of reconstruction and serialization / > deserialization. The current code paths of serialization / deserialization > have spent a lot of effort on maintaining compatibility. What is worse is > that they are mixed with the complex logic of reconstructing the namespace, > making the code difficult to follow. > # Poor documentation of the current FSImage format. The format of the FSImage > is practically defined by the implementation. A bug in the implementation > means a bug in the specification. Furthermore, it also makes writing > third-party tools quite difficult. > # Changing schemas is non-trivial. Adding a field in the FSImage requires > bumping the layout version every time. Bumping the layout version requires (1) > the users to explicitly upgrade the clusters, and (2) putting in new code to > maintain backward compatibility. > This jira proposes to use protobuf to serialize the FSImage. Protobuf has > been used to serialize / deserialize RPC messages in Hadoop. > Protobuf addresses all the above problems. It clearly separates the > responsibility of serialization and reconstructing the namespace. The > protobuf files document the current format of the FSImage. Developers can > now add optional fields with ease, since the old code can always read the new > FSImage. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment
[ https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5746: --- Attachment: HDFS-5746.002.patch

bq. I didn't see anything named ShortCircuitSharedMemorySegment in the patch, should it be included?

It should be there...

bq. Javadoc for SharedFileDescriptorFactory constructor

added

bq. rand() isn't reentrant, potentially making it unsafe for createDescriptor0. Should we use rand_r instead, or slap a synchronized on it?

Apparently, on Linux rand is re-entrant because glibc puts a mutex around it. But you're right, we should be POSIX-compliant here. I added a mutex around rand. Using the reentrant versions would be awkward because of the need to pass around state somehow (probably a Java array).

bq. Also not sure why we concat two rand(). Seems like one should be enough with the collision detection code.

Fair enough.

bq. The open is done with mode 0777, wouldn't 0700 be safer? I thought we were passing these over a domain socket, so we can keep the permissions locked up.

Good point. We don't want random users to be able to open this file during the brief period it exists in the namespace.

bq. Paranoia, should we do a check in CloseableReferenceCount#reference for overflow to the closed bit? I know we have 30 bits, but who knows.

Well, this code was just moved from DomainSocket.java, not changed. The issue is that we want to use atomic addition, not compare-and-exchange, for speed. Given that, all we know is the state after the addition, not before. This is fairly performance-critical for UNIX domain sockets (it has to do this before every socket operation) so it has to be fast. The failure mode also seems fairly benign: the refcount overflows into the closed bit and causes the socket to appear closed. At some point we should evaluate a 64-bit counter. It might be just as fast on 64-bit machines.

bq. Unrelated nit: DomainSocket#write(byte[], int, int) boolean exec is indented wrong, mind fixing it?

ok

bq. \[DomainSocketWatcher\] javadoc is c+p from DomainSocket, I think it should be updated for DSW. Some high-level description of how the nested classes fit together would be nice.

added

bq. Some Java-isms. Runnable is preferred over Thread. It's also weird that DSW is a Thread subclass and it calls start on itself. An inner class implementing Runnable would be more idiomatic.

It's kind of annoying that using an inner Runnable class would increase the indentation of run(). Still, I suppose it does provide better isolation, making it impossible to invoke random Thread methods on the DomainSocketWatcher. So I will implement that.

bq. Explain use of loopSocks 0 versus loopSocks 1?

This is a crucial part of this class: we need to use a socketpair rather than a plain condition variable because of blocking on poll. It's arbitrary: both sockets are connected to one another and exactly alike. I chose to listen on 1 and write on 0, but I could easily have made the opposite choice.

bq. "loopSocks" is also not a very descriptive name, maybe "wakeupPair" or "eventPair" instead?

I changed it to {{notificationSockets}}. Can add a Precondition check to make sure the lock is held in checkNotClosed. If we fail to kick, add and remove could block until the poll timeout. Should doc that we only support one Handler per fd, it overwrites on add. Maybe Precondition this instead if we don't want to overwrite, I can't tell from context here.

bq. The repeated calls to sendCallback are worrisome. For instance, a sock could be EOF and closed, be removed by the first sendCallback, and then if there's a pending toRemove for the sock, the second sendCallback aborts on the Precondition check.

Good catch. Fixed.

bq. closeAll parameter in sendCallback is unused

removed

bq. This comment probably means to refer to loopSocks: // Close shutdownSocketPair\[0\], so that shutdownSocketPair\[1\] gets an EOF

ok

bq. This comment probably meant poll, not select: // were waiting in select().

ok

bq. Why are two of the @Test in TestDomainSocketWatcher commented out?

fixed

bq. Timeouts seem kind of long, these should be super fast tests right?

reduced. I didn't want to reduce too much to avoid flakiness.

> add ShortCircuitSharedMemorySegment > --- > > Key: HDFS-5746 > URL: https://issues.apache.org/jira/browse/HDFS-5746 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, hdfs-client >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Fix For: 3.0.0 > > Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch > > > Add ShortCircuitSharedMemorySegment, which will be used to communicate > information between the datanode and the client about whether a replica is > mlock
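The socketpair-as-wakeup idea discussed above — a thread blocked in a poll loop is woken by writing a byte to the paired descriptor — has a direct analogue in Java NIO. A minimal sketch of the pattern using a Pipe (illustrative only, not the DomainSocketWatcher code itself):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// A thread blocked waiting for I/O readiness is woken by writing one byte to
// the other end of an in-process pipe -- the same role the socketpair plays
// for the native poll() loop, where a plain condition variable cannot help.
public class WakeupSketch {
    public static int demo() throws IOException {
        Selector selector = Selector.open();
        Pipe notificationPipe = Pipe.open();          // stands in for the socketpair
        notificationPipe.source().configureBlocking(false);
        notificationPipe.source().register(selector, SelectionKey.OP_READ);

        // Another thread would normally do this write to kick the event loop:
        notificationPipe.sink().write(ByteBuffer.wrap(new byte[]{0}));

        // select() returns immediately because the notification byte is pending.
        return selector.select();
    }
}
```

As in the watcher, which end is read and which is written is arbitrary; what matters is that a write on one end makes the other end readable, unblocking the waiting thread.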
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884531#comment-13884531 ] Daryn Sharp commented on HDFS-5698: --- You may want to investigate if the inodemap will perform better with a {{ConcurrentHashMap}} than a {{LightWeightGSet}}. That will increase the parallelism of the map insertion. I think the gset was chosen for memory concerns. Assuming you plan to parallelize the parent/child linkages, I think the {{addChild}} may need to be in a synchronized block unless the inodeMap is made concurrent. I'm not a snapshot expert, but I wonder how thread-safe the snapshot manager is. Are the directory diffs constructed "on the fly" during addition of the children, or are they stored separately in the fsimage? We just need to be certain it's actually feasible to offset a ~2X increase in load time. Also, did you happen to gather heap usage statistics? Is part of the load increase maybe due to increased GC? It'll be important to know if NN's with huge images will be unable to load their images w/o more heap allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884471#comment-13884471 ] Haohui Mai commented on HDFS-5698: -- Our profiling results show that parsing the bytes and constructing the protobuf objects take a significant amount of time. The work is parallelized as follows:
{code}
while (has data) {
  byte[] data = read();
  thread_pool.submit(parse_data(data));
}

parse_data_for_inode(data) {
  INode inode = construct(data);
  synchronized (inodemap) {
    inodemap.add(inode);
  }
  block_map_thread_pool.submit(update_block_map(data));
}

parse_data_for_inode_dir(data) {
  foreach (child : data.getChildren())
    inodemap.get(data.getParent()).addChild(inodemap.get(child));
}
{code}
Two things are worth noting. (1) The contention only happens when adding the inode into the inodemap. (2) Updating the block maps happens in parallel. Our profiling results show that updating the block maps can take up to 20% of the execution time. The latency can be hidden in the above implementation. I've only tested an early prototype on my laptop. With 4 threads it brings the load latency down to a level comparable to the old format. To report comparable numbers, however, I'll need to update the code and rerun the test on the machine that I ran my previous tests on. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
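The pseudocode above can be fleshed out with a standard thread pool and a lock-guarded map. A toy sketch under assumed names (not the actual FSImage loader — worker threads do the CPU-bound construction in parallel and contend only on the brief synchronized insertion):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of the parallel load: each task plays the role of
// parse_data_for_inode -- construct the inode, then take the lock
// only for the map insertion, so contention stays small.
public class ParallelLoadSketch {
    private final Map<Long, String> inodeMap = new HashMap<>();

    public int load(List<long[]> records) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (long[] rec : records) {
            pool.submit(() -> {
                String inode = "inode-" + rec[0];  // "construct(data)": CPU-bound work
                synchronized (inodeMap) {          // contention limited to the insert
                    inodeMap.put(rec[0], inode);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return inodeMap.size();
    }
}
```

Replacing the synchronized HashMap with a ConcurrentHashMap, as suggested in a later comment, would trade the lock for finer-grained internal striping.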
[jira] [Updated] (HDFS-5828) BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same node group when dfs.namenode.avoid.write.stale.datanode is true
[ https://issues.apache.org/jira/browse/HDFS-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Buddy updated HDFS-5828: Attachment: HDFS-5828.patch The problem was that BlockPlacementPolicyDefault.chooseTarget was manually adding the nodes in the results list to the excluded nodes list instead of using the addToExcludedNodes method. The addToExcludedNodes method is overridden by BlockPlacementPolicyWithNodeGroup to also exclude other nodes in the same node group. > BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same > node group when dfs.namenode.avoid.write.stale.datanode is true > - > > Key: HDFS-5828 > URL: https://issues.apache.org/jira/browse/HDFS-5828 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0 >Reporter: Buddy > Attachments: HDFS-5828.patch > > > When placing replicas using the replica placement policy > BlockPlacementPolicyWithNodeGroup, the number of targets returned should be > less than or equal to the number of node groups, and no node group should get > two replicas of the same block. The JUnit test > TestReplicationPolicyWithNodeGroup.testChooseMoreTargetsThanNodeGroups > verifies this. > However, if the conf property "dfs.namenode.avoid.write.stale.datanode" is > set to true, then the block placement policy will return more targets than node > groups when the number of replicas requested exceeds the number of node > groups. > This can be seen by putting: >CONF.setBoolean(DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY, true); > in the setup method for TestReplicationPolicyWithNodeGroup. This will cause > testChooseMoreTargetsThanNodeGroups to fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
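The fix relies on routing every exclusion through the overridable addToExcludedNodes hook rather than mutating the excluded set directly. A simplified illustration of that pattern with toy class and method names (not the real placement-policy API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// The base policy excludes only the chosen node; the node-group policy
// overrides the hook to exclude every node in the same group. Bypassing
// the hook (the original bug) silently loses the subclass's exclusions.
class DefaultPolicySketch {
    void addToExcludedNodes(String chosen, Set<String> excluded) {
        excluded.add(chosen);
    }
}

class NodeGroupPolicySketch extends DefaultPolicySketch {
    final Map<String, List<String>> groupOf = new HashMap<>();

    @Override
    void addToExcludedNodes(String chosen, Set<String> excluded) {
        // Exclude the whole node group, not just the chosen node.
        excluded.addAll(groupOf.getOrDefault(chosen, List.of(chosen)));
    }
}
```

The design lesson is general: when a base class offers a hook precisely so subclasses can widen its behavior, open-coding the hook's default body reintroduces the narrow behavior everywhere the subclass is in play.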
[jira] [Commented] (HDFS-5804) HDFS NFS Gateway fails to mount and proxy when using Kerberos
[ https://issues.apache.org/jira/browse/HDFS-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884455#comment-13884455 ] Daryn Sharp commented on HDFS-5804: --- Are the other {{isSecurityEnabled}} checks still required? > HDFS NFS Gateway fails to mount and proxy when using Kerberos > - > > Key: HDFS-5804 > URL: https://issues.apache.org/jira/browse/HDFS-5804 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: nfs >Affects Versions: 3.0.0, 2.2.0 >Reporter: Abin Shahab > Attachments: HDFS-5804.patch, HDFS-5804.patch, HDFS-5804.patch, > HDFS-5804.patch, HDFS-5804.patch, exception-as-root.log, > javadoc-after-patch.log, javadoc-before-patch.log > > > When using HDFS nfs gateway with secure hadoop > (hadoop.security.authentication: kerberos), mounting hdfs fails. > Additionally, there is no mechanism to support a proxy user (nfs needs to proxy > as the user invoking commands on the hdfs mount). > Steps to reproduce: > 1) start a hadoop cluster with kerberos enabled. > 2) sudo su -l nfsserver and start an nfs server. This 'nfsserver' account has > an account in kerberos. > 3) Get the keytab for nfsserver, and issue the following mount command: mount > -t nfs -o vers=3,proto=tcp,nolock $server:/ $mount_point > 4) You'll see in the nfsserver logs that Kerberos is complaining about not > having a TGT for root. 
> This is the stacktrace: > java.io.IOException: Failed on local exception: java.io.IOException: > org.apache.hadoop.security.AccessControlException: Client cannot authenticate > via:[TOKEN, KERBEROS]; Host Details : local host is: > "my-nfs-server-host.com/10.252.4.197"; destination host is: > "my-namenode-host.com":8020; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764) > at org.apache.hadoop.ipc.Client.call(Client.java:1351) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy9.getFileLinkInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileLinkInfo(ClientNamenodeProtocolTranslatorPB.java:664) > at org.apache.hadoop.hdfs.DFSClient.getFileLinkInfo(DFSClient.java:1713) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileStatus(Nfs3Utils.java:58) > at > org.apache.hadoop.hdfs.nfs.nfs3.Nfs3Utils.getFileAttr(Nfs3Utils.java:79) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.fsinfo(RpcProgramNfs3.java:1643) > at > org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.handleInternal(RpcProgramNfs3.java:1891) > at > org.apache.hadoop.oncrpc.RpcProgram.messageReceived(RpcProgram.java:143) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:281) > at > org.apache.hadoop.oncrpc.RpcUtil$RpcMessageParserStage.messageReceived(RpcUtil.java:132) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:5
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884361#comment-13884361 ] Suresh Srinivas commented on HDFS-5776: --- bq. Could we create another JIRA to track those disagreements? I have said more than three times: the default pool size is 0, so there is no harm to any existing application by default. The fact that the issue is brought up many times means that there is an issue that needs to be discussed and resolved. bq. I guess it's possible cost one week, one month even one year to argue them... If it takes more time, so be it. There are many committers who have spent time reviewing and commenting. I understand this is an important feature and the need to get it done sooner. But the core issues must be solved in this jira instead of pushing them to another jira. > Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, > HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, > HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt > > > This is a placeholder for hdfs-related stuff backported from > https://issues.apache.org/jira/browse/HBASE-7509. > The quorum read ability should be helpful, especially for optimizing read outliers. > We can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the metrics of interest into the client system (e.g. HBase's > regionserver metrics). > The core logic is in the pread code path: we decide whether to go to the original > fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884351#comment-13884351 ] Arpit Agarwal commented on HDFS-5776: - [~stack] I am basically +1 on the v9 patch at this point but v10 is a step back. We need a throttle on unbounded thread growth, and the threadpool size is the most trivial one to add. We can file a separate Jira to replace the thread pool limit with something more sophisticated, e.g. the client can keep a dynamic estimate of the 95th percentile latency and use that instead of a fixed value from configuration. Jing mentioned some issues that look fairly easy to address. {quote} In the old impl, the refetchToken/refetchEncryptionKey are shared by all nodes from chooseDataNode once a key/token exception happened. That means if the first node consumed this retry quota, then if the second or third node hit the key/token exception, the clearDataEncryptionKey/fetchBlockAt operations will not be called; it's a little unfair {quote} [~xieliang007] That makes sense, thanks for the clarification. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
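The hedged-read idea under discussion — fire a speculative second request if the first replica has not answered within a threshold — can be sketched with a CompletionService. This is an illustration only, with hypothetical names and structure, not the DFSClient implementation:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Start the primary read; if it has not completed within the threshold,
// submit a hedge against another replica and take whichever finishes first.
public class HedgedReadSketch {
    public static <T> T hedgedFetch(Callable<T> primary, Callable<T> hedge,
                                    long thresholdMillis, ExecutorService pool)
            throws Exception {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        cs.submit(primary);
        Future<T> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get();   // primary was fast enough; no hedge needed
        }
        cs.submit(hedge);         // threshold exceeded: race a second replica
        return cs.take().get();   // first result wins; the slow read is ignored
    }
}
```

The pool-size throttle debated above corresponds to bounding the ExecutorService here: with a fixed pool, a burst of slow reads cannot spawn unbounded hedge threads.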
[jira] [Commented] (HDFS-5698) Use protobuf to serialize / deserialize FSImage
[ https://issues.apache.org/jira/browse/HDFS-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884307#comment-13884307 ] Kihwal Lee commented on HDFS-5698: -- Thanks for running tests and sharing the numbers. I did some testing in the past and the loading speed was about 30MB/sec at best. I/O wasn't the bottleneck. THP and CompressedOOPS help a bit, but in the end the bottleneck was Java object creation. Due to the way things are serialized, multi-threaded loading wasn't feasible. Now that we have the inode section and the inode directory section separated, parallelism can be added for loading each section. Please share your implementation ideas. The parallelism gains may come out far less than expected due to internal locks, so it will be great if a rough prototype & testing is done to show what's attainable. Do you already have numbers for how long it took to load each section? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5828) BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same node group when dfs.namenode.avoid.write.stale.datanode is true
[ https://issues.apache.org/jira/browse/HDFS-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884251#comment-13884251 ] Buddy commented on HDFS-5828: - The reason that it was sometimes succeeding for me is that I was in the debugger and the node was sometimes going stale (30 seconds). If the node is not stale, then it always fails. Also note that logNodeIsNotChosen does not actually log anything; it just builds the message. The message is not logged in this case. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5828) BlockPlacementPolicyWithNodeGroup can place multiple replicas on the same node group when dfs.namenode.avoid.write.stale.datanode is true
[ https://issues.apache.org/jira/browse/HDFS-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884236#comment-13884236 ] Buddy commented on HDFS-5828: - The failure appears to be non-deterministic. In some cases the first chooseLocalStorage throws an exception and we get the message: 2014-01-28 10:12:25,981 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(309)) - Failed to place enough replicas, still in need of 10 to reach 10. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy When that happens, the unit test succeeds. If chooseLocalStorage finds a local storage and does not throw an exception, then the above message is not logged and the unit test fails. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
[ https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884132#comment-13884132 ] Hudson commented on HDFS-5825: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1656 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1656/]) HDFS-5825. Use FileUtils.copyFile() to implement DFSTestUtils.copyFile(). (Contributed by Haohui Mai) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561792) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java > Use FileUtils.copyFile() to implement DFSTestUtils.copyFile() > - > > Key: HDFS-5825 > URL: https://issues.apache.org/jira/browse/HDFS-5825 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5825.000.patch > > > {{DFSTestUtils.copyFile()}} is implemented by copying data through > FileInputStream / FileOutputStream. Apache Common IO provides > {{FileUtils.copyFile()}}. It uses FileChannel which is more efficient. > This jira proposes to implement {{DFSTestUtils.copyFile()}} using > {{FileUtils.copyFile()}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
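The FileChannel mechanism the jira credits for the speedup looks roughly like this — a standalone sketch of a channel-based copy, not the Commons IO source:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Channel-to-channel copy: transferTo can hand the work to the OS
// (e.g. sendfile on Linux) instead of shuttling bytes through
// user-space buffers the way stream-based copies do.
public class ChannelCopySketch {
    public static void copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = 0;
            long size = in.size();
            while (position < size) {   // transferTo may copy less than requested
                position += in.transferTo(position, size - position, out);
            }
        }
    }
}
```

The loop matters: transferTo is allowed to transfer fewer bytes than asked, so a single call is not a complete copy.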
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884136#comment-13884136 ] Hudson commented on HDFS-5830: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1656 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1656/]) HDFS-5830. WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster. (Yongjun Zhang via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561885) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.&lt;init&gt;(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
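The stack trace above shows a Guava precondition rejecting a null {{cachedLocs}} in the {{LocatedBlock}} constructor when the JSON from a non-caching cluster omits that field. A minimal, standalone sketch of the null-tolerant behavior the fix calls for follows; the names mirror LocatedBlock for readability, but this is not the actual HDFS-5830 patch:

```java
public class LocatedBlockSketch {
    private static final String[] EMPTY_LOCS = {};
    private final String[] cachedLocs;

    // Before the fix, a precondition threw IllegalArgumentException when
    // cachedLocs was null. Treating null as "no cached replicas" lets a
    // client talk to an older cluster that never reports cached locations.
    public LocatedBlockSketch(String[] cachedLocs) {
        this.cachedLocs = (cachedLocs == null) ? EMPTY_LOCS : cachedLocs;
    }

    public String[] getCachedLocations() {
        return cachedLocs;
    }
}
```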
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884134#comment-13884134 ] Hudson commented on HDFS-5781: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1656 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1656/]) HDFS-5781. Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561788) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
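The array-with-gaps scheme described in HDFS-5781 can be sketched as below. The enum values and byte codes are illustrative, not the real FSEditLogOpCodes; the point is that the lookup array tolerates unused byte values, so an op with a larger code can be backported before ops with smaller ones:

```java
public class OpCodeMapSketch {
    enum Op {
        ADD((byte) 0), RENAME((byte) 1), FUTURE_OP((byte) 5); // codes 2..4 left as a gap
        final byte code;
        Op(byte code) { this.code = code; }
    }

    // Indexed by unsigned byte value; gap entries simply stay null, unlike
    // Enum.values()/ordinal() which requires contiguous values.
    private static final Op[] BY_CODE = new Op[256];
    static {
        for (Op op : Op.values()) {
            BY_CODE[op.code & 0xFF] = op;
        }
    }

    /** Returns the op for a byte value, or null for an unknown/gap value. */
    static Op fromByte(byte b) {
        return BY_CODE[b & 0xFF];
    }
}
```

Lookup stays O(1) like the ordinal-based approach, while unknown byte values come back as null instead of causing an array-bounds or wrong-op error.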
[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884135#comment-13884135 ] Hudson commented on HDFS-5833: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1656 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1656/]) HDFS-5833. Fix incorrect javadoc in SecondaryNameNode. (Contributed by Bangtao Zhou) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561938) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode have an incorrect java doc, actually the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884130#comment-13884130 ] Hudson commented on HDFS-5297: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1656 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1656/]) HDFS-5297. Fix dead links in HDFS site documents. (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561849) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsEditsViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsImageViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in HDFS document to be fixed. > Ex.) 
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below: > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884119#comment-13884119 ] Hudson commented on HDFS-5830: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1681/]) HDFS-5830. WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster. (Yongjun Zhang via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561885) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.&lt;init&gt;(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884117#comment-13884117 ] Hudson commented on HDFS-5781: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1681/]) HDFS-5781. Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561788) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884113#comment-13884113 ] Hudson commented on HDFS-5297: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1681/]) HDFS-5297. Fix dead links in HDFS site documents. (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561849) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsEditsViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsImageViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in HDFS document to be fixed. > Ex.) 
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below: > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
[ https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884115#comment-13884115 ] Hudson commented on HDFS-5825: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1681/]) HDFS-5825. Use FileUtils.copyFile() to implement DFSTestUtils.copyFile(). (Contributed by Haohui Mai) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561792) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java > Use FileUtils.copyFile() to implement DFSTestUtils.copyFile() > - > > Key: HDFS-5825 > URL: https://issues.apache.org/jira/browse/HDFS-5825 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5825.000.patch > > > {{DFSTestUtils.copyFile()}} is implemented by copying data through > FileInputStream / FileOutputStream. Apache Common IO provides > {{FileUtils.copyFile()}}. It uses FileChannel which is more efficient. > This jira proposes to implement {{DFSTestUtils.copyFile()}} using > {{FileUtils.copyFile()}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884118#comment-13884118 ] Hudson commented on HDFS-5833: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1681 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1681/]) HDFS-5833. Fix incorrect javadoc in SecondaryNameNode. (Contributed by Bangtao Zhou) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561938) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode have an incorrect java doc, actually the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5730) Inconsistent Audit logging for HDFS APIs
[ https://issues.apache.org/jira/browse/HDFS-5730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884103#comment-13884103 ] Uma Maheswara Rao G commented on HDFS-5730: --- Thanks a lot, Colin, for taking a look. More reviews are welcome. > Inconsistent Audit logging for HDFS APIs > > > Key: HDFS-5730 > URL: https://issues.apache.org/jira/browse/HDFS-5730 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0, 2.2.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G > Attachments: HDFS-5730.patch, HDFS-5730.patch > > > When looking at the audit logs in HDFS, I am seeing some inconsistencies > between what was logged with audit earlier and what has been added recently. > For more details please check the comments. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5830) WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster.
[ https://issues.apache.org/jira/browse/HDFS-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884013#comment-13884013 ] Hudson commented on HDFS-5830: -- FAILURE: Integrated in Hadoop-Yarn-trunk #464 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/464/]) HDFS-5830. WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when accessing another cluster. (Yongjun Zhang via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561885) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUtil.java > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster. > > > Key: HDFS-5830 > URL: https://issues.apache.org/jira/browse/HDFS-5830 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs-client >Affects Versions: 2.3.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Blocker > Fix For: 2.3.0 > > Attachments: HDFS-5830.001.patch > > > WebHdfsFileSystem.getFileBlockLocations throws IllegalArgumentException when > accessing another cluster (that doesn't have caching support). 
> java.lang.IllegalArgumentException: cachedLocs should not be null, use a > different constructor > at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) > at org.apache.hadoop.hdfs.protocol.LocatedBlock.&lt;init&gt;(LocatedBlock.java:79) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlock(JsonUtil.java:414) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlockList(JsonUtil.java:446) > at org.apache.hadoop.hdfs.web.JsonUtil.toLocatedBlocks(JsonUtil.java:479) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileBlockLocations(WebHdfsFileSystem.java:1067) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1812) > at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:1797) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value
[ https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884011#comment-13884011 ] Hudson commented on HDFS-5781: -- FAILURE: Integrated in Hadoop-Yarn-trunk #464 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/464/]) HDFS-5781. Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561788) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOpCodes.java > Use an array to record the mapping between FSEditLogOpCode and the > corresponding byte value > --- > > Key: HDFS-5781 > URL: https://issues.apache.org/jira/browse/HDFS-5781 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.4.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Fix For: 2.4.0 > > Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, > HDFS-5781.002.patch, HDFS-5781.002.patch > > > HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a > given byte value. While improving the efficiency, it may cause issue. E.g., > when several new editlog ops are added to trunk around the same time (for > several different new features), it is hard to backport the editlog ops with > larger byte values to branch-2 before those with smaller values, since there > will be gaps in the byte values of the enum. > This jira plans to still use an array to record the mapping between editlog > ops and their byte values, and allow gap between valid ops. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5825) Use FileUtils.copyFile() to implement DFSTestUtils.copyFile()
[ https://issues.apache.org/jira/browse/HDFS-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884009#comment-13884009 ] Hudson commented on HDFS-5825: -- FAILURE: Integrated in Hadoop-Yarn-trunk #464 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/464/]) HDFS-5825. Use FileUtils.copyFile() to implement DFSTestUtils.copyFile(). (Contributed by Haohui Mai) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561792) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java > Use FileUtils.copyFile() to implement DFSTestUtils.copyFile() > - > > Key: HDFS-5825 > URL: https://issues.apache.org/jira/browse/HDFS-5825 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai >Priority: Minor > Fix For: 2.3.0 > > Attachments: HDFS-5825.000.patch > > > {{DFSTestUtils.copyFile()}} is implemented by copying data through > FileInputStream / FileOutputStream. Apache Common IO provides > {{FileUtils.copyFile()}}. It uses FileChannel which is more efficient. > This jira proposes to implement {{DFSTestUtils.copyFile()}} using > {{FileUtils.copyFile()}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5833) SecondaryNameNode have an incorrect java doc
[ https://issues.apache.org/jira/browse/HDFS-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884012#comment-13884012 ] Hudson commented on HDFS-5833: -- FAILURE: Integrated in Hadoop-Yarn-trunk #464 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/464/]) HDFS-5833. Fix incorrect javadoc in SecondaryNameNode. (Contributed by Bangtao Zhou) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561938) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java > SecondaryNameNode have an incorrect java doc > > > Key: HDFS-5833 > URL: https://issues.apache.org/jira/browse/HDFS-5833 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Bangtao Zhou >Priority: Trivial > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5833-1.patch > > > SecondaryNameNode have an incorrect java doc, actually the SecondaryNameNode > uses the *NamenodeProtocol* to talk to the primary NameNode, not the > *ClientProtocol* -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884007#comment-13884007 ] Hudson commented on HDFS-5297: -- FAILURE: Integrated in Hadoop-Yarn-trunk #464 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/464/]) HDFS-5297. Fix dead links in HDFS site documents. (Contributed by Akira Ajisaka) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1561849) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithNFS.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsEditsViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsImageViewer.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsPermissionsGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsQuotaAdminGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsUserGuide.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Hftp.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ShortCircuitLocalReads.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm > Fix dead links in HDFS site documents > - > > Key: HDFS-5297 > URL: https://issues.apache.org/jira/browse/HDFS-5297 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA > Fix For: 3.0.0, 2.3.0 > > Attachments: HDFS-5297.patch > > > I found a lot of broken hyperlinks in HDFS document to be fixed. > Ex.) 
> In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below: > {noformat} >For command usage, see {{{dfsadmin}}}. > {noformat} > It should be fixed to > {noformat} >For command usage, see > {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}. > {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883974#comment-13883974 ] Hadoop QA commented on HDFS-5844: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625540/HDFS-5844.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestNameNodeHttpServer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5961//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5961//console This message is automatically generated. > Fix broken link in WebHDFS.apt.vm > - > > Key: HDFS-5844 > URL: https://issues.apache.org/jira/browse/HDFS-5844 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.2.0 >Reporter: Akira AJISAKA >Assignee: Akira AJISAKA >Priority: Minor > Labels: newbie > Attachments: HDFS-5844.patch > > > There is one broken link in WebHDFS.apt.vm. 
> {code} > {{{RemoteException JSON Schema}}} > {code} > should be > {code} > {{RemoteException JSON Schema}} > {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883944#comment-13883944 ] Liang Xie commented on HDFS-5776: - Could we create another JIRA to track those disagreements? I have said more than three times: the default pool size is 0, so by default there is no harm to any existing applications. I guess it could take one week, one month, or even one year to argue them out... Thanks > Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, > HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, > HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt > > > This is a placeholder of hdfs related stuff backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be helpful especially to optimize read outliers > we can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from client side(e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the interested metric valus into client system(e.g. HBase's > regionserver metric). > The core logic is in pread code path, we decide to goto the original > fetchBlockByteRange or the new introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883940#comment-13883940 ] Jing Zhao commented on HDFS-5776: - Another thing about enoughNodesForHedgedRead. The current patch checks enoughNodesForHedgedRead before calling hedgedFetchBlockByteRange. Since the set of dead nodes keeps being updated while reading, we may still hit the issue where we cannot easily find a second DN for reading. I think a better way is to add this check in chooseDataNode: if chooseDataNode finds that it is seeking the second DN (i.e., ignored is not null) and cannot immediately/easily find one, it should skip retrying, and we may want to fall back to the normal read. > Support 'hedged' reads in DFSClient > --- > > Key: HDFS-5776 > URL: https://issues.apache.org/jira/browse/HDFS-5776 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HDFS-5776-v10.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, > HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, > HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt > > > This is a placeholder of hdfs related stuff backport from > https://issues.apache.org/jira/browse/HBASE-7509 > The quorum read ability should be helpful especially to optimize read outliers > we can utilize "dfs.dfsclient.quorum.read.threshold.millis" & > "dfs.dfsclient.quorum.read.threadpool.size" to enable/disable the hedged read > ability from client side(e.g. HBase), and by using DFSQuorumReadMetrics, we > could export the interested metric valus into client system(e.g. HBase's > regionserver metric). > The core logic is in pread code path, we decide to goto the original > fetchBlockByteRange or the new introduced fetchBlockByteRangeSpeculative per > the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
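The fallback Jing Zhao suggests can be sketched roughly as follows. All names here are illustrative simplifications, not the real DFSInputStream API: when the caller is picking the second (hedge) datanode, signaled by a non-null ignored list, and no candidate is immediately available, the method gives up instead of entering the retry loop, so the caller can fall back to a normal read:

```java
import java.util.List;

public class HedgedChooseSketch {
    /** Returns a node, or null to signal "fall back to the normal read". */
    static String chooseDataNode(List<String> live, List<String> ignored) {
        for (String dn : live) {
            if (ignored == null || !ignored.contains(dn)) {
                return dn;                 // a usable replica was found right away
            }
        }
        // Seeking the hedge target (ignored != null) with nothing available:
        // skip retrying so the hedged path is not blocked.
        return (ignored != null) ? null : retryUntilFound(live);
    }

    // Placeholder for the normal retry/refetch-block-locations loop that a
    // first (non-hedged) read attempt would go through.
    private static String retryUntilFound(List<String> live) {
        return live.isEmpty() ? null : live.get(0);
    }
}
```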
[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883928#comment-13883928 ] Jing Zhao commented on HDFS-5776: - bq. it's more flexible if we provide instance-level disable/enable APIs, so we can use the HBase shell script to control the switch per DFS client instance; that'll be cooler I still have some concerns about the current implementation: 1) We do not check the thread pool in enableHedgedReads. This makes it possible for isHedgedReadsEnabled() to return true while hedged reads are actually not enabled. 2) DFSClient#setThreadsNumForHedgedReads allows users to keep changing the size of the thread pool. To provide instance-level disable/enable APIs, I think maybe we can do the following: 1) Read the thread pool size from configuration only when initializing the thread pool; the size should be > 0 and cannot be changed afterwards. 2) Add an "Allow-Hedged-Reads" configuration. Each DFSClient instance reads this configuration and, if it is true, checks and initializes the thread pool if necessary. Users can turn the switch on/off using the enable/disable methods; in the enable method, we check and initialize the thread pool if necessary. What do you think [~xieliang007]? 
[jira] [Commented] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled
[ https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883894#comment-13883894 ]

Hadoop QA commented on HDFS-5843:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12625529/hdfs-5843.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.
    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.
    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.
    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.
    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.
    {color:red}-1 core tests{color}.  The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
                  org.apache.hadoop.hdfs.TestPersistBlocks
    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5960//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5960//console

This message is automatically generated.
> DFSClient.getFileChecksum() throws IOException if checksum is disabled
> ----------------------------------------------------------------------
>
>                 Key: HDFS-5843
>                 URL: https://issues.apache.org/jira/browse/HDFS-5843
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Laurent Goujon
>         Attachments: hdfs-5843.patch
>
>
> If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} for example), calling {{FileSystem.getFileChecksum()}} throws the following IOException:
> {noformat}
> java.io.IOException: Fail to get block MD5 for BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
> 	at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
> [...]
> {noformat}
> From the logs, the datanode performs incorrect arithmetic because of crcPerBlock:
> {noformat}
> 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation  src: /127.0.0.1:52407 dest: /127.0.0.1:52398
> java.lang.ArithmeticException: / by zero
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> 	at java.lang.Thread.run(Thread.java:695)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
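The failure mode in the stack trace above can be illustrated with a small guard. This is a hypothetical demonstration, not the real DataXceiver#blockChecksum code: with checksums disabled the datanode ends up with a per-block CRC count of zero, and any division by it raises the ArithmeticException shown in the log. The class and method names here are made up for illustration.

```java
// Hypothetical illustration of the crcPerBlock division-by-zero; the
// real DataXceiver arithmetic differs, this only shows the guard idea.
public class CrcPerBlockGuard {

    /**
     * Number of full blocks covered by crcTotal checksum entries,
     * guarding the checksum-disabled case where crcPerBlock == 0.
     */
    public static long blocksCovered(long crcTotal, long crcPerBlock) {
        if (crcPerBlock == 0) {
            // Checksums disabled: there is no meaningful per-block CRC
            // count, so return 0 instead of dividing by zero.
            return 0;
        }
        return crcTotal / crcPerBlock;
    }
}
```

An alternative fix, of course, is for the datanode to detect the disabled-checksum case earlier and return a well-defined "no checksum" response rather than computing block checksums at all.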
[jira] [Commented] (HDFS-5297) Fix dead links in HDFS site documents
[ https://issues.apache.org/jira/browse/HDFS-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883878#comment-13883878 ]

Akira AJISAKA commented on HDFS-5297:
-------------------------------------

Thank you for reviewing and committing, [~arpitagarwal]!

bq. There is one broken link in WebHDFS.apt.vm.

Filed HDFS-5844 and attached a patch. Would you review it?

> Fix dead links in HDFS site documents
> -------------------------------------
>
>                 Key: HDFS-5297
>                 URL: https://issues.apache.org/jira/browse/HDFS-5297
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 2.2.0
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>             Fix For: 3.0.0, 2.3.0
>
>         Attachments: HDFS-5297.patch
>
>
> I found a lot of broken hyperlinks in the HDFS documents to be fixed.
> Ex.) In HdfsUserGuide.apt.vm, there is a broken hyperlink, as below:
> {noformat}
>    For command usage, see {{{dfsadmin}}}.
> {noformat}
> It should be fixed to
> {noformat}
>    For command usage, see {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated HDFS-5844:
--------------------------------
    Status: Patch Available  (was: Open)

> Fix broken link in WebHDFS.apt.vm
> ---------------------------------
>
>                 Key: HDFS-5844
>                 URL: https://issues.apache.org/jira/browse/HDFS-5844
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 2.2.0
>            Reporter: Akira AJISAKA
>            Assignee: Akira AJISAKA
>            Priority: Minor
>              Labels: newbie
>         Attachments: HDFS-5844.patch
>
>
> There is one broken link in WebHDFS.apt.vm.
> {code}
> {{{RemoteException JSON Schema}}}
> {code}
> should be
> {code}
> {{RemoteException JSON Schema}}
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5844) Fix broken link in WebHDFS.apt.vm
[ https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated HDFS-5844:
--------------------------------
    Attachment: HDFS-5844.patch

Attaching a patch.