[jira] [Updated] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HDFS-16540:
---------------------------------
    Fix Version/s:     (was: 3.3.5)

> Data locality is lost when DataNode pod restarts in kubernetes
> --------------------------------------------------------------
>
>                 Key: HDFS-16540
>                 URL: https://issues.apache.org/jira/browse/HDFS-16540
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.2
>            Reporter: Huaxiang Sun
>            Assignee: Huaxiang Sun
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When
> the pod restarts, we found that data locality is lost after we do a major
> compaction of HBase regions. After some debugging, we found that upon pod
> restart, its IP changes. In DatanodeManager, maps like networktopology are
> updated with the new info, but host2DatanodeMap is not updated accordingly.
> When an HDFS client with the new IP tries to find a local DataNode, it fails.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
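The failure mode in the description can be sketched outside HDFS: a registry that indexes the same nodes by stable identity and by IP, but refreshes only the first index on re-registration, will miss lookups by the new IP. This is a minimal hypothetical illustration, not the actual DatanodeManager code; all names below are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the HDFS-16540 failure mode: two indexes over the
// same datanodes, only one of which is refreshed when a node's IP changes.
public class StaleHostMapSketch {
    // Keyed by stable node id (like the topology maps, which WERE updated).
    final Map<String, String> nodeById = new HashMap<>();
    // Keyed by IP (like host2DatanodeMap, which was NOT updated).
    final Map<String, String> nodeByIp = new HashMap<>();

    void register(String nodeId, String ip) {
        nodeById.put(nodeId, ip);
        nodeByIp.put(ip, nodeId);
    }

    // Buggy re-registration: refreshes the by-id index but not the by-IP one.
    void reRegisterBuggy(String nodeId, String newIp) {
        nodeById.put(nodeId, newIp);
        // nodeByIp left stale: old IP still maps, new IP maps to nothing.
    }

    // Fixed re-registration: both indexes are kept consistent.
    void reRegisterFixed(String nodeId, String newIp) {
        String oldIp = nodeById.put(nodeId, newIp);
        if (oldIp != null) {
            nodeByIp.remove(oldIp);
        }
        nodeByIp.put(newIp, nodeId);
    }

    // What a client asking "is there a DataNode at my address?" effectively does.
    String lookupByIp(String ip) {
        return nodeByIp.get(ip);
    }
}
```

With the buggy path, a client on the restarted pod looks up its new IP, gets nothing back, and locality is lost; the fixed path removes the old IP entry and inserts the new one in the same step.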
[jira] [Commented] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767668#comment-17767668 ]

Michael Stack commented on HDFS-16540:
--------------------------------------

Let me do the latter [~dmanning] ... I'll let folks ask for the backport before doing it for branch-3.3. Thanks for finding this one.
[jira] [Commented] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767640#comment-17767640 ]

Michael Stack commented on HDFS-16540:
--------------------------------------

[~dmanning] You are right. Looks like I messed up the cherry-pick back to 3.3. I could open a new issue and retry the backport there?
[jira] [Commented] (HDFS-16755) Unit test can fail due to unexpected host resolution
[ https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598612#comment-17598612 ]

Michael Stack commented on HDFS-16755:
--------------------------------------

{quote}Could someone transform HADOOP-18431 to HDFS-*?
{quote}
Done (I think you need to reset 'fix version')

> Unit test can fail due to unexpected host resolution
> ----------------------------------------------------
>
>                 Key: HDFS-16755
>                 URL: https://issues.apache.org/jira/browse/HDFS-16755
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.4.0, 3.3.9
>         Environment: Running under both Maven Surefire and an IDE results in
> a test failure. Switching the name to "bogus.invalid" results in the
> expected behavior, which depends on an UnknownHostException.
>            Reporter: Steve Vaughan
>            Assignee: Steve Vaughan
>            Priority: Minor
>              Labels: pull-request-available
>
> Tests that want to use an unresolvable address may actually resolve in some
> environments. Replacing host names like "bogus" with an IETF RFC 2606
> reserved domain name avoids the issue.
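The idea behind the fix can be shown in a few lines: a bare name like "bogus" may resolve in some environments (DNS search domains, wildcard resolvers), while names under the RFC 2606 reserved TLD `.invalid` are defined never to resolve. This is a hedged sketch of the principle, not the Hadoop test code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: probe whether a hostname resolves. "bogus" is environment-dependent;
// "bogus.invalid" uses the RFC 2606 reserved TLD and must not resolve.
public class UnresolvableHostSketch {
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host); // triggers the platform resolver
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No expected output claimed for "bogus": it depends on the resolver.
        System.out.println("bogus resolves: " + resolves("bogus"));
        System.out.println("bogus.invalid resolves: " + resolves("bogus.invalid"));
    }
}
```

A test that needs a guaranteed `UnknownHostException` should therefore use a `.invalid` name rather than an arbitrary unregistered word.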
[jira] [Moved] (HDFS-16755) Unit test can fail due to unexpected host resolution
[ https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack moved HADOOP-18431 to HDFS-16755:
-----------------------------------------------
      Component/s: test
                       (was: test)
              Key: HDFS-16755  (was: HADOOP-18431)
 Target Version/s:     (was: 3.4.0, 3.3.9)
Affects Version/s: 3.4.0
                   3.3.9
                       (was: 3.4.0)
                       (was: 3.3.9)
          Project: Hadoop HDFS  (was: Hadoop Common)
[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host
[ https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16684.
----------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution [~svaughan]

> Exclude self from JournalNodeSyncer when using a bind host
> ----------------------------------------------------------
>
>                 Key: HDFS-16684
>                 URL: https://issues.apache.org/jira/browse/HDFS-16684
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: journal-node
>    Affects Versions: 3.4.0, 3.3.9
>         Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>            Reporter: Steve Vaughan
>            Assignee: Steve Vaughan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.9
>
> The JournalNodeSyncer will include the local instance in syncing when using a
> bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude
> the local instance, but it doesn't recognize the meta-address as a local
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log
> attempts to sync with itself as part of the normal syncing rotation. For an
> HA configuration running 3 JournalNodes, the "other" list used by the
> JournalNodeSyncer will include 3 proxies.
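Why the meta-address trips up a self-check: `0.0.0.0` is a wildcard, not a concrete local address, so naive address equality against a peer's real address never matches. A hedged sketch of the idea (method names are illustrative, and this is deliberately simplified; it is not the JournalNodeSyncer patch, which a full version would extend by enumerating `NetworkInterface` addresses):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of why naive address equality misses a wildcard bind address, and
// how an isAnyLocalAddress() check recovers "this peer could be me".
public class WildcardBindSketch {
    // Naive check: is the peer literally the address we bound to?
    static boolean naiveIsSelf(InetAddress bound, InetAddress peer) {
        return bound.equals(peer);
    }

    // Wildcard-aware check: a node bound to 0.0.0.0 listens on every local
    // address. Simplified here to loopback/wildcard peers; a real check
    // would also compare against all NetworkInterface addresses.
    static boolean wildcardAwareIsSelf(InetAddress bound, InetAddress peer) {
        return bound.equals(peer)
            || (bound.isAnyLocalAddress()
                && (peer.isLoopbackAddress() || peer.isAnyLocalAddress()));
    }

    public static void main(String[] args) throws UnknownHostException {
        // getByName on a literal does no DNS lookup.
        InetAddress wildcard = InetAddress.getByName("0.0.0.0");
        InetAddress loopback = InetAddress.getByName("127.0.0.1");
        System.out.println(naiveIsSelf(wildcard, loopback));         // false: self missed
        System.out.println(wildcardAwareIsSelf(wildcard, loopback)); // true
    }
}
```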
[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
[ https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16586.
----------------------------------
    Fix Version/s: 3.4.0
                   3.2.4
                   3.3.4
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review [~hexiaoqiao]

> Purge FsDatasetAsyncDiskService threadgroup; it causes
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal
> exception and exit'
> -------------------------------------------------------------------------
>
>                 Key: HDFS-16586
>                 URL: https://issues.apache.org/jira/browse/HDFS-16586
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.3.0, 3.2.3
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.4, 3.3.4
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840 for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] top.TopAuditLogger(78): --- logged event for top service: allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp dst=null perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>         at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>         at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
>         at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
>         at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?]
>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?]
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291) ~[hadoop-hdfs-3.2.3.jar:?]
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274) ~[hadoop-hdfs-3.2.3.jar:?]
> {code}
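The trace shows `Thread` construction failing inside `ThreadGroup.addUnstarted`, which happens when the factory's explicit ThreadGroup has already been destroyed (on older JDKs a daemon group auto-destroys once its last thread exits); the resolution purges the ThreadGroup. A hedged sketch of a group-free named factory in that spirit, not the actual Hadoop patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a ThreadGroup-free thread factory: threads get descriptive names
// but are never tied to an explicit ThreadGroup, so no "group already
// destroyed" state can poison later thread creation.
public class GroupFreeFactorySketch {
    static ThreadFactory namedDaemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            // No ThreadGroup argument: the thread joins the creator's group.
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool =
            Executors.newFixedThreadPool(2, namedDaemonFactory("async-disk"));
        pool.submit(() -> System.out.println(Thread.currentThread().getName()));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```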
[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously
[ https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540321#comment-17540321 ]

Michael Stack commented on HDFS-14997:
--------------------------------------

Linking an issue where the change in how we do command processing uncovers an old, existing problem. Perhaps it is of interest.

> BPServiceActor processes commands from NameNode asynchronously
> --------------------------------------------------------------
>
>                 Key: HDFS-14997
>                 URL: https://issues.apache.org/jira/browse/HDFS-14997
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>             Fix For: 3.3.0, 3.2.3, 3.2.4
>
>         Attachments: HDFS-14997-branch-3.2.001.patch, HDFS-14997.001.patch,
> HDFS-14997.002.patch, HDFS-14997.003.patch, HDFS-14997.004.patch,
> HDFS-14997.005.patch, HDFS-14997.addendum.patch,
> image-2019-12-26-16-15-44-814.png
>
> There are two core functions, report (#sendHeartbeat, #blockReport,
> #cacheReport) and #processCommand, in the #BPServiceActor main process flow.
> If #processCommand takes a long time, it blocks the report flow. Meanwhile,
> #processCommand can take a long time (over 1000s in the worst case I have
> met) when the IO load of the DataNode is very high. Since some IO operations
> are under #datasetLock, it has to wait a long time to acquire #datasetLock
> when processing some commands (such as #DNA_INVALIDATE). In such cases,
> #heartbeat will not be sent to the NameNode in time, and this triggers other
> disasters.
> I propose to make #processCommand asynchronous so that it does not block
> #BPServiceActor from sending heartbeats back to the NameNode under high IO
> load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should
> solve it in another JIRA.
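The improvement described above can be sketched as a dedicated consumer thread draining a queue: the actor enqueues commands and returns immediately, so heartbeats are never delayed by slow, lock-heavy command handling. A simplified illustration, not the BPServiceActor code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of HDFS-14997's idea: the heartbeat loop enqueues commands in O(1)
// and returns; a separate processor thread does the possibly-slow work.
public class AsyncCommandSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread processor;

    AsyncCommandSketch() {
        processor = new Thread(() -> {
            try {
                while (true) {
                    queue.take().run(); // slow work happens here, off the actor
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // treated as shutdown
            }
        }, "CommandProcessingThread");
        processor.setDaemon(true);
        processor.start();
    }

    // Called from the heartbeat loop: never blocks on command execution.
    void enqueue(Runnable command) {
        queue.add(command);
    }
}
```

The trade-off, visible in the linked HDFS-16586, is that the command processor becomes a long-lived thread whose uncaught failures need their own handling.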
[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
Michael Stack created HDFS-16586:
------------------------------------

             Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'
                 Key: HDFS-16586
                 URL: https://issues.apache.org/jira/browse/HDFS-16586
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 3.2.3, 3.3.0
            Reporter: Michael Stack
            Assignee: Michael Stack

The below failed block finalize is causing a downstreamer's test to fail when it uses hadoop 3.2.3 or 3.3.0+:

{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 (auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] top.TopAuditLogger(78): --- logged event for top service: allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp dst=null perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] datanode.BPServiceActor$CommandProcessingThread(1276): Command processor encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
        at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
        at java.lang.Thread.<init>(Thread.java:430) ~[?:?]
        at java.lang.Thread.<init>(Thread.java:704) ~[?:?]
        at java.lang.Thread.<init>(Thread.java:525) ~[?:?]
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113) ~[hadoop-hdfs-3.2.3.jar:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:623) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) ~[?:?]
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291) ~[hadoop-hdfs-3.2.3.jar:?]
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274) ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client
{code}
[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HDFS-16540.
----------------------------------
     Hadoop Flags: Reviewed
       Resolution: Fixed

Merged to branch-3.3 and to trunk.
[jira] [Updated] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HDFS-16540:
---------------------------------
    Fix Version/s: 3.4.0
                   3.3.4
[jira] [Assigned] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes
[ https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack reassigned HDFS-16540:
------------------------------------

    Assignee: Huaxiang Sun
[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HDFS-16090:
---------------------------------
    Fix Version/s: 3.3.2
                   3.4.0
     Hadoop Flags: Reviewed
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Pushed to branch-3.3+ (it didn't go in clean against branch-3.2). Resolving. Thanks for the improvement [~vjasani]. Thanks for the reviews [~aajisaka] and [~weichiu]

> Fine grained locking for datanodeNetworkCounts
> ----------------------------------------------
>
>                 Key: HDFS-16090
>                 URL: https://issues.apache.org/jira/browse/HDFS-16090
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.2
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing the DataNode network error count, we lock the entire
> LoadingCache in order to increment the network count of a specific host. We
> should provide finer-grained concurrency for this update, because locking
> the entire cache is redundant and can impact performance when incrementing
> the network count for multiple hosts.
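The pattern behind this improvement generalizes: instead of synchronizing on the whole cache to bump one host's counter, give each host its own atomic cell and rely on the map's per-key atomicity. A sketch with `ConcurrentHashMap` and `LongAdder` (the actual patch works against a Guava `LoadingCache`; this is the idea, not the Hadoop code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of fine-grained per-host error counting: no global lock.
// computeIfAbsent is atomic per key, and LongAdder.increment is friendly
// to contention from many writer threads.
public class NetworkErrorCounters {
    private final ConcurrentMap<String, LongAdder> errorsByHost =
        new ConcurrentHashMap<>();

    void incrementError(String host) {
        errorsByHost.computeIfAbsent(host, h -> new LongAdder()).increment();
    }

    long errorCount(String host) {
        LongAdder adder = errorsByHost.get(host);
        return adder == null ? 0L : adder.sum();
    }
}
```

Two increments for different hosts never contend on a shared lock; two increments for the same host contend only on that host's adder cells.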
[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969596#comment-16969596 ]

Michael Stack commented on HDFS-13613:
--------------------------------------

Thanks [~ndimiduk] and [~inigoiri] for taking a look. Thanks for sharing your experience disabling hedged reads. Will try that too.

I could add a check for DEBUG, but just doing the check in this old logging system of ours -- the last release was more than 5 years ago -- requires passing through a synchronized block. When this log was spewing in a running process, as it will tend to do when HDFS is struggling and all threads are waiting on HDFS syncs to return, I changed the log level but then saw access to HDFS blocking on the log level-check (per thread dump, so a rough take only).

Limiting the number of emissions would require a system to count, it'd have to be configurable, and so on. Seems a bit OTT. I was thinking that if you are interested in the thread count for hedged reads, you'd study the metric incremented on the line that follows; it'd give you a better notion than this bare log does.

Thanks again for taking a look.

> RegionServer log is flooded with "Execution rejected, Executing in current
> thread"
> --------------------------------------------------------------------------
>
>                 Key: HDFS-13613
>                 URL: https://issues.apache.org/jira/browse/HDFS-13613
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>         Environment: CDH 5.13, HBase RegionServer, Kerberized, hedged read
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch
>
> In the log of an HBase RegionServer with hedged read, we saw the following
> message flooding the log file.
> {noformat}
> 2018-05-19 17:22:55,691 INFO org.apache.hadoop.hdfs.DFSClient: Execution rejected, Executing in current thread
> 2018-05-19 17:22:55,692 INFO org.apache.hadoop.hdfs.DFSClient: Execution rejected, Executing in current thread
> 2018-05-19 17:22:55,695 INFO org.apache.hadoop.hdfs.DFSClient: Execution rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution rejected, Executing in current thread
> {noformat}
> Sometimes the RS spits tens of thousands of lines of this message in a
> minute. We should do something to stop this message flooding the log file.
> Also, we should make this message more actionable. Discussed with
> [~huaxiang], this message can appear if there are stale DataNodes.
> I believe this issue existed since HDFS-5776.
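The way out of this class of log storm, as the comment above argues, is to stop emitting a line per rejection and count instead. A hedged sketch of a caller-runs rejection handler that only bumps a counter; the names are illustrative and this is not the DFSClient patch:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: when the hedged-read pool is saturated, run the task in the caller
// and increment a metric. No per-event INFO line means no convoy of threads
// blocked on the log appender lock under load.
public class CountingCallerRunsSketch {
    static final AtomicLong rejectedCount = new AtomicLong();

    static final RejectedExecutionHandler COUNT_AND_RUN = (task, pool) -> {
        rejectedCount.incrementAndGet(); // observable via metrics, not logs
        if (!pool.isShutdown()) {
            task.run(); // degrade to a non-hedged read in the caller's thread
        }
    };

    static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>(1), // tiny queue to make rejection easy to show
            COUNT_AND_RUN);
    }
}
```

An operator watching the rejection counter learns the same thing the flooded log was trying to say (the hedged-read pool is undersized or the cluster is struggling) without the logging subsystem becoming the bottleneck.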
[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967028#comment-16967028 ]

Michael Stack commented on HDFS-13613:
--------------------------------------

Attached proposed patch. The log is useless. Besides, we keep a rejected-execution metric. In my case there were no stale datanodes. Just load.
[jira] [Updated] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack updated HDFS-13613:
---------------------------------
    Attachment: 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch
[jira] [Comment Edited] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021 ] Michael Stack edited comment on HDFS-13613 at 11/4/19 9:40 PM: --- Just ran into this one. Thread dump showed loads of threads BLOCKED here: {code} "RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 waiting for monitor entry [0x7f3dd21a9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0x8080c258> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305) at org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904) {code} i.e. trying to log the above useless message. I turned off logging but we still go into the logging system and hit the BLOCKED section. The RS backs up, fills all call queues. Nothing can come in the front door. We start to burn all CPUs. For context, running heavy load. HDFS slows down. HBase spews complaints that syncs are costing > 100ms. Then this phenomenon takes off. When load lightens, we seem to get past the torrent, but while it is going on, the backed-up RS is not allowing access to any of its hosted data. This was in a hadoop3 derivative. was (Author: stack): Just ran into this one. 
Thread dump showed loads of threads BLOCKED here: {code} "RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 waiting for monitor entry [0x7f3dd21a9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0x8080c258> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305) at org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904) {code} i.e. trying to log the above useless message. I turned off logging but we still go into the logging system and hit the BLOCKED section. The RS backs up, fills all call queues. Nothing can come in the front door. We start to burn all CPUs. For context, running heavy load. HDFS slows down. HBase spews complaints that syncs are costing > 100ms. Then this phenomenon takes off. When load lightens, we seem to get past the torrent, but while it is going on, the backed-up RS is not allowing access to any of its hosted data.
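Because log4j 1.x funnels every call through the synchronized Category.callAppenders, one way to keep a hot path off that monitor is to rate-limit at the call site so only one caller per interval reaches the logging system at all. A hedged sketch of that idea (illustrative only, not the fix that was committed for this issue):

```java
import java.util.concurrent.atomic.AtomicLong;

public class RateLimitedLog {
    private final long intervalNanos;
    private final AtomicLong nextAllowed = new AtomicLong(0);
    final AtomicLong suppressed = new AtomicLong(0);

    RateLimitedLog(long intervalMillis) {
        this.intervalNanos = intervalMillis * 1_000_000L;
    }

    /** Returns true only for the one caller per interval that should emit. */
    boolean shouldLog(long nowNanos) {
        long next = nextAllowed.get();
        if (nowNanos >= next
                && nextAllowed.compareAndSet(next, nowNanos + intervalNanos)) {
            return true;  // the single winner proceeds to the appender
        }
        suppressed.incrementAndGet();  // everyone else skips logging entirely
        return false;
    }

    public static void main(String[] args) {
        RateLimitedLog limiter = new RateLimitedLog(1000);
        long now = System.nanoTime();
        int emitted = 0;
        // Simulate 10,000 handler threads hitting the same instant.
        for (int i = 0; i < 10_000; i++) {
            if (limiter.shouldLog(now)) {
                emitted++;
            }
        }
        System.out.println("emitted=" + emitted
                + " suppressed=" + limiter.suppressed.get());
    }
}
```

The key property for the scenario above is that suppressed callers never enter the synchronized appender path, so they cannot pile up BLOCKED on the RootLogger monitor.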
[jira] [Comment Edited] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021 ] Michael Stack edited comment on HDFS-13613 at 11/4/19 9:39 PM: --- Just ran into this one. Thread dump showed loads of threads BLOCKED here: {code} "RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 waiting for monitor entry [0x7f3dd21a9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0x8080c258> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305) at org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904) {code} i.e. trying to log the above useless message. I turned off logging but we still go into the logging system and hit the BLOCKED section. The RS backs up, fills all call queues. Nothing can come in the front door. We start to burn all CPUs. For context, running heavy load. HDFS slows down. HBase spews complaints that syncs are costing > 100ms. Then this phenomenon takes off. When load lightens, we seem to get past the torrent, but while it is going on, the backed-up RS is not allowing access to any of its hosted data. was (Author: stack): Just ran into this one. 
Thread dump showed loads of threads BLOCKED here: {code} "RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 waiting for monitor entry [0x7f3dd21a9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0x8080c258> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305) at org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904) {code} i.e. trying to log the above useless message. I turned off logging but we still go into the logging system and hit the BLOCKED section. The RS backs up, fills all call queues. Nothing can come in the front door. We start to burn all CPUs.
[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"
[ https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021 ] Michael Stack commented on HDFS-13613: -- Just ran into this one. Thread dump showed loads of threads BLOCKED here: {code} "RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 waiting for monitor entry [0x7f3dd21a9000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0x8080c258> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305) at org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904) {code} i.e. trying to log the above useless message. I turned off logging but we still go into the logging system and hit the BLOCKED section. The RS backs up, fills all call queues. Nothing can come in the front door. We start to burn all CPUs.
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926301#comment-16926301 ] stack commented on HDFS-14837: -- One question, is Long.hashCode same as (int)(blockId^(blockId>>>32)) (I've not looked..) > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
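To the question above: yes — the JDK specifies Long.hashCode(long value) as exactly (int)(value ^ (value >>> 32)), so the two forms agree for every input. A quick check:

```java
public class LongHashCheck {
    public static void main(String[] args) {
        long[] samples = {
            0L, 1L, -1L, Long.MAX_VALUE, Long.MIN_VALUE, 1234567890123456789L
        };
        for (long blockId : samples) {
            // The hand-rolled form used in Block.java's hashCode.
            int manual = (int) (blockId ^ (blockId >>> 32));
            // Long.hashCode(long) is specified as this same expression.
            if (manual != Long.hashCode(blockId)) {
                throw new AssertionError("mismatch for " + blockId);
            }
        }
        System.out.println("all match");
    }
}
```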
[jira] [Commented] (HDFS-14837) Review of Block.java
[ https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926214#comment-16926214 ] stack commented on HDFS-14837: -- +1 nice cleanup > Review of Block.java > > > Key: HDFS-14837 > URL: https://issues.apache.org/jira/browse/HDFS-14837 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Attachments: HDFS-14837.1.patch > > > The {{Block}} class is such a core class in the project, I just wanted to > make sure it was super clean and documentation was correct. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881339#comment-16881339 ] stack commented on HDFS-14483: -- Reverted from branch-2 and branch-2.9 and then reapplied to both branches with amended commit message. Original was missing the JIRA id > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.3 2.10.0 Status: Resolved (was: Patch Available) Pushed to branch-2.9 and branch-2. Thanks for the patch [~leosun08]. Mind filling out the release note on what this patch adds? Thanks. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
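For release-note context, the backported interface adds positional ("pread") reads into a ByteBuffer: read bytes at an absolute file offset without disturbing the stream's current position. A small stand-alone sketch of those semantics using an NIO channel — illustrative only, not the HDFS implementation (the helper name `pread` is made up here):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PreadSketch {
    // Positional read into a ByteBuffer: the channel's cursor is restored,
    // so concurrent sequential readers are unaffected.
    static int pread(SeekableByteChannel ch, long pos, ByteBuffer buf)
            throws IOException {
        long saved = ch.position();
        try {
            ch.position(pos);
            return ch.read(buf);
        } finally {
            ch.position(saved);
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("pread", ".txt");
        try {
            Files.write(f, "hello world".getBytes(StandardCharsets.UTF_8));
            try (SeekableByteChannel ch = Files.newByteChannel(f)) {
                ByteBuffer buf = ByteBuffer.allocate(5);
                int n = pread(ch, 6, buf);  // read "world" at offset 6
                buf.flip();
                String s = StandardCharsets.UTF_8.decode(buf).toString();
                if (n != 5 || !"world".equals(s)) {
                    throw new AssertionError("got " + n + ":" + s);
                }
                System.out.println(n + ":" + s);
            }
        } finally {
            Files.delete(f);
        }
    }
}
```

Reading straight into a caller-supplied ByteBuffer (possibly a direct buffer) is the point of the backport: it avoids the extra copy that a byte[]-based pread forces.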
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012 ] stack commented on HDFS-14483: -- [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012 ] stack edited comment on HDFS-14483 at 7/8/19 3:52 AM: -- [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. Will do tomorrow in case someone else wants to comment in meantime. was (Author: stack): [~leosun08] Thanks. Looking at history of hdfs builds, I see that it failed in the build just before this one for the HDFS-13694 patch. Unrelated then. Let me push. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880008#comment-16880008 ] stack commented on HDFS-14483: -- ...and +1 on patch. Lets just figure the story on this last flakey...and then I'll commit (unless objection). Thanks [~leosun08] > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14585.branch-2.9.v3.patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880004#comment-16880004 ] stack commented on HDFS-14483: -- Thanks for fixing the short circuit unit test [~leosun08]. Seems to have worked. As said above.. hadoop.hdfs.web.TestWebHdfsTimeouts hadoop.hdfs.server.datanode.TestDirectoryScanner ... are for sure flakey. TestJournalNodeRespectsBindHostKeys I'm not so sure. Will do a survey of recent test history... Meantime let me get another run in. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14585.branch-2.9.v3.patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, > HDFS-14585.branch-2.9.v3.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14483.branch-2.9.v2 (2).patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, > HDFS-14483.branch-2.9.v2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877963#comment-16877963 ] stack commented on HDFS-14483: -- I looked back over recent hdfs qa builds https://builds.apache.org/job/PreCommit-HDFS-Build/. I see that TestWebHdfsTimeouts TestDirectoryScanner ... are definitely flakey. The others I am not so sure. If I go back in build history, I see that they fail only w/ this patch in place seemingly (I went back through all builds before the first build above.. up here https://builds.apache.org/job/PreCommit-HDFS-Build/). Let me retry the patch. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877402#comment-16877402 ] stack commented on HDFS-14483: -- Are the test failures related [~leosun08]? Thanks. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14483.branch-2.9.v2.patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877023#comment-16877023 ] stack commented on HDFS-14483: -- What about my other comments [~leosun08]? Mind responding to them? There is no overlap here between the two test runs. Let me try another in meantime. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, > HDFS-14483.branch-2.9.v2.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876673#comment-16876673 ] stack commented on HDFS-14483: -- Retry. All but one failure look like they could be related. Lets see. > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14483: - Attachment: HDFS-14483.branch-2.9.v1.patch > Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9 > -- > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch, > HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-14585. -- Resolution: Fixed Reapplied w/ proper commit message. Re-resolving. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HDFS-14585: -- Reopening. Commit message was missing the JIRA # so revert and reapply with fixed commit message. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14585: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.9.3 2.10.0 Status: Resolved (was: Patch Available) Pushed to branch-2 and branch-2.9. Thanks for the patch [~leosun08] (and review [~jojochuang]). Shout if I mangled this (it has been a while). > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Fix For: 2.10.0, 2.9.3 > > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875570#comment-16875570 ] stack commented on HDFS-14585: -- Yes. It passed on second attempt. Flakey. The findbugs has a covering JIRA HADOOP-16386 filed by the mighty [~jojochuang]. I'll commit this after Monday unless objection. Thanks [~leosun08]. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-14585: - Attachment: HDFS-14585.branch-2.9.v2.patch > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875047#comment-16875047 ] stack commented on HDFS-14585: -- bq. Sorry, I don't quite understand thse meaning of this comments. I am offering praise on aspects of your work. No response required. Test failure looks unrelated but let me retry. Reviewing the patch, v2 looks good to me. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9 > - > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x
[ https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874426#comment-16874426 ] stack commented on HDFS-14483: -- Test failures seem related, no? nit: Was expecting ByteBufferPositionedReadable to be a sub-interface of ByteBufferReadable/PositionedReadable. Probably fine as is but mildly 'surprising'. Comparing the BB-based decrypt to the byte[] version, this confuses me: buf.limit(start + len + Math.min(n - len, localInBuffer.remaining())); In the byte[] version, its len - n vs n - len figuring how much to decrypt. I see the byte[] decrypt loop is n < length vs the bb decrypt which is len < n... so I think its fine but take a look please. Should these resets be in the finally block just below? 425 buf.position(start + len); 426 buf.limit(limit); Nice javadoc in ByteBufferPositionedReadable Appreciate the improvement in testPositionedRead nit: more permutations in testPositionedReadWithByteBuffer would be nice-to-have -- though the testByteBufferPread addition is good. Maybe a follow-on? Could test more edge cases... a failed read or a read that does not fill the read request amount? The positionedReadCheckWithByteBuffer is nice. Nice addition of the strncmp check in the test_libhdfs_ops.c file and bulking up of the pread tests. Yeah, skip the re-formatting of unrelated code (especially when adds mistake as in '916 method = "open";')... which adds an offset. Nice comments added to c function names. Patch looks great. 
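To make the position/limit question in the comment above concrete, here is a minimal standalone sketch (not the patch's actual code; the buffer sizes and the `narrowAndRestore` name are invented for illustration) of restoring a ByteBuffer's view in a finally block so the resets survive an exception:

```java
import java.nio.ByteBuffer;

public class Main {
    // Hypothetical stand-in for the decrypt step under review: narrow the
    // buffer's view to [start, start+len), do the work, and restore the
    // caller's position/limit in a finally block so the resets run even if
    // the work in between throws.
    public static int[] narrowAndRestore(ByteBuffer buf, int start, int len) {
        int limit = buf.limit();              // save the caller's limit
        try {
            buf.position(start);
            buf.limit(start + len);           // expose only the target range
            // ... in the real patch, decryption would happen against this view ...
        } finally {
            buf.position(start + len);        // executes even on an exception
            buf.limit(limit);
        }
        return new int[] { buf.position(), buf.limit() };
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        int[] r = narrowAndRestore(buf, 2, 4);
        System.out.println(r[0] + "," + r[1]); // 6,16
    }
}
```

With the resets outside the finally block, an exception mid-decrypt would leak a buffer whose position and limit no longer match what the caller handed in, which is the hazard the review question is probing.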
> Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x > - > > Key: HDFS-14483 > URL: https://issues.apache.org/jira/browse/HDFS-14483 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Zheng Hu >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14483.branch-2.8.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch-2 and branch2.9
[ https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874403#comment-16874403 ] stack commented on HDFS-14585: -- nit: why the change in import order for nonnull and Preconditions? Is it about respecting checkstyle import ordering rules? nit: Best to not make formatting changes that are unrelated to your patch: e.g. wrapping exception declaration on blockSeekTo (there are a few of this type of change). Formatting bulks up your patch and distract the reviewer. nit: ByteBuffer bb = ByteBuffer.wrap(buffer, offset, length); is offset. Some nice cleanup and duplication removal; e.g. using BB to keep account on the read buffer (offsets and length), purge of the HDFS-8703 unused EC version of actualGetFromOneDataNode, pulling out long targetEnd = targetStart + bytesToRead - 1, etc. Is this going to be ok: 1249tmp.limit(tmp.position() + len); ? The EC version of actualGetFromOneDataNode had a checkReadPortions. Should there be a check we don't go over the end of the buffer here? Why do we drop the below in the patch? 1268 updateReadStatistics(readStatistics, nread, reader); 1269 dfsClient.updateFileSystemReadStats( 1270 reader.getNetworkDistance(), nread); Thanks. > Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch-2 and > branch2.9 > -- > > Key: HDFS-14585 > URL: https://issues.apache.org/jira/browse/HDFS-14585 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Major > Attachments: HDFS-14585.branch-2.9.v1.patch, > HDFS-14585.branch-2.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
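As background for the `ByteBuffer.wrap(buffer, offset, length)` and `tmp.limit(tmp.position() + len)` remarks in the review above, a small standalone sketch of the relevant ByteBuffer semantics (the concrete sizes are invented for illustration):

```java
import java.nio.ByteBuffer;

public class Main {
    // wrap(array, offset, length) yields position == offset and
    // limit == offset + length -- the bookkeeping the patch leans on
    // instead of carrying offset/length ints around by hand.
    public static String describeWrap(byte[] buffer, int offset, int length) {
        ByteBuffer bb = ByteBuffer.wrap(buffer, offset, length);
        return bb.position() + "," + bb.limit() + "," + bb.capacity();
    }

    // The reviewed pattern `tmp.limit(tmp.position() + len)` cannot silently
    // run past the backing array: limit() throws IllegalArgumentException for
    // values above capacity, so an overshooting len fails fast.
    public static boolean overshootRejected(ByteBuffer bb, int len) {
        try {
            bb.limit(bb.position() + len);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeWrap(new byte[10], 3, 4)); // 3,7,10
        System.out.println(overshootRejected(ByteBuffer.wrap(new byte[10], 3, 4), 20)); // true
    }
}
```

This only shows that an over-large `len` fails loudly rather than corrupting data; whether a too-small limit can truncate a read short of the request is the separate check the review asks about.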
[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path
[ https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781902#comment-16781902 ] stack commented on HDFS-3246: - This patch looks beautiful going by the Interface. Being able to ask HDFS fill ByteBuffers over byte arrays will benefit downstreamers. > pRead equivalent for direct read path > - > > Key: HDFS-3246 > URL: https://issues.apache.org/jira/browse/HDFS-3246 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client, performance >Affects Versions: 3.0.0-alpha1 >Reporter: Henry Robinson >Assignee: Sahil Takiar >Priority: Major > Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, > HDFS-3246.003.patch, HDFS-3246.004.patch > > > There is no pread equivalent in ByteBufferReadable. We should consider adding > one. It would be relatively easy to implement for the distributed case > (certainly compared to HDFS-2834), since DFSInputStream does most of the > heavy lifting. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled
[ https://issues.apache.org/jira/browse/HDFS-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525612#comment-16525612 ] stack commented on HDFS-13702: -- bq. I do want a trace layer in there; I do want it broader than just HDFS, and I do want it to be used from the layers above. Me too. bq. Otherwise: it'll get cut, nobody will replace it, and it'll get lost in folklore. Better this than a dead, disabled lib burning everyone's CPU to no end. Even when enabled, as is, it is of little to no value. Trace in hdfs is in need of work but it has been suffering neglect since Colin's add. bq. This is something to talk about at a broader level than a JIRA; I can start a thread. Suggest this discussion not block this patch? Or, add in placeholders/comments for the trace points removed here? Thanks [~ste...@apache.org] > HTrace hooks taking 10-15% CPU in DFS client when disabled > -- > > Key: HDFS-13702 > URL: https://issues.apache.org/jira/browse/HDFS-13702 > Project: Hadoop HDFS > Issue Type: Bug > Components: performance >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > Attachments: hdfs-13702.patch, hdfs-13702.patch, hdfs-13702.patch > > > I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate > workload even when HTrace is disabled. This is because it stringifies several > integers. We should avoid all allocation and stringification when htrace is > disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13643) Implement basic async rpc client
[ https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524429#comment-16524429 ] stack commented on HDFS-13643: -- Yeah, we can take a look at [~daryn] stuff when it shows up. On the patch, checkstyles? No need of this since its default 44 compile ? Otherwise, classes could do w/ a bit of class javadoc situating them (though they are @Private audience and its kinda plain what they are about adding basic client on netty). Fine in a follow-up. +1 to commit on branch from me. We should do a writeup on general approach as entrance for those who might be trying to follow-along Good stuff. > Implement basic async rpc client > > > Key: HDFS-13643 > URL: https://issues.apache.org/jira/browse/HDFS-13643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: HDFS-13572 > > Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, > HDFS-13643.patch > > > Implement the basic async rpc client so we can start working on the DFSClient > implementation ASAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled
[ https://issues.apache.org/jira/browse/HDFS-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524156#comment-16524156 ] stack commented on HDFS-13702: -- bq. what do you think? I think we need to be able to trace end-to-end where time is being spent. I think that if htrace is not enabled, it should not add friction. I think harley-davidson's are awful motorcycles but even they don't deserve the abuse they are getting. I think those numbers you posted for the difference your patch makes in throughput stripping htrace are radical. Poor htrace has been added to the apache attic. It got no loving. htrace in hdfs got no loving either post initial-commit; it was added and then let fester. I think we should commit this patch, +1, and then we can file another to review how to move forward with tracing in light of recent developments in htrace project; i.e. purge all other htrace references, look into alternatives, etc. > HTrace hooks taking 10-15% CPU in DFS client when disabled > -- > > Key: HDFS-13702 > URL: https://issues.apache.org/jira/browse/HDFS-13702 > Project: Hadoop HDFS > Issue Type: Bug > Components: performance >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > Attachments: hdfs-13702.patch, hdfs-13702.patch > > > I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate > workload even when HTrace is disabled. This is because it stringifies several > integers. We should avoid all allocation and stringification when htrace is > disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13643) Implement basic async rpc client
[ https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500387#comment-16500387 ] stack commented on HDFS-13643: -- [~daryn] Thanks boss. Would like to see it. > Implement basic async rpc client > > > Key: HDFS-13643 > URL: https://issues.apache.org/jira/browse/HDFS-13643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: HDFS-13572 > > Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, > HDFS-13643.patch > > > Implement the basic async rpc client so we can start working on the DFSClient > implementation ASAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13643) Implement basic async rpc client
[ https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-13643: - Fix Version/s: HDFS-13572 > Implement basic async rpc client > > > Key: HDFS-13643 > URL: https://issues.apache.org/jira/browse/HDFS-13643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: HDFS-13572 > > Attachments: HDFS-13643.patch > > > Implement the basic async rpc client so we can start working on the DFSClient > implementation ASAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13643) Implement basic async rpc client
[ https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497356#comment-16497356 ] stack commented on HDFS-13643: -- Thanks for the input [~daryn] bq. It's a nice POC but not implementing security is a non-starter [~daryn] you mean its a non-starter come merge vote, right? If so, I agree but if you are suggesting a 'basic rpc client' needs security to be committed on a feature branch, this seems a bit much. bq. ...a divergent ipc client... Its async. Its going to diverge, no? We'd like to make a pure async client untethered by the creaky synchronous predecessor if thats ok. Thanks. > Implement basic async rpc client > > > Key: HDFS-13643 > URL: https://issues.apache.org/jira/browse/HDFS-13643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ipc >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Attachments: HDFS-13643.patch > > > Implement the basic async rpc client so we can start working on the DFSClient > implementation ASAP. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3
[ https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497220#comment-16497220 ] stack commented on HDFS-13572: -- I made a branch named for this issue by cloning trunk ('trunk' is head and where 3.2 will be cut from there is no branch-3, a base for 3.0 releases...). $ git checkout origin/trunk -b HDFS-13572 $ git push -u origin HDFS-13572 Let me now start a vote up on PMC to get [~Apache9] on this new feature branch. > [umbrella] Non-blocking HDFS Access for H3 > -- > > Key: HDFS-13572 > URL: https://issues.apache.org/jira/browse/HDFS-13572 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs async >Affects Versions: 3.0.0 >Reporter: stack >Priority: Major > Attachments: Nonblocking HDFS Access.pdf > > > An umbrella JIRA for supporting non-blocking HDFS access in h3. > This issue has provenance in the stalled HDFS-9924 but would like to vault > over what was going on over there, in particular, focus on an async API for > hadoop3+ unencumbered by worries about how to make it work in hadoop2. > Let me post a WIP design. Would love input/feedback (We make mention of the > HADOOP-12910 call for spec but as future work -- hopefully thats ok). Was > thinking of cutting a feature branch if all good after a bit of chat. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3
[ https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495291#comment-16495291 ] stack commented on HDFS-13572: -- Unless objection, was planning on creating a branch to work on this issue. Will file sub-issues here. Was going to get [~Apache9] as committer on the branch. > [umbrella] Non-blocking HDFS Access for H3 > -- > > Key: HDFS-13572 > URL: https://issues.apache.org/jira/browse/HDFS-13572 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs async >Affects Versions: 3.0.0 >Reporter: stack >Priority: Major > Attachments: Nonblocking HDFS Access.pdf > > > An umbrella JIRA for supporting non-blocking HDFS access in h3. > This issue has provenance in the stalled HDFS-9924 but would like to vault > over what was going on over there, in particular, focus on an async API for > hadoop3+ unencumbered by worries about how to make it work in hadoop2. > Let me post a WIP design. Would love input/feedback (We make mention of the > HADOOP-12910 call for spec but as future work -- hopefully thats ok). Was > thinking of cutting a feature branch if all good after a bit of chat. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-13565) [um
[ https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HDFS-13565. -- Resolution: Invalid Smile [~ebadger] Yeah, sorry about that lads. Bad wifi. Resolving as invalid. > [um > --- > > Key: HDFS-13565 > URL: https://issues.apache.org/jira/browse/HDFS-13565 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: stack >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476812#comment-16476812 ] stack commented on HDFS-9924: - I moved the design doc over to a new issue, HDFS-13572, for the new effort (hadoop3+ basis). > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Duo Zhang >Priority: Major > Attachments: Async-HDFS-Performance-Report.pdf, > AsyncHdfs20160510.pdf, HDFS-9924-POC.patch > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3
stack created HDFS-13572: Summary: [umbrella] Non-blocking HDFS Access for H3 Key: HDFS-13572 URL: https://issues.apache.org/jira/browse/HDFS-13572 Project: Hadoop HDFS Issue Type: New Feature Components: fs async Affects Versions: 3.0.0 Reporter: stack An umbrella JIRA for supporting non-blocking HDFS access in h3. This issue has provenance in the stalled HDFS-9924 but would like to vault over what was going on over there, in particular, focus on an async API for hadoop3+ unencumbered by worries about how to make it work in hadoop2. Let me post a WIP design. Would love input/feedback (We make mention of the HADOOP-12910 call for spec but as future work -- hopefully thats ok). Was thinking of cutting a feature branch if all good after a bit of chat. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-13565) [um
stack created HDFS-13565: Summary: [um Key: HDFS-13565 URL: https://issues.apache.org/jira/browse/HDFS-13565 Project: Hadoop HDFS Issue Type: New Feature Reporter: stack -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240584#comment-16240584 ] stack commented on HDFS-7240: - The posted document needs author, date, and ref to this issue. Can it be made a google doc so can comment inline rather than here? I skipped to the end, "So why put the Ozone in HDFS and not keep it a separate project". There is no argument here on why Ozone needs to be part of Apache Hadoop. As per [~shv] above, Ozone as separate project does not preclude its being brought in instead as a dependency nor does it dictate the shape of deploy (Bullet #3 is an aspiration, not an argument). > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, > HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, > HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238181#comment-16238181 ] stack commented on HDFS-12711: -- bq. For now, though, I'm sort of tired at looking at this problem and will go work on something else for a while. Thanks for putting Hadoop in a box. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238056#comment-16238056 ] stack commented on HDFS-12711: -- This is excellent work. Would a kill -QUIT before you do actual kill of the errant processes be of use? It'd do a dump of stack trace before process goes away (processes might not be connected to stdout/stderr anymore?). Thanks. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219178#comment-16219178 ] stack commented on HDFS-12711: -- Rah rah [~aw]! Thanks for digging in. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Allen Wittenauer > Attachments: HDFS-12711.branch-2.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11644) DFSStripedOutputStream should not implement Syncable
[ https://issues.apache.org/jira/browse/HDFS-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968113#comment-15968113 ] stack commented on HDFS-11644: -- bq. "The current behavior of throwing an exception is safer." ... but changes precedent? Semantics in general are messy here around sync, et al. It is a reflection of the torturous journey taken by sync/flush/hflush/hsync in HDFS. The blessed [~ste...@apache.org] tried writing a spec for DFS and got far on the read-side. Helps. Write-side is to do. I like Steve's comment that rather than "probe for interface, cast, query, maintain.." at each point at which we encounter a feature, rather, there'd be an upfront query that could be run before engaging w/ the fs implementation (though how does this work if tiering changes the underlying storage on us at runtime?). Meantime, having DFSStripedOutputStream throw an exception breaking all that run on top (with no means of querying whether support or not) seems disruptive. > DFSStripedOutputStream should not implement Syncable > > > > Key: HDFS-11644 > URL: https://issues.apache.org/jira/browse/HDFS-11644 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Manoj Govindassamy > Labels: hdfs-ec-3.0-must-do > > FSDataOutputStream#hsync checks if a stream implements Syncable, and if so, > calls hsync. Otherwise, it just calls flush. This is used, for instance, by > YARN's FileSystemTimelineWriter. > DFSStripedOutputStream extends DFSOutputStream, which implements Syncable. > However, DFSStripedOS throws a runtime exception when the Syncable methods > are called. > We should refactor the inheritance structure so DFSStripedOS does not > implement Syncable. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
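The "probe for interface, cast, query" pattern the comment refers to can be sketched in miniature (the `Syncable` below is a local stand-in for illustration, not Hadoop's real interface, and `hsyncOrFlush` is an invented name):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class Main {
    // Local stand-in for Hadoop's Syncable contract.
    interface Syncable { void hsync() throws IOException; }

    static class SyncableStream extends ByteArrayOutputStream implements Syncable {
        public void hsync() { /* a real stream would force data to disk here */ }
    }

    // The probe-cast-call pattern described above: hsync if the wrapped
    // stream supports it, otherwise degrade silently to flush().
    public static String hsyncOrFlush(OutputStream out) {
        try {
            if (out instanceof Syncable) {
                ((Syncable) out).hsync();
                return "hsync";
            }
            out.flush();
            return "flush";
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hsyncOrFlush(new SyncableStream()));        // hsync
        System.out.println(hsyncOrFlush(new ByteArrayOutputStream())); // flush
    }
}
```

The hazard the comment calls disruptive: a stream that implements the interface but throws at runtime defeats this probe, since `instanceof` reports support that the call then contradicts.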
[jira] [Commented] (HDFS-11170) Add create API in filesystem public class to support assign parameter through builder
[ https://issues.apache.org/jira/browse/HDFS-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927487#comment-15927487 ] stack commented on HDFS-11170: -- I took a look at 03 * CreateBuilder.java needs license. * I was going to suggest that CreateBuilder is too generic a name but it seems like we are misusing builder and that is why the confusion. For example: 1427DistributedFileSystemCreateBuilder builder = 1428fs.newCreateBuilder(testFilePath).build(); 1429FSDataOutputStream out = fs.create(builder); i.e. I build a 'create' builder to pass to a create function that then 'builds' the wanted object. I'd think that when I called build on the builder, that I'd get back a FSDataOutputStream -- not a 'Builder' (this is what [~xiaobingo] says above now I've read those comments). > Add create API in filesystem public class to support assign parameter through > builder > - > > Key: HDFS-11170 > URL: https://issues.apache.org/jira/browse/HDFS-11170 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: SammiChen >Assignee: Wei Zhou > Labels: hdfs-ec-3.0-nice-to-have > Attachments: HDFS-11170-00.patch, HDFS-11170-01.patch, > HDFS-11170-02.patch, HDFS-11170-03.patch > > > FileSystem class supports multiple create functions to help user create file. > Some create functions has many parameters, it's hard for user to exactly > remember these parameters and their orders. This task is to add builder > based create functions to help user more easily create file. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
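For contrast with the reviewed API, a minimal sketch of the conventional builder shape the comment asks for, where build() hands back the finished object rather than another builder (all names here are hypothetical; `Product` stands in for the FSDataOutputStream the caller actually wants):

```java
public class Main {
    // The thing the caller wants back from build().
    static final class Product {
        final String path;
        final int bufferSize;
        Product(String path, int bufferSize) {
            this.path = path;
            this.bufferSize = bufferSize;
        }
    }

    static final class CreateBuilder {
        private final String path;
        private int bufferSize = 4096;
        CreateBuilder(String path) { this.path = path; }
        CreateBuilder bufferSize(int n) { this.bufferSize = n; return this; }
        // Returns the product, not another builder -- the convention the
        // review says the patch's fs.newCreateBuilder(...).build() breaks.
        Product build() { return new Product(path, bufferSize); }
    }

    public static void main(String[] args) {
        Product p = new CreateBuilder("/tmp/f").bufferSize(8192).build();
        System.out.println(p.path + ":" + p.bufferSize); // /tmp/f:8192
    }
}
```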
[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902444#comment-15902444 ] stack commented on HDFS-6450: - [~elgoiri] Ignore my comment above. I thought this the resolved positional hedged read issue. My bad. > Support non-positional hedged reads in HDFS > --- > > Key: HDFS-6450 > URL: https://issues.apache.org/jira/browse/HDFS-6450 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Colin P. McCabe >Assignee: Liang Xie > Attachments: HDFS-6450-like-pread.txt > > > HDFS-5776 added support for hedged positional reads. We should also support > hedged non-position reads (aka regular reads). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901691#comment-15901691 ] stack commented on HDFS-6450: - Open a new issue I'd say [~elgoiri]. Link it back here. > Support non-positional hedged reads in HDFS > --- > > Key: HDFS-6450 > URL: https://issues.apache.org/jira/browse/HDFS-6450 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.4.0 >Reporter: Colin P. McCabe >Assignee: Liang Xie > Attachments: HDFS-6450-like-pread.txt > > > HDFS-5776 added support for hedged positional reads. We should also support > hedged non-position reads (aka regular reads). -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851725#comment-15851725 ] stack commented on HDFS-9924: - [~Apache9] Good luck (I didn't fully grok #3. It would be coolio if you could interpolate an async access by implementing pb Service async interface using its callback and controller). > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839393#comment-15839393 ] stack commented on HDFS-9924: - bq. Add a port unification service in front of the grpc server and the old rpc server to support both grpc client and old client. When you say port unification service, what are you thinking? It'd be in-process listening on the DN port reading a few bytes to figure which RPC? Going by https://www.cockroachlabs.com/blog/a-tale-of-two-ports/, I'd advocate listening on a new port altogether; an option 5 which is probably too much to ask. We should probably perf test grpc (going by the citation). Thanks [~Apache9] > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
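The "read a few bytes to figure which RPC" idea can be shown as a tiny demultiplexer. Hadoop's own RPC connections do begin with the bytes "hrpc", and gRPC runs over HTTP/2, whose connections open with the fixed client preface; routing on those prefixes is this sketch's assumption, not the design actually proposed on the thread:

```java
import java.util.Arrays;

// Minimal in-process port-unification sketch: peek at the first bytes of
// a new connection and route it to the matching RPC server.
public class Demux {
    static final byte[] HADOOP_RPC = {'h', 'r', 'p', 'c'};
    // HTTP/2 (and hence gRPC) connections open with this client preface.
    static final byte[] HTTP2_PREFACE = "PRI * HTTP/2.0".getBytes();

    static String route(byte[] peeked) {
        if (startsWith(peeked, HADOOP_RPC)) return "legacy-rpc";
        if (startsWith(peeked, HTTP2_PREFACE)) return "grpc";
        return "reject";   // unknown protocol: drop the connection
    }

    static boolean startsWith(byte[] buf, byte[] prefix) {
        return buf.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(buf, prefix.length), prefix);
    }

    public static void main(String[] args) {
        System.out.println(route("hrpcXXXX".getBytes()));
        System.out.println(route("PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n".getBytes()));
    }
}
```

The cockroachlabs post cited above argues the opposite design: a dedicated port per protocol, which avoids the peek entirely at the cost of another listener.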
[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode
stack created HDFS-11368: Summary: LocalFS does not allow setting storage policy so spew running in local mode Key: HDFS-11368 URL: https://issues.apache.org/jira/browse/HDFS-11368 Project: Hadoop HDFS Issue Type: Bug Reporter: stack Assignee: stack Priority: Minor commit f92a14ade635e4b081f3938620979b5864ac261f Author: Yu Li Date: Mon Jan 9 09:52:58 2017 +0800 HBASE-14061 Support CF-level Storage Policy ...added setting storage policy which is nice. Being able to set storage policy came in in hdfs 2.6.0 (HDFS-6584 Support Archival Storage) but you can only do this for DFS, not for local FS. Upshot is that starting up hbase in standalone mode, which uses localfs, you get this exception every time: {code} 2017-01-25 12:26:53,400 WARN [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] fs.HFileSystem: Failed to set storage policy of [file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info] to [HOT] java.lang.UnsupportedOperationException: Cannot find specified method setStoragePolicy at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209) at org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207) at org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198) at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:237) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988) at 
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path, java.lang.String) at java.lang.Class.getMethod(Class.java:1786) at org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205) ... {code} It is distracting at the least. Let me fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
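One shape the fix could take, sketched with hypothetical stand-in classes (FakeDfs, FakeLocalFs) rather than the real Hadoop ones: probe for setStoragePolicy once via reflection, cache the result, and degrade quietly on filesystems like LocalFileSystem that lack it, instead of logging a full stack trace on every store open. The (String, String) signature here simplifies the real (Path, String) one from the trace:

```java
import java.lang.reflect.Method;

// Stand-in for a filesystem that supports storage policies (e.g. DFS).
class FakeDfs {
    public void setStoragePolicy(String path, String policy) { /* no-op */ }
}

// Stand-in for LocalFileSystem in this era: no such method at all.
class FakeLocalFs { }

public class StoragePolicyShim {
    private final Method setPolicy;   // null => unsupported, checked once

    StoragePolicyShim(Class<?> fsClass) {
        Method m = null;
        try {
            m = fsClass.getMethod("setStoragePolicy", String.class, String.class);
        } catch (NoSuchMethodException e) {
            // Expected on local filesystems; an unsupported feature,
            // not an error worth a WARN with a stack trace.
        }
        this.setPolicy = m;
    }

    boolean trySetPolicy(Object fs, String path, String policy) {
        if (setPolicy == null) return false;   // quiet no-op
        try {
            setPolicy.invoke(fs, path, policy);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        StoragePolicyShim dfs = new StoragePolicyShim(FakeDfs.class);
        System.out.println(dfs.trySetPolicy(new FakeDfs(), "/hbase", "HOT"));
        StoragePolicyShim local = new StoragePolicyShim(FakeLocalFs.class);
        System.out.println(local.trySetPolicy(new FakeLocalFs(), "/tmp", "HOT"));
    }
}
```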
[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed
[ https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836371#comment-15836371 ] stack commented on HDFS-11303: -- Hey [~alicezhangchen] You got an email when Andrew made state changes to this issue. He set its state to 'patch available' which triggered a run of the CI system. It looks like one test failed. Do you think it is related (let me trigger a rerun... some tests are flakey and fail on occasion regardless of what the attached patch does). Andrew also flagged this JIRA as affecting 3.0.0-alpha1 which probably makes it fit for commit as a fix for the next hadoop3 release. Let me trigger another run and see how the patch does. I'll leave this issue open another few days in the hope that someone else will chime in with a review. Will commit whether-or-which in a day or two. Thanks for the patch. > Hedged read might hang infinitely if read data from all DN failed > -- > > Key: HDFS-11303 > URL: https://issues.apache.org/jira/browse/HDFS-11303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Chen Zhang >Assignee: Chen Zhang > Attachments: HDFS-11303-001.patch > > > Hedged read will read from a DN first, if timeout, then read other DNs > simultaneously. > If read all DN failed, this bug will cause the future-list not empty(the > first timeout request left in list), and hang in the loop infinitely -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed
[ https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-11303: - Attachment: HDFS-11303-001.patch Retry > Hedged read might hang infinitely if read data from all DN failed > -- > > Key: HDFS-11303 > URL: https://issues.apache.org/jira/browse/HDFS-11303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Chen Zhang >Assignee: Chen Zhang > Attachments: HDFS-11303-001.patch, HDFS-11303-001.patch > > > Hedged read will read from a DN first, if timeout, then read other DNs > simultaneously. > If read all DN failed, this bug will cause the future-list not empty(the > first timeout request left in list), and hang in the loop infinitely -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed
[ https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812572#comment-15812572 ] stack commented on HDFS-11303: -- Patch LGTM. Your patch allows that the primary read might still complete before the new hedged reads whereas what was there previous would discard anything that came in after timeout. Good. The test is just to verify we time out? W/o your fix, the test hangs? > Hedged read might hang infinitely if read data from all DN failed > -- > > Key: HDFS-11303 > URL: https://issues.apache.org/jira/browse/HDFS-11303 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.0.0-alpha1 >Reporter: Chen Zhang > Attachments: HDFS-11303-001.patch > > > Hedged read will read from a DN first, if timeout, then read other DNs > simultaneously. > If read all DN failed, this bug will cause the future-list not empty(the > first timeout request left in list), and hang in the loop infinitely -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
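The future-list bug described above, and the bounded-drain shape that avoids it, can be illustrated with a plain CompletionService. This is a standalone sketch, not the DFSInputStream code: the point is that draining exactly as many completions as were submitted cannot hang, whereas looping on a stale futures list (with the first, timed-out future never removed) can block forever once every replica fails:

```java
import java.io.IOException;
import java.util.concurrent.*;

public class HedgedDrain {
    // Submit one "read" per replica; drain at most `replicas` completions.
    // Returns the number of failed reads observed before giving up.
    static int failedReads(int replicas) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(replicas);
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < replicas; i++) {
            final int dn = i;
            // Every replica fails, the scenario that triggered the hang.
            cs.submit(() -> { throw new IOException("dn" + dn + " failed"); });
        }
        int failures = 0;
        for (int done = 0; done < replicas; done++) {   // bounded drain
            try {
                cs.take().get();   // first success would win the hedge
                break;
            } catch (ExecutionException e) {
                failures++;
            }
        }
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(failedReads(3));
    }
}
```

With all three "replicas" failing, the loop terminates after three takes instead of waiting on a fourth future that will never arrive.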
[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java
[ https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531228#comment-15531228 ] stack commented on HDFS-10690: -- Skimmed. Patch LGTM. Unfortunate we leave behind some perf but agree on avoiding custom data structure unless large benefit. Nice work. > Optimize insertion/removal of replica in ShortCircuitCache.java > --- > > Key: HDFS-10690 > URL: https://issues.apache.org/jira/browse/HDFS-10690 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.0.0-alpha2 >Reporter: Fenghua Hu >Assignee: Fenghua Hu > Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, > HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, > HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch > > Original Estimate: 336h > Remaining Estimate: 336h > > Currently in ShortCircuitCache, two TreeMap objects are used to track the > cached replicas. > private final TreeMap evictable = new TreeMap<>(); > private final TreeMap evictableMmapped = new > TreeMap<>(); > TreeMap employs Red-Black tree for sorting. This isn't an issue when using > traditional HDD. But when using high-performance SSD/PCIe Flash, the cost > inserting/removing an entry becomes considerable. > To mitigate it, we designed a new list-based for replica tracking. > The list is a double-linked FIFO. FIFO is time-based, thus insertion is a > very low cost operation. On the other hand, list is not lookup-friendly. To > address this issue, we introduce two references into ShortCircuitReplica > object. > ShortCircuitReplica next = null; > ShortCircuitReplica prev = null; > In this way, lookup is not needed when removing a replica from the list. We > only need to modify its predecessor's and successor's references in the lists. > Our tests showed up to 15-50% performance improvement when using PCIe flash > as storage media. 
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and > patch will be posted soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
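The intrusive doubly-linked FIFO described in the proposal can be sketched as follows; class and field names are illustrative, not the patch's. Because each replica embeds its own prev/next links, insertion at the tail, arbitrary removal, and head eviction are all O(1), with no tree rebalancing and no lookup:

```java
public class ReplicaFifo {
    static class Replica {
        final long blockId;
        Replica prev, next;       // intrusive links, as in the proposal
        Replica(long id) { blockId = id; }
    }

    private Replica head, tail;   // head = oldest = next to evict

    void append(Replica r) {      // O(1): new replicas go to the tail
        r.prev = tail;
        r.next = null;
        if (tail != null) tail.next = r; else head = r;
        tail = r;
    }

    void unlink(Replica r) {      // O(1): fix neighbours, no search
        if (r.prev != null) r.prev.next = r.next; else head = r.next;
        if (r.next != null) r.next.prev = r.prev; else tail = r.prev;
        r.prev = r.next = null;
    }

    Replica evictOldest() {       // FIFO eviction from the head
        Replica r = head;
        if (r != null) unlink(r);
        return r;
    }
}
```

The trade-off the reviewers weighed: a TreeMap keyed by eviction time gives ordered lookup for free, while this list only works because eviction order equals insertion order (FIFO is time-based), which is exactly the property the cache has.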
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336228#comment-15336228 ] stack commented on HDFS-9924: - [~steve_l] Thanks for the context. Need to make sure this usecase and its variants are talked up loudly over in HADOOP-12910 > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335420#comment-15335420 ] stack commented on HDFS-9924: - [~xiaobingo] Thanks for posting the compare. It helps. It looks like the difference between 'async' and thread pool is negligible; 10% at the extreme. Is that how you interpret the results? As per [~andrew.wang], would be interested in what happens with fewer threads (especially as NN is set up with 300 handlers...); tendency seems to be the fewer threads you use, the better it does. Thanks for the report. > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333922#comment-15333922 ] stack commented on HDFS-9924: - +1 for a branch. Otherwise we have this mess going on late in a mature branch where folks are piecemealing APIs and renaming stuff on the fly because there is no consensus. > [umbrella] Nonblocking HDFS Access > -- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Nonblocking HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support nonblocking calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670 ] stack commented on HDFS-9924: - bq. Quoting Tsz Wo Nicholas Sze words, I understand your concern but it is a different problem. We should not protect NN by making the client slow. We should add protection in NN instead The above quote is magical-thinking (see the response to the above quote given by Daryn, an operator of one of our largest deploys). We are talking branch-2 here for this Future hack. The NN is not going to sprout scale of a sudden in the branch-2 line to support 'thousands' of concurrent ops coming in from an adjacent, Hive metadata server blame-shifting. Some form of parsimony, concern for NN loading, is in order. Rereading this issue from the top down (including the design doc -- it needs numbers... what is a large number of calls?; why wouldn't a thread pool work given you need to throttle) and seeing where we have arrived, this issue is not about 'Asynchronous HDFS Access' as the summary and original description advertises but instead is an expedient hack-for-hive, for late in branch-2 only. The 'change' will have a short shelf-life it seems given it arrives in 2.9.0+ (?) and branch-3 is looking to be a different API (See discussion on HADOOP-12910). The two distinct positions I discern in the discussion so far -- those who want a true async API on HDFS and those working on a hive fix -- are having trouble finding a common ground. If this characterization is correct, I'd suggest lets just call this issue a hack-for-hive explicitly and annotate it as such. A good few of the participants in this issue are likely not much interested in the latter (e.g. myself) as long as this work does not get in the way of our having a 'real' async API (HADOOP-12910) or confuse downstreamers on what the async story on HDFS is. 
> [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330867#comment-15330867 ] stack commented on HDFS-9924: - I see. Thank you. I see what you want now. Do you just need renames, or do you need more than rename? You want to do thousands of concurrent renames this way? Is that even going to work? Are you going to knock over the NN? Or, won't you just have a bunch of outstanding calls blocked on remote NN locks? Won't you want to constrict how many ongoing calls there are? > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330173#comment-15330173 ] stack commented on HDFS-9924: - bq. There are multiple comments from both sides indicating that CompletableFuture is the ideal option for 3.x. [~arpiagariu] Please leave off concluding a discussion that is still ongoing (CF is not 'ideal' and is not a given). It doesn't help sir. bq. You mean just like we recently added 'avoid local nodes' because another downstream component wanted to try it? You misrepresent, again. HBase ran for years with a workaround while waiting on the behavior to show up in HDFS; i.e. the hbase project did not have an 'interest' in 'avoid local nodes'; they required this behavior of the filesystem and ran with a suboptimal hack until it showed up. In this case all we have is 'interest' and requests for technical justification go unanswered. bq. The Hive engineers think they can make it work for them and there was a compromise proposed to introduce the API as unstable. I'm interested in how Hive will do async w/ only a Future and in how this suboptimal API in particular will solve their issue (is it described anywhere?). In my experience, a bunch of rigging (threads) for polling, rather than notification, is required when all you have is a Future to work with. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. 
It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328974#comment-15328974 ] stack commented on HDFS-9924: - Your summary and characterization of where the discussion is at is not correct [~arpit99]. The discussion is ongoing still (CompletableFuture is a significant undertaking, ListenableFuture copied local or something like is a possible candidate, etc.) bq. Since some downstream developers have expressed an interest in trying out a 2.x Future-based API even if it's tagged as Unstable/Experimental, is there a compelling reason to deny it? I'd hope that it takes more than 'interest' to get code committed to HDFS. bq. If Future turns to be of no use to anyone we can evolve the API in a later 2.x release or just revert it completely while the way forward (3.x) remains unaffected. If a technical argument on why Future will fix a codebases's scaling problem can't be produced, we can just skip the above evolutions and reverts altogether. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328290#comment-15328290 ] stack commented on HDFS-9924: - [~ashutoshc] Can you make a bit of a better argument than citing the mighty [~steve_l] please? Dealing with a mess of returned Futures will also complicate the Hive codebase, no? Can you explain why a half-async HDFS API would be easier for you to deal with? Thanks. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. > We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322864#comment-15322864 ] stack commented on HDFS-9924: - bq. This JIRA and the current implementation originally aim to support a basic async access to HDFS without callback support and without chaining support. The JIRA is about HDFS async access. There is no exception in the summary or description to rule out the basic async callback primitive. You could rule it out via fiat -- can you even call it an 'async' API if it doesn't do callback? -- but why not do it right from the get-go. Do it once only too. bq. When we change to return XxxFuture in the future, it is a backward compatible change. You and [~jnp] have said this a few times but for downstreamers, a Future-only API is not worth engaging with. It means each of us has to build parking structures to keep the unfinished Futures in, polling to look for completions to react to. This is a performance-killer. Been there. Done that. I like the [~mingma] summary/suggestion with the [~andrew.wang] caveat; revert and dev in a feature branch against trunk. I know of a few downstreamers that are interested, myself included, and would be up for helping out. Thanks. > [umbrella] Asynchronous HDFS Access > --- > > Key: HDFS-9924 > URL: https://issues.apache.org/jira/browse/HDFS-9924 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Tsz Wo Nicholas Sze >Assignee: Xiaobing Zhou > Attachments: AsyncHdfs20160510.pdf > > > This is an umbrella JIRA for supporting Asynchronous HDFS Access. > Currently, all the API methods are blocking calls -- the caller is blocked > until the method returns. It is very slow if a client makes a large number > of independent calls in a single thread since each call has to wait until the > previous call is finished. It is inefficient if a client needs to create a > large number of threads to invoke the calls. 
> We propose adding a new API to support asynchronous calls, i.e. the caller is > not blocked. The methods in the new API immediately return a Java Future > object. The return value can be obtained by the usual Future.get() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
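The "parking structures" complaint above can be made concrete with a small sketch. This is not the HDFS API; it is a hypothetical, self-contained illustration of the two styles under discussion: a plain `Future` the caller must hold and block or poll on, versus a `CompletableFuture` the caller can chain a callback onto.

```java
import java.util.concurrent.*;

// Hypothetical sketch (not HDFS source): plain-Future style vs. callback style.
public class AsyncStyles {
    static ExecutorService pool = Executors.newFixedThreadPool(2);

    // Plain Future: the caller must keep the Future around and block (or poll
    // isDone() in a loop) to learn the result -- the "parking structure".
    static Future<Integer> renameViaFuture() {
        return pool.submit(() -> 42);  // stand-in for an async namenode call
    }

    // CompletableFuture: the caller attaches a callback and chains further
    // work; no thread is parked waiting per outstanding call.
    static CompletableFuture<Integer> renameViaCallback() {
        return CompletableFuture.supplyAsync(() -> 42, pool);
    }

    public static void main(String[] args) throws Exception {
        int a = renameViaFuture().get();                          // blocks
        int b = renameViaCallback().thenApply(v -> v + 1).join(); // chained
        System.out.println(a + " " + b);
        pool.shutdown();
    }
}
```

The callback form composes: a client issuing thousands of independent calls reacts to each completion as it arrives instead of dedicating a thread (or a polling loop) to each pending `Future`.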
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317717#comment-15317717 ] stack commented on HDFS-9924: - Ugh. I meant to say, we risk having different 'async' API/implementations if we piecemeal the implementation ahead of figuring out a general approach for async'ing the FileSystem. What is committed currently is inadequate according to the discussion so far: it is missing callback support. Retrofitting callback on Future, as I understand it, will require a different implementation; therefore the commits are premature. Reverting in the meantime seems like the right thing to do. bq. I'm much more worried about API correctness Waiting a while to actually let more folks play with it before pushing it into a release (including the 3.x release that we're working to cut from trunk) just seems like an obvious, common sense thing to do. Above makes sense to me.
[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access
[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316850#comment-15316850 ] stack commented on HDFS-9924: - I'd suggest we work out a coherent, global filesystem async API/strategy before we start committing implementations (piecemeal); otherwise we will frustrate our users.
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302344#comment-15302344 ] stack commented on HDFS-7240: - bq. It is unfair to say that you are being rebuffed. Can we please move to discussion of the design. Back and forth on what is 'fair', 'tone', and how folks got commit bits is corrosive and derails what is important here; i.e., landing this big one. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document.
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302191#comment-15302191 ] stack commented on HDFS-7240: - bq. Now, can people stop being territorial or making any form of criticism of each other. It is fundamentally against the ASF philosophy of collaborative, community development, doesn't help long term collaboration and makes the entire project look bad. Thanks. Amen. Thanks for posting the design, [~anu]. bq. Datanodes provide a shared generic storage service called the container layer. Is this the HDFS Datanode? We'd add block manager functionality to the Datanode? (Did we answer the [~zhz] question, "How about 'why an object store as part of HDFS'?") Thanks
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264590#comment-15264590 ] stack commented on HDFS-3702: - Any chance of getting this on the 2.8 branch? Thanks. > Add an option for NOT writing the blocks locally if there is a datanode on > the same box as the client > - > > Key: HDFS-3702 > URL: https://issues.apache.org/jira/browse/HDFS-3702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.5.1 >Reporter: Nicolas Liochon >Assignee: Lei (Eddy) Xu >Priority: Minor > Labels: BB2015-05-TBR > Fix For: 3.0.0, 2.9.0 > > Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, > HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, > HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, > HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, > HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf > > > This is useful for Write-Ahead-Logs: these files are written for recovery > only, and are not read when there are no failures. > Taking HBase as an example, these files will be read only if the process that > wrote them (the 'HBase regionserver') dies. This will likely come from a > hardware failure, hence the corresponding datanode will be dead as well. So > we're writing 3 replicas, but in reality only 2 of them are really useful.
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239291#comment-15239291 ] stack commented on HDFS-3702: - bq. Suppose we find that the CreateFlag.NO_LOCAL_WRITE is bad. How do we remove it, i.e. what is the procedure to remove it? I believe we cannot simply remove it since it probably will break HBase compilation. Just remove it. HBase has loads of practice dealing with stuff being moved/removed and changed under it by HDFS. You could also just leave the flag in place since there is no obligation that any filesystem respect the flag. It is a suggestion only (See http://linux.die.net/man/2/open / create for the long, interesting set of flags it has). bq. Another possible case: suppose that we find the disfavorNodes feature is very useful later on. How do we add it? The same way you'd add any feature... and HBase would look for it the way it does now: peeking for the presence of an extra facility with if/else hdfs checks, reflection, try/catches of nosuchmethod, etc. We have lots of practice doing this also. We'd keep using the NO_LOCAL_WRITE flag though, unless it is purged, since it does what we want. As I understand it, disfavoredNodes would require a lot more work of HBase to get the same functionality as NO_LOCAL_WRITE provides. bq. It seems that the "whatever proofing" is to let the community try the features for a period of time. Then, we may add it to the FileSystem API. Sorry. 'whatever proofing' is overly expansive. We are just adding a flag. I just meant, if the tests added here are not sufficient or you want some other proof it works, pre-commit, just say. No problem. Also, the community has been running with this 'feature' for years (See HBASE-6435) so there is no need of our taking the suggested disruptive 'indirection' just to add a filesystem 'hint' with attendant mess in HDFS -- extra params on create -- that cannot subsequently be removed. 
Thanks [~szetszwo]. What do you think of our adding the attributes LimitedPrivate and Evolving to the flag? Would that be indicator enough for you?
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237558#comment-15237558 ] stack commented on HDFS-3702: - bq. I suggest HBase should do the same way of how it today is using favoredNodes. Thanks [~szetszwo] for the response, but you did not answer the question. The question was if you thought the process of first staging a 'hidden' API is a fair burden to put on your favorite downstream project (not to mention the mess it makes inside HDFS -- see note above this one for graphic detail). Let's back up. I think it will help us make some progress here again. You say: bq. So let add this flag later so that it allows us to test the feature and see if it is good enough or we may actually need disfavoredNodes. Sound good? No. It does not sound good. There is no need to stage a feature as hidden first, one that is reasonable (see above discussion with the opinion of many), and has an immediate need/user. If there is any concern that the feature is lacking or does not work as advertised, let's do whatever proofing of the feature is needed here as part of this issue and just get it done. If the bundled tests are unsatisfactory or if you'd like me to try and report results of running this facility at scale, just say... no problem. If the implementation has a bug, let's fix it in a follow-up. As we would do any other feature in HDFS. On your concern that a new 'hint' to the create method exposes new API, an API that by definition does not put a burden on any FS implementation to implement the suggested operation -- i.e. the amount of API 'surface' is minuscule -- it has been suggested above that we flag it @InterfaceAudience.LimitedPrivate(HBase) for a probationary period. How about we also add @InterfaceStability.Evolving on the flag so it can be yanked anytime if, for some unforeseen reason, it is a total mistake? Would this assuage your exposure concern [~szetszwo]? Thanks for your time. 
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233363#comment-15233363 ] stack commented on HDFS-3702: - bq. I suggest adding a new create(..) method to DistributedFileSystem, either with a new boolean or with the AddBlockFlag, in this JIRA so that the community can try out the feature. We may add the CreateFlag.NO_LOCAL_WRITE once the feature has been stabilized and we has decided that it is the right API. Tell me more about how this process would work, please, [~szetszwo]? IIUC, a downstream project, say HBase which already has an awful hack in place to try and simulate a poor-man's version of this feature, would, via reflection, look first for the presence of this new create override IFF the implementation is HDFS (don't look if LocalFS or S3, etc.)? If HDFS and if present, we'd drop our hack and use the new method (via reflection). Later, after it is 'proven' that a feature, one that HBase has wanted for years now, has 'merit', we would then add a new path w/ more reflection (IFF the FS implementation is HDFS) that would use the NO_LOCAL_WRITE when it becomes available? (Would we remove the create override when the NO_LOCAL_WRITE FS hint gets added?) Are you suggesting that downstream projects do this? Regarding favoredNodes, that's an unfinished topic and of a different character to what is being suggested here as it added overrides rather than a 'hint' flag as this patch does here. 
Thanks [~szetszwo]
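The "reflection dance" the comment above describes can be sketched in plain Java. All class and method names here are hypothetical stand-ins (this is not HBase or HDFS source): the client probes the filesystem object for a newer create override and falls back to the old path when it is absent.

```java
import java.lang.reflect.Method;

// Illustrative sketch of downstream reflection probing; names are invented.
public class ReflectionProbe {
    // Stand-in for an old filesystem class without the new override.
    static class OldFs {
        public String create(String path) { return "old:" + path; }
    }

    // Stand-in for a newer class that added create(String, boolean noLocalWrite).
    static class NewFs extends OldFs {
        public String create(String path, boolean noLocalWrite) {
            return (noLocalWrite ? "new-nolocal:" : "new:") + path;
        }
    }

    static String create(OldFs fs, String path) {
        try {
            // Probe for the new override; use it with the no-local-write hint.
            Method m = fs.getClass().getMethod("create", String.class, boolean.class);
            return (String) m.invoke(fs, path, true);
        } catch (NoSuchMethodException e) {
            return fs.create(path);  // older filesystem: fall back to the hack-free old path
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(create(new OldFs(), "/wal"));
        System.out.println(create(new NewFs(), "/wal"));
    }
}
```

This is exactly the burden being objected to: every downstream caller repeats the probe, and the fallback path must be kept alive indefinitely.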
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231686#comment-15231686 ] stack commented on HDFS-3702: - bq. So let add this flag later so that it allows us to test the feature and see if it is good enough or we may actually need disfavoredNodes. Sound good? [~szetszwo] Isn't CreateFlag.NO_LOCAL_WRITE how this facility gets exposed to clients? If it is not present, how does the feature get exercised at all? Thanks.
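The advisory-flag design being argued over can be modeled in a few lines. This is a toy model, not Hadoop source: it shows why a create-time hint costs little API surface -- a filesystem that understands NO_LOCAL_WRITE honors it, and one that does not simply ignores it.

```java
import java.util.EnumSet;

// Toy model (names mirror but are not the Hadoop classes) of an advisory create flag.
public class AdvisoryFlagDemo {
    enum CreateFlag { CREATE, OVERWRITE, NO_LOCAL_WRITE }

    // Stand-in for block placement on an HDFS-like filesystem that honors the hint.
    static String placeFirstReplica(EnumSet<CreateFlag> flags, String clientHost) {
        if (flags.contains(CreateFlag.NO_LOCAL_WRITE)) {
            return "remote-node";   // avoid the datanode co-located with the client
        }
        return clientHost;          // default: first replica goes to the local datanode
    }

    // Stand-in for a filesystem with no replica placement (LocalFS, S3, ...):
    // the hint is advisory, so it is silently ignored rather than rejected.
    static String placeOnSimpleFs(EnumSet<CreateFlag> flags) {
        return "single-store";
    }

    public static void main(String[] args) {
        EnumSet<CreateFlag> flags = EnumSet.of(CreateFlag.CREATE, CreateFlag.NO_LOCAL_WRITE);
        System.out.println(placeFirstReplica(flags, "rs-host"));
        System.out.println(placeOnSimpleFs(flags));
    }
}
```

For a WAL writer like the HBase regionserver, the hint shifts the first replica off the local box, so losing that box does not also lose a replica.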
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209177#comment-15209177 ] stack commented on HDFS-3702: - I skimmed the #9 patch. Seems good to me other than the issues [~szetszwo] raises (we are using the AddBlockFlag rather than the client flag... and I think AddBlockFlag should be in hdfs as he suggests, given your remark above on the difference between the client-facing flag and the hdfs flag). Thanks.
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209164#comment-15209164 ] stack commented on HDFS-3702: - bq. stack, let's add a boolean noLocalWrite to DistributedFileSystem or just reuse the new AddBlockFlag there. On adding a flag to DFS, taking a look, it would be 'odd' given what is there currently, and adding a public method to set a hint for a particular operation only would be tough to explain to the reader of the API ("Why flag here... when create takes flags already..."). Then there is the fact that the user has to do {code}if HDFS, then{code} and if we are on GPFS, an FS supported by one of our committers, then it is {code} if HDFS || GPFS{code} and so on. I think you also mean 'and' in the above rather than 'or'. AddBlockFlag is internal to HDFS and marked Private so... not usable by clients; maybe you are talking of how it will be implemented. I'm not sure what you are suggesting here. Pardon me. bq. You know, once it is in FileSystem, it is forever. I know that for the client to ask for a behavior that is not there presently, yes, FileSystem has to change. We are talking about a self-described advisory, not a required new operation of the underlying FS. 
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207327#comment-15207327 ] stack commented on HDFS-3702: - And more (getting emotional): a downstreamer has been hampered, spending unnecessary i/o and cpu, for years now, and the patch is being blocked because we'd add an enum to the public API! Help us out, mighty [~szetszwo]! Thanks.
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207307#comment-15207307 ] stack commented on HDFS-3702: - bq. I am very uncomfortable to add CreateFlag.NO_LOCAL_WRITE and AddBlockFlag since we cannot remove them once they are added to the public FileSystem API. The AddBlockFlag would have @InterfaceAudience.Private so it is not being added to the public API. The CreateFlag.NO_LOCAL_WRITE is an advisory enum. Something has to be available in the API for users like HBase to pull on. This seems to be the most minimal intrusion possible. Being a hint by nature, it'd be undoable. Thanks for your consideration, [~szetszwo].
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207267#comment-15207267 ] stack commented on HDFS-3702: - [~szetszwo] [~arpitagarwal] is -0 if bq. AddBlockFlag should be tagged as @InterfaceAudience.Private if we proceed with the .008 patch. ... and then what if CreateFlag.NO_LOCAL_WRITE was marked LimitedPrivate with HBase denoted as the consumer? Would that be sufficient accommodation of your concern?
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207034#comment-15207034 ] stack commented on HDFS-3702: - bq. If the region server has write permissions on /hbase/.logs, which I assume it does, it should be able to set policies on that directory. Makes sense, [~arpitagarwal]. Thanks. We can mess with this stuff when/if an accommodating block policy shows up. Meantime, are you still -0 on this patch going in? [~szetszwo] Are you still against commit, sir? @nkeywal reminds me of the price we are currently paying by not being able to ask HDFS to avoid local replicas. Seems easy enough to revisit given the way this is implemented, should favoredNodes stabilize, and then a subsequent disfavoredNodes facility. Thanks.
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205361#comment-15205361 ] stack commented on HDFS-3702: - bq. ...could you comment on the usability of providing node lists to this API? Usually nodes and the NN can agree on what they call machines, but we've all seen plenty of clusters where this is not so. Both HDFS and HBase have their own means of insulating themselves against dodgily-named setups, and those mechanisms are not in alignment. bq. My impression was that tracking this in HBase was onerous, and is part of why favored nodes fell out of favor. No. It was never fully plumbed into HBase (it was plumbed into a balancer that no one used and that would not swap into place, because the default balancer was more featureful). Regarding the FB experience, we need to get them to do us a post-mortem.
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205343#comment-15205343 ] stack commented on HDFS-3702: - bq. Hi stack, the attribute could be set by an installer script or an API call at process startup [~arpitagarwal] Thanks. Yeah, vendors could ensure installers set the attribute. There is a significant set of installs, though, where HBase shows up after the HDFS install and/or where HBase does not have sufficient permissions to set attributes on HDFS; I don't know the percentage. It would just be easier all around if this could be managed internally by HBase, with no need to get scripts and/or operators involved. bq. ...so if you think HBase needs a solution now, ... Smile. The issue was opened in July 2012, so we're not holding our breath (smile). It would be cool if we could ask HDFS not to write locally. Anyone doing WAL-on-HDFS will appreciate this. Thanks [~arpitagarwal]
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205337#comment-15205337 ] stack commented on HDFS-3702: - bq. How would DFSClient know which nodes are disfavored nodes? How could it enforce disfavored nodes? You postulated an application that wanted to 'distribute its files uniformly in a cluster.' I was just trying to suggest that users would prefer HDFS just do it for them. HDFS would know how to do it better, being the arbiter of what is happening in the cluster; an application will do a poor job by comparison. 'Distribute its files uniformly...' sounds like a good feature to implement with a block placement policy. bq. Since we already have favoredNodes, adding disfavoredNodes seems more natural than adding a flag. As noted above at 'stack added a comment - 12/Mar/16 15:20', favoredNodes is an unexercised feature that has actually been disavowed by the originators of the idea, FB, because it proved broken in practice. I'd suggest we not build more atop a feature-under-review, as adding disfavoredNodes would, at least until we hear of successful use of favoredNodes (apparently folks at Y! are trying it). bq. In addition, the new FileSystem CreateFlag does not look clean to me since it is too specific to HDFS. How would other FileSystems such as LocalFileSystem implement it? The flag added by the attached patch is qualified throughout as a 'hint'. When set against LocalFileSystem, it will just be ignored. No harm done; the 'hint' didn't take. If we went your suggested route and added disfavoredNodes, things get a bit interesting when hbase, say, passes localhost. What'll happen? Does the user now have to check the FileSystem implementation type before selecting which DFSClient method to call? I don't think you are objecting to the passing of flags on create, given this seems pretty standard fare in filesystems.
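To make the hint semantics concrete, here is a toy sketch of the idea being argued for. All names here (HintFlagSketch, CreateFlagSketch, chooseReplicas, the host lists) are hypothetical illustrations, not the real Hadoop API: a placement-aware filesystem honors a NO_LOCAL_WRITE-style hint by skipping the client's host, while a filesystem with no placement to do ignores the hint with no harm done.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class HintFlagSketch {
    // Hypothetical stand-in for a CreateFlag enum carrying a placement hint.
    public enum CreateFlagSketch { CREATE, OVERWRITE, NO_LOCAL_WRITE }

    // A DFS-like filesystem: picks up to 3 replica hosts, skipping the
    // client's own host when the NO_LOCAL_WRITE hint is present.
    public static List<String> dfsChooseReplicas(List<String> liveHosts,
                                                 String clientHost,
                                                 EnumSet<CreateFlagSketch> flags) {
        List<String> chosen = new ArrayList<>(liveHosts);
        if (flags.contains(CreateFlagSketch.NO_LOCAL_WRITE)) {
            chosen.remove(clientHost); // honor the hint: no local replica
        }
        return chosen.subList(0, Math.min(3, chosen.size()));
    }

    // A LocalFileSystem-like filesystem: there is no placement to do,
    // so the hint is simply ignored -- "no harm done; the hint didn't take."
    public static List<String> localChooseReplicas(String clientHost,
                                                   EnumSet<CreateFlagSketch> flags) {
        return List.of(clientHost);
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("dn1", "dn2", "dn3", "dn4");
        EnumSet<CreateFlagSketch> hint =
                EnumSet.of(CreateFlagSketch.CREATE, CreateFlagSketch.NO_LOCAL_WRITE);
        // The DFS-like side excludes the client host dn1; the local side ignores the hint.
        System.out.println(dfsChooseReplicas(hosts, "dn1", hint));
        System.out.println(localChooseReplicas("dn1", hint));
    }
}
```

The point of the sketch is the asymmetry of a hint versus an enforced disfavoredNodes list: the caller passes the same flags regardless of the FileSystem implementation, and each implementation does the best it can with them.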
[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client
[ https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205142#comment-15205142 ] stack commented on HDFS-3702: - For uniform distribution of files over a cluster, I think users would prefer that DFSClient managed it for them (a new flag on CreateFlag?) rather than doing the calculation of how to populate favoredNodes and disfavoredNodes from imperfect knowledge of the cluster, something the NN will always do better at. Unless you have other possible uses in mind, disfavoredNodes seems like a more intrusive and roundabout route, with its overrides, possible builders, and global interpretation of the 'localhost' string, compared to the clean flag this patch carries. What do you think, [~szetszwo]? Thanks Nicolas.